Construction of a corpus of senmyō, one of the oldest materials of the Japanese language

1. Abstract

We constructed a corpus of senmyō (imperial edicts) written in the 8th century for use in linguistic research. Senmyō is one of the oldest materials of the Japanese language. It is written in Old Japanese using a special notation called "senmyō-gaki", which uses only Chinese characters: content words are written in full-size characters, and grammatical elements such as particles and inflectional endings are written phonetically in smaller characters. To encode this notation, we reproduced it using our own extension of the TEI tag set. We also added word information to the full text of the corpus using MeCab and UniDic. Because some words in senmyō can be read in two ways, in Chinese style and in Japanese style, we devised a way to assign both readings to the same location when adding word information. The corpus is published through an online search application called "Chunagon".
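The idea of assigning two readings to one location can be illustrated with a small standoff data structure. The following is a minimal sketch under stated assumptions, not the corpus's actual format: the class names `Reading` and `Span`, the function `readings_at`, and the style labels `"kan"` (Chinese style) and `"wa"` (Japanese style) are all hypothetical, and the example readings of 天皇 are illustrative only. Each span covers a character range of the original text and carries a list of alternative readings, so a search can return either reading, or both, for the same position.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reading:
    # Hypothetical style labels: "kan" = Chinese-style reading, "wa" = Japanese-style reading.
    style: str
    pron: str


@dataclass
class Span:
    # Half-open character range [start, end) in the original senmyō text.
    start: int
    end: int
    # Alternative readings attached to the same location.
    readings: list[Reading] = field(default_factory=list)


def readings_at(spans: list[Span], offset: int, style: str | None = None) -> list[Reading]:
    """Return readings of every span covering `offset`, optionally filtered by style."""
    hits: list[Reading] = []
    for span in spans:
        if span.start <= offset < span.end:
            hits.extend(r for r in span.readings if style is None or r.style == style)
    return hits


# Illustrative only: one span over 天皇 carrying both a Chinese-style and a Japanese-style reading.
text = "天皇"
spans = [Span(0, 2, [Reading("kan", "テンノウ"), Reading("wa", "スメラミコト")])]
print([r.pron for r in readings_at(spans, 0)])
print([r.pron for r in readings_at(spans, 1, style="wa")])
```

Keeping the readings as alternatives on one span, rather than as two separate token sequences, means that both readings stay aligned with the same characters of the source text.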

Toshinobu OGISO (NINJAL, Japan), Neisin GO (NINJAL, Japan), Yukie IKEDA (Chuo University, Japan) and Tetsuya SUNAGA (Showa Women's University, Japan)

