<?xml version="1.0" encoding="UTF-8"?><TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt><title type="full"><title type="main">Automatic Labeled Data Generation for Person Named Entity Disambiguation on the <em>Ming Shilu</em></title><title type="sub"/></title></titleStmt><author><persName><surname>Tsai</surname><forename>Richard Tzong-Han</forename></persName><affiliation>Department of Computer Science and Information Engineering, National Central University, Taiwan</affiliation><affiliation>Research Center for Humanities and Social Sciences, Academia Sinica, Taiwan</affiliation><email>thtsai@csie.ncu.edu.tw</email></author><author><persName><surname>Wu</surname><forename>Cheng-Han</forename></persName><affiliation>Department of Computer Science and Information Engineering, National Central University, Taiwan</affiliation></author><author><persName><surname>Pai</surname><forename>Pi-Ling</forename></persName><affiliation>Research Center for Humanities and Social Sciences, Academia Sinica, Taiwan</affiliation></author><author><persName><surname>Fan</surname><forename>I-Chun</forename></persName><affiliation>Institute of History and Philology, Academia Sinica, Taiwan</affiliation></author><editionStmt><edition><date>43955</date></edition></editionStmt><publicationStmt><publisher>Name, Institution</publisher><address><addrLine>Street</addrLine><addrLine>City</addrLine><addrLine>Country</addrLine><addrLine>Name</addrLine></address></publicationStmt><sourceDesc><p>Converted from an OASIS Open Document</p></sourceDesc></fileDesc><encodingDesc><appInfo><application ident="DHCONVALIDATOR" version="1.22"><label>DHConvalidator</label></application></appInfo></encodingDesc><profileDesc><textClass><keywords scheme="ConfTool" n="category"><term>Paper</term></keywords><keywords scheme="ConfTool" n="subcategory"><term>Poster</term></keywords><keywords scheme="ConfTool" n="keywords"><term>Named Entity Disambiguation</term><term>Automatic Labeled Data 
Generation</term><term>BERT</term></keywords><keywords scheme="ConfTool" n="topics"><term>Asia</term><term>English</term><term>15th-17th Century</term><term>artificial intelligence and machine learning</term><term>natural language processing</term><term>Computer science</term><term>History</term></keywords></textClass></profileDesc></teiHeader><text><body><p>An important task in digital humanities (DH) historical research is identifying person names in historical texts. This task can be divided into two subtasks: person named entity recognition (PNER) and person named entity disambiguation (PNED). PNED links each person named entity (PNE) mention to a specific person profile in a reference knowledge base. The main challenge for machine-learning-based PNED is the lack of annotated data. We design an automatic approach to labeling training data. We choose the Ming Shilu as our target text and use the Ming-Qing Archives Name Authority Database as our reference knowledge base, which contains profiles of 14,070 government officials who lived during the Ming dynasty. Our BERT-based model reaches an accuracy of 90.1%, which indicates that our approach can generate high-quality labeled data for the PNED task on Chinese historical texts. In the general setting, which also includes trivial instances, the accuracy is even higher (~98%).</p></body></text></TEI>