NER on Ancient Greek texts with minimal annotation

1. Abstract

This paper presents the results of adapting a new workflow for Named Entity Recognition (NER) and classification to primary sources in Ancient Greek. We used a language-independent model of data extraction and pattern discovery based on machine learning algorithms, which allowed us to extract a dataset of automatically classified place-names and ethnonyms starting from a small manually annotated dataset. The idea is that it should be possible to train the machine to recognize an entity from recurring elements in its context, without providing a large annotated training dataset in advance. This rests on the assumption that premodern textual sources display a recognized systematicity in their linguistic encoding of space, which makes them a test case for automatic and semi-automatic methods of pattern discovery and extraction.

Chiara Palladino (chiara.palladino@furman.edu), Furman University, Classics Department; Farimah Karimi (bmathiak@uni-koeln.de), GESIS Leibniz Institute for the Social Sciences; and Brigitte Mathiak, University of Cologne, Institute of Digital Humanities
