Introduction
A persisting problem in Near Eastern studies is the existence of broken cuneiform tablets (listing 1.1). In recent years, efforts have been undertaken to 3D scan Mara et al. (2010), to paleographically describe Homburg (2019) and to digitally reconstruct broken fragments Collins et al. (2014, 2017) of cuneiform tablets. However, broken fragments cannot always be joined, and parts of a tablet often remain destroyed. Filling these fractures or gaps is not always easy for scholars and takes considerable interpretation time. With the emergence of more digitally available cuneiform text resources, this publication sees an opportunity to investigate whether auto-complete algorithms based on machine learning and linguistic linked open data (LLOD) resources Homburg (2017) can be useful in the reconstruction of cuneiform texts. The classification results are to be used to create an epoch- and language-specific recommendation system to fill gaps on cuneiform tablets, thereby assisting cuneiform scholars.
Related Work
Related work has been done on autocompletion systems, which face the similar challenge of anticipating a user's input from context and other features Leung & Zhang (2008), Gikandi (2006), Hyvönen & Mäkelä (2006). These technologies are heavily relied on in input method engines¹, which have traditionally been powered by dictionary-based algorithms and, more recently, also by machine learning approaches and neural networks Chen et al. (2015), Huang et al. (2018). Input method engines for cuneiform have been developed by Homburg et al. (2015).
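To make the dictionary-based approach concrete, here is a minimal sketch of prefix completion as an input method engine might perform it: every known reading that starts with the typed prefix is ranked by its corpus frequency. The class name `PrefixCompleter` and the toy token list are hypothetical illustrations, not the API of the engines by Homburg et al. (2015).

```python
from collections import Counter


class PrefixCompleter:
    """Hypothetical dictionary-based completer: suggests known readings
    that start with the typed prefix, most frequent first."""

    def __init__(self, corpus_tokens):
        # Frequency of each transliterated reading in a (toy) training corpus.
        self.freq = Counter(corpus_tokens)

    def complete(self, prefix, k=3):
        candidates = [t for t in self.freq if t.startswith(prefix)]
        # Higher frequency first; ties broken alphabetically for stable output.
        candidates.sort(key=lambda t: (-self.freq[t], t))
        return candidates[:k]


# Toy readings, chosen only for illustration.
completer = PrefixCompleter(["an", "an", "an", "ana", "ana", "anše", "ka", "ki"])
print(completer.complete("an"))  # ['an', 'ana', 'anše']
```

A production engine would more likely store its dictionary in a trie than scan every entry; the linear scan only keeps the sketch short.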
Methodology
Following Homburg & Chiarcos (2016), the methods applied fall into three groups: methods based on grammatical rules (POS tagging), dictionary-based methods exploiting (third-party) dictionary resources, and statistical approaches using the following types of machine learning features:
– Context-dependent features, e.g. for Hidden Markov Model classifications
– Grammatical features derived from POS taggers
– Semantic features derived from the semantic meaning of surrounding words
– Metadata features, e.g. text categorizations
– Paleographic features using PaleoCodage for a subset of manually annotated texts Homburg (2019)
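The context-dependent features can be illustrated with a deliberately simple stand-in for the Hidden Markov Model mentioned above: an add-one smoothed bigram model that scores each candidate x for a one-token gap by P(x | left neighbour) · P(right neighbour | x). The function names and the ATF-style toy lines are assumptions for illustration only; the other feature types (grammatical, semantic, metadata, paleographic) would enter a real classifier as further inputs.

```python
from collections import Counter


def train_bigrams(lines):
    """Count the vocabulary, left-context occurrences and adjacent token pairs."""
    vocab, contexts, bigrams = set(), Counter(), Counter()
    for tokens in lines:
        vocab.update(tokens)
        contexts.update(tokens[:-1])  # tokens that have a right neighbour
        bigrams.update(zip(tokens, tokens[1:]))
    return vocab, contexts, bigrams


def fill_gap(left, right, model, k=3):
    """Rank candidates x for the pattern `left [x] right`."""
    vocab, contexts, bigrams = model
    v = len(vocab)

    def p(nxt, prev):
        # Add-one (Laplace) smoothed estimate of P(nxt | prev).
        return (bigrams[(prev, nxt)] + 1) / (contexts[prev] + v)

    scores = {x: p(x, left) * p(right, x) for x in vocab}
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda x: (-scores[x], x))[:k]


# Toy ATF-style lines ({d} is the divine determinative); hypothetical data.
model = train_bigrams([
    ["a-na", "{d}", "en-lil2"],
    ["{d}", "en-lil2", "lugal"],
    ["a-na", "{d}", "utu"],
])
print(fill_gap("a-na", "en-lil2", model))  # '{d}' ranks first
```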
Experimental Setup
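The evaluation protocol of this section (a train/test split, random gaps, and scoring against the original text as gold standard) can be sketched as follows. The gap rate, the seed, the one-token gap granularity and the hypothetical `predict(gapped_tokens, index)` interface are illustrative assumptions, not the actual experimental configuration; parsing CDLI texts from ATF is omitted.

```python
import random


def split_corpus(texts, test_ratio=0.1, seed=0):
    """Shuffle a copy of the corpus and split it into training and test texts."""
    shuffled = list(texts)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


def make_gaps(tokens, rate, rng):
    """Blank out tokens at random; return the gapped text and the gold tokens."""
    gapped, gold = list(tokens), {}
    for i, token in enumerate(tokens):
        if rng.random() < rate:
            gapped[i] = None  # None marks a gap
            gold[i] = token
    return gapped, gold


def gap_accuracy(predict, test_texts, rate=0.2, seed=0):
    """Share of gaps whose prediction equals the original (gold) token."""
    rng = random.Random(seed)
    correct = total = 0
    for tokens in test_texts:
        gapped, gold = make_gaps(tokens, rate, rng)
        for i, expected in gold.items():
            total += 1
            correct += predict(gapped, i) == expected
    return correct / total if total else 0.0
```

A breakdown per language or epoch would simply call `gap_accuracy` on each subset of the test texts.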
The effectiveness of the algorithms and features is tested on a corpus of all CDLI texts in ATF, which is split into a training and a test set. Random gaps are introduced into the texts for classification, and the predictions are evaluated against the original texts (the gold standard), both on Unicode cuneiform and on the respective transliterations, for different cuneiform languages (Sumerian, Hittite, Akkadian) and epochs. The poster presents selected preliminary classification results and a significance analysis of the features as a basis for discussing further improvements. A possible future goal could be a shared task to improve classification accuracy, similar to the cuneiform language identification challenge Jauhiainen et al. (2019).
Application
Lastly, the poster presents a prototypical application (fig. 1), currently in development, that displays the results of the machine learning process. The implementation builds on the concept of input method engines Homburg et al. (2015) and will provide a self-learning component.
Figure 1: Text Completion Prototype: "If...Enlil". The dictionary knows that Enlil is a god's name (NE) and is commonly preceded by the determinative for god, 𒀭 (an), which is therefore suggested first to fill the gap. The next most likely options are a person named Enlil (male or female), the people (tribe) of Enlil, or a location.
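The behaviour shown in Figure 1, together with the planned self-learning component, can be approximated by a small ranking structure: a lexicon maps words to tags such as NE, suggestions for a gap are ranked by how often each filler was previously accepted before a word with that tag, and every accepted suggestion updates these counts. The class `SelfLearningSuggester`, its lexicon format and the example acceptances are hypothetical and do not describe the prototype's actual implementation.

```python
from collections import Counter, defaultdict


class SelfLearningSuggester:
    """Hypothetical sketch: rank gap fillers by the tag of the following word
    and learn from every suggestion the user accepts."""

    def __init__(self, lexicon):
        self.lexicon = lexicon                # word -> tag, e.g. "Enlil" -> "NE"
        self.accepted = defaultdict(Counter)  # tag -> accepted fillers with counts

    def suggest(self, next_word, k=3):
        tag = self.lexicon.get(next_word)
        # .get avoids creating empty entries for unseen tags.
        counts = self.accepted.get(tag, Counter())
        return [filler for filler, _ in counts.most_common(k)]

    def accept(self, next_word, filler):
        # Self-learning step: remember that `filler` was chosen before this word's tag.
        self.accepted[self.lexicon.get(next_word)][filler] += 1


suggester = SelfLearningSuggester({"Enlil": "NE", "Ninlil": "NE"})
for filler in ["𒀭", "𒀭", "a-na"]:
    suggester.accept("Enlil", filler)
print(suggester.suggest("Ninlil"))  # ['𒀭', 'a-na'], learned via the shared NE tag
```

Conditioning on the tag rather than the word itself lets a choice made before one divine name carry over to other divine names, which mirrors the generalization the caption attributes to the dictionary.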
Bibliography
Chen, S., Zhao, H. & Wang, R. (2015), Neural network language model for