Introduction
Digital paleography has been emerging as a field of research since the beginning
of the new century. Paleographers describe how a text has been displayed and
collect information such as writing styles, contextual information and the epoch.
Ciula (2017) gives a good summary of digital paleography and its practices.
Challenges concerning digital paleography are discussed by Hassner et al. (2015)
for writing systems and more broadly by Stokes (2015). The main result of
these publications is that the community of digital paleographers, which has
mainly focused on a few particular languages, seeks to broaden its scope and
to research more global models for representing paleographic features such as
writing style, typographic features and anomalies in texts. This calls for a
unified representation of paleographic features, to which this publication would
like to contribute by suggesting a core vocabulary for the paleographic
description of texts.
Related Work
Paleographic features can, to a certain extent, be represented in TEI/XML.
In the linguistic community, linguistic linked open data (LLOD) (McCrae et al.
2016) is well established and enables tools such as BabelNet (Navigli & Ponzetto
2012), a multilingual semantic network used to translate words and texts based
on semantic content. This publication proposes creating such a semantic network
for paleographic descriptions in order to formalize this part of research.
Naturally, such a task cannot be accomplished by a single scholar from a single
field, so the publication begins by suggesting the so-called core vocabulary:
vocabulary contents that document the structure of systematic scripts like
cuneiform and model the relations between the components of a script. For
structured scripts in particular (cf. fig. 1), a variety of encoding schemes
exists, e.g. for Chinese (Bishop & Cook 2003) and Japanese (Apel 2014). These
encodings were created to model fonts, but form an ideal basis for the proposed
unified vocabulary. Similar to the OLiA ontologies (Chiarcos & Sukhareva 2015),
the author suggests extending this core vocabulary, which represents the
structure of the script (figure 3), with language- and script-specific
extensions. The concept is presented using the examples of cuneiform and
Egyptian hieroglyphs and builds upon the vocabulary shown in figure 2.
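The layering of a script-independent core vocabulary with script-specific extensions can be illustrated with a minimal sketch. All term names below (Glyph, GlyphComponent, Stroke, CuneiformSign, Wedge, Winkelhaken, HieroglyphicSign) are hypothetical placeholders chosen for illustration and are not taken from the proposed vocabulary; the sketch only shows how each extension term could be traced back to a core term.

```python
from typing import Dict, List, Optional

# Each vocabulary maps a term to its broader term (None marks a top-level term).
# All term names are hypothetical and serve only to illustrate the layering.
CORE: Dict[str, Optional[str]] = {
    "Glyph": None,
    "GlyphComponent": None,
    "Stroke": "GlyphComponent",
}

# Script-specific extensions only add terms that point back into the core.
CUNEIFORM: Dict[str, Optional[str]] = {
    "CuneiformSign": "Glyph",
    "Wedge": "Stroke",
    "Winkelhaken": "Wedge",
}

HIEROGLYPHIC: Dict[str, Optional[str]] = {
    "HieroglyphicSign": "Glyph",
}


def merge(*vocabularies: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Combine the core vocabulary with any number of extensions."""
    merged: Dict[str, Optional[str]] = {}
    for vocabulary in vocabularies:
        merged.update(vocabulary)
    return merged


def broader_terms(term: str, vocabulary: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain of broader terms, from the nearest to the top-level one."""
    chain: List[str] = []
    parent = vocabulary[term]
    while parent is not None:
        chain.append(parent)
        parent = vocabulary[parent]
    return chain


def is_anchored_in_core(term: str, vocabulary: Dict[str, Optional[str]]) -> bool:
    """Check that a term is a core term or ultimately specializes one."""
    chain = broader_terms(term, vocabulary)
    root = chain[-1] if chain else term
    return root in CORE


combined = merge(CORE, CUNEIFORM, HIEROGLYPHIC)
print(broader_terms("Winkelhaken", combined))
print(is_anchored_in_core("HieroglyphicSign", combined))
```

Keeping the extensions in separate mappings mirrors the idea that specialists for each script could maintain their own vocabulary while sharing the structural core.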
Figure 1: Example of the PaleoCodage encoding (Homburg 2019). PaleoCodage
represents a machine-readable way to describe highly structured scripts such as
cuneiform. This structure and the relations to similar characters and scripts can
be modelled using the proposed core vocabulary, while the language-specific
vocabularies would describe writing styles, shapes and stylistic characteristics
of the respective script.
Figure 2: Vocabulary describing two cuneiform glyphs connected to character
representations, their find spot, their assigned epoch, an assigned glyph sense
(which may be distinct from the character/word sense) and possible serializations
in SVG and as a PaleoCode. For other languages, other encodings are possible.
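The kind of description summarized for figure 2 can be sketched as a small set of subject-predicate-object statements. This is only an illustrative sketch: the `ex:` prefix, every property name and every identifier are assumptions rather than terms of the proposed vocabulary, and the SVG and PaleoCode values are placeholders, not real serializations. The sketch also assumes that both glyphs represent the same character.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def describe_glyph(glyph: str, character: str, find_spot: str, epoch: str,
                   glyph_sense: str, svg: str, paleocode: str) -> List[Triple]:
    """Build the statements for one glyph (all property names are hypothetical)."""
    return [
        (glyph, "ex:representsCharacter", character),
        (glyph, "ex:findSpot", find_spot),
        (glyph, "ex:epoch", epoch),
        (glyph, "ex:glyphSense", glyph_sense),
        (glyph, "ex:asSVG", svg),
        (glyph, "ex:asPaleoCode", paleocode),
    ]


def subjects(triples: List[Triple], predicate: str, obj: str) -> List[str]:
    """Return all subjects that carry the given predicate/object pair."""
    return sorted(s for s, p, o in triples if p == predicate and o == obj)


# Two glyph instances of the same (hypothetical) character, attested at
# different find spots and in different epochs; values are placeholders.
graph: List[Triple] = (
    describe_glyph("ex:glyph1", "ex:characterA", "ex:findSpot1", "ex:epoch1",
                   "ex:sense1", "<svg placeholder 1>", "<PaleoCode placeholder 1>")
    + describe_glyph("ex:glyph2", "ex:characterA", "ex:findSpot2", "ex:epoch2",
                     "ex:sense1", "<svg placeholder 2>", "<PaleoCode placeholder 2>")
)

print(subjects(graph, "ex:representsCharacter", "ex:characterA"))
print(subjects(graph, "ex:epoch", "ex:epoch2"))
```

Plain tuples are used here instead of an RDF library to keep the sketch self-contained; the same statements could be expressed in any linked data serialization.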
Figure 3: Excerpt of the PaleoCodage vocabulary describing a cuneiform sign. The
sign’s structure is described using a PaleoCode, which could itself be described
using semantic relations. These relations make it possible to model the structure
of script subelements, with extensions for paleographic features.
Bibliography
Apel, U. (2014), ‘KanjiVG’, kanji SVG dataset, Creative Commons BY-SA 3.
This presentation describes first efforts to develop a vocabulary for paleographic features across different languages. It features a case study using cuneiform languages to infer relevant elements for a general vocabulary for paleography. The poster can be seen as a point of discussion and an invitation for colleagues from similar and other fields to further develop the vocabulary, which can be used to describe writing styles. When finished, it is planned to be used, among other things, to benefit machine learning tasks in computational linguistics, as a paleographic description of signs can act as a source of features that is rarely exploited in such scenarios.