Our poster will present and link to a publication and toolkit for working with Victor Hugo’s Les Misérables, in French and English, using the CITE Architecture. The published data will include CTS-Compliant texts in French and English, and programmatically derived versions of those texts: TEI-XML, HTML, stop-words removed (useful for Topic Modelling), lemmatized (stemmed) editions, vocabulary lists, contextualized concordance, and a web-based translation-alignment tool.
The deliverable is not only a very rich deluxe, bilingual edition of the novel, but the documented scripts used to take a CITE/CTS text and transform it for different presentations and analyses.