Scholars who apply digital humanities tools and methods to non-English languages face a unique set of challenges, particularly when the language in question is not a national language in the country where they work. Most technical tutorials for DH leave the expected language unstated, with an implicit assumption that readers will be working with English. Non-English, non-national language literature departments at institutions worldwide tend to be small, making it more likely that anyone using DH methods in these departments will have to do so without a local support network of colleagues who are engaged in working through the issues that inevitably arise.
The Russian Natural Language Processing working group (Russian NLP) models a new approach to addressing the resource and support needs of a distributed network of scholars who apply DH methods to a non-English, non-national language. Formed in 2019 with support from Stanford University’s Division of Literatures, Cultures, and Languages, the Russian NLP working group has brought together graduate students, faculty, librarians, and staff who work with Russian literary and/or historical texts. The group meets monthly, despite the scheduling challenges of a membership spread from the west coast of the United States to Moscow. Over the 2019-2020 academic year, the group will identify natural-language processing tools and libraries available for working with Russian, and will find or build corpora that are representative of the materials that underpin the participants’ own research areas (e.g. 20th-century diaries and letters, 19th-century novels and plays, 21st-century internet text). The group will evaluate the existing tools using these corpora and will write up reviews of the tools’ performance on different kinds of texts, to help other scholars decide which tools to adopt for their own DH work with Russian-language materials. In addition, the group will write technical tutorials, and is exploring the possibility of collaborating with the developers of English-language NLP tools to adapt them for use with Russian.
This poster will present the major results of the group’s work for this year (e.g. the evaluation of existing tools for different kinds of Russian corpora), as well as reflections on the group’s organization and approach, and the challenges it has encountered. We anticipate that this will be of interest and value to anyone working with non-English materials who wants to cultivate a virtual community of practice around applying DH methods to their language.