"Women’s Writing in the Eighteenth Century: Evaluating ‘Representative’ Corpora"

1. Abstract

Despite the crucial importance of corpus-building to the interpretation of text-mining research, it is often extremely difficult to know what a corpus contains. Even large institutional resources used by many scholars provide little context for their choices of what to include or exclude. These hidden choices are particularly problematic when historical selection factors may have produced corpora that reproduce social inequalities. I examine six corpora that are used as the basis of most eighteenth-century distant reading. I manually evaluate each corpus’s holdings for a very narrow selection of texts, works published in England 1789-99, to answer a series of bibliographical questions, including: how many titles are by men, by women, or unsigned? What broad categories of writing are represented (novels, plays, poetry, pamphlets, songs, sermons, ephemera, others)? Analyzing the differences, I ask: do the most invested-in resources underrepresent women?