Machine Learning for Literary Criticism: Analyzing Forms, Genres, and Figurative Language

1. Abstract

Introduction / Importance

Since literary critics began applying topic models to large text corpora, we have come to perceive literary periods as more fluid (Underwood, 2013; Pressman, 2014) and subgenres as more dynamic (Jockers, 2013; Underwood, 2014). These advances are concentrated mostly in prose fiction, which is more straightforward and verbose than poetry (Rhody, 2012), even though the problems of poetics have proven tractable for forms ranging from Victorian sonnets to free verse (Houston, 2015; Bories et al., n.d.). Poetry is rarely straightforward: it uses words that resonate with other words, that complicate ideas and change meanings, that are there for idiomatic, rhythmic, allusive, formal, tonal, thematic, semantic, or idiosyncratic reasons. Poets choose particular words for so many reasons that machines struggle to model poems' topics statistically.

We use a recurrent neural network (RNN) to classify sonnets, which are formally defined (as 14-line rhyming poems) but which also exhibit generic qualities: arguments, subjects or topics, tones, moods, and forms of address. We have built a computational model capable of scoring any text for its formal and generic resemblance to accepted criteria, that is, for its “sonnetness.” Our goal is to find poems that have the generic features of sonnets but lack formal criteria such as a Petrarchan or Shakespearean rhyme scheme. These results will address our core question: to what degree are sonnets, both individually and as a category, defined formally or generically?
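The scoring idea can be illustrated with a minimal sketch, assuming a single-layer Elman RNN that reads a poem word by word and passes its final hidden state through a sigmoid to produce a score between 0 and 1. The class name `TinySonnetRNN`, the whitespace tokenization, and the layer sizes are hypothetical choices, not the architecture of the model described here; the weights are random rather than trained, so the scores carry no meaning until the model is fit to labelled sonnets and non-sonnets.

```python
import math
import random


class TinySonnetRNN:
    """Hypothetical toy Elman RNN mapping a word sequence to a score in (0, 1).

    Weights are random and untrained: this shows only the forward pass.
    """

    def __init__(self, vocab, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.index = {w: i for i, w in enumerate(vocab)}
        self.unk = len(vocab)  # extra row reserved for out-of-vocabulary words
        self.embed = mat(len(vocab) + 1, embed_dim)
        self.W_xh = mat(hidden_dim, embed_dim)
        self.W_hh = mat(hidden_dim, hidden_dim)
        self.b_h = [0.0] * hidden_dim
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def score(self, text):
        h = [0.0] * len(self.b_h)  # initial hidden state
        for word in text.lower().split():  # naive whitespace tokenization
            x = self.embed[self.index.get(word, self.unk)]
            # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), using the previous h
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(row_x, x))
                    + sum(w * hj for w, hj in zip(row_h, h))
                    + b
                )
                for row_x, row_h, b in zip(self.W_xh, self.W_hh, self.b_h)
            ]
        logit = sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out
        return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: "sonnetness" in (0, 1)


if __name__ == "__main__":
    vocab = ["shall", "i", "compare", "thee", "to", "a", "summer's", "day"]
    model = TinySonnetRNN(vocab)
    print(model.score("Shall I compare thee to a summer's day?"))
```

Reading the text sequentially is what distinguishes an RNN from a bag-of-words topic model: the hidden state carries information forward through the poem, so the final score can depend on where words occur, not only on which words occur.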


The standard distinction between Petrarchan and Shakespearean sonnets is based on rhyme schemes, but we set out to see whether machine learning could identify features that we could not see. We began with diction, the word choices that constitute both form and genre; the results were so promising that we extended our model to incorporate four other dimensions: sound, rhyme, punctuation, and lineation. The expanded model identified a set of poems that we would never otherwise have considered.
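Features for the four added dimensions could be extracted along the following lines. This is a hypothetical sketch: `form_features` and every measure in it are our own crude proxies (orthographic rather than phonetic rhyme, alliteration standing in for sound), not the features used in the model described above.

```python
import string


def end_word(line):
    """Return the last word of a line, lower-cased and stripped of punctuation."""
    words = line.split()
    return words[-1].strip(string.punctuation).lower() if words else ""


def form_features(poem):
    """Hypothetical proxies for lineation, rhyme, punctuation, and sound."""
    lines = [ln for ln in poem.splitlines() if ln.strip()]
    n = len(lines)

    # Rhyme: share of lines whose end word matches another line's end word
    # in its final two letters (a crude orthographic stand-in for rhyme).
    ends = [end_word(ln) for ln in lines]
    rhymed = sum(
        1
        for i, e in enumerate(ends)
        if len(e) >= 2 and any(j != i and ends[j][-2:] == e[-2:] for j in range(n))
    )

    # Punctuation: ASCII punctuation marks per non-empty line.
    punct = sum(1 for ln in lines for c in ln if c in string.punctuation)

    # Sound: share of adjacent word pairs that alliterate (same first letter).
    words = [w.strip(string.punctuation).lower() for w in poem.split()]
    words = [w for w in words if w]
    pairs = list(zip(words, words[1:]))
    alliterating = sum(1 for a, b in pairs if a[0] == b[0])

    return {
        "lineation": n,  # number of non-empty lines (14 for a sonnet)
        "rhyme_rate": rhymed / n if n else 0.0,
        "punct_per_line": punct / n if n else 0.0,
        "alliteration_rate": alliterating / len(pairs) if pairs else 0.0,
    }
```

In a combined model, such per-poem values might be appended to the word-sequence input; a more serious version would replace the two-letter rhyme test with lookups in a pronouncing dictionary.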

Results / Discussion

In this presentation we will explain why we began with early sonnets, which set the conventions to which later English sonnets respond. We moved from a hand-transcribed test set to a corpus of 253,000 English-language poems from 12 centuries. Now we are expanding to two larger corpora: the 70,000 English texts printed before 1700 in the Early English Books Online Text Creation Partnership (EEBO-TCP) corpus, and the 334,000 volumes of literature in the HathiTrust Digital Library.

In our past work (Ullyot and Bradley, 2018), we concluded that exceptions to the rules make language poetic. Poetry is deliberately irregular: it does not obey rules; it sets and then resets them. By expanding the canon of sonnets, our current project will unsettle critics’ orthodox ideas about them.


Bories, Anne-Sophie, Petr Plecháč, and Pablo Ruiz Fabo (n.d.), ‘Plotting Poetry / Machiner la poésie’, accessed 25 September 2019.
Houston, Natalie (2015), ‘Visualizing the Cultural Field of Victorian Poetry’, in Alfano, Veronica and Andrew M. Stauffer (eds.), Virtual Victorians: Networks, Connections, Technologies (New York: Palgrave Macmillan), 121-41.
Jockers, Matthew L. (2013), Macroanalysis: Digital Methods and Literary History (Urbana, Chicago, and Springfield, IL: University of Illinois Press).
Pressman, Jessica (2014), Digital Modernism: Making it New in New Media (New York: Oxford University Press).
Rhody, Lisa (2012), ‘Topic Modeling and Figurative Language’, Journal of Digital Humanities, 2 (1), n.p.
Ullyot, Michael and Adam J. Bradley (2018), ‘Machines and Humans, Schemes and Tropes’, Early Modern Literary Studies, 20 (2), n.p.
Underwood, Ted (2013), Why Literary Periods Mattered: Historical Contrast and the Prestige of English Studies (Stanford: Stanford University Press).
Underwood, Ted (2014), ‘Understanding Genre in a Collection of a Million Volumes: Interim Report’, accessed 25 September 2019.

Michael Ullyot, University of Calgary, Canada, and Adam James Bradley, Ontario Tech University
