Cultural Analytics and the Book Review: Models, Methods, and Corpora

1. Abstract

In 2013, Peter Boot highlighted the merits of a corpus of online book responses. “If reading responses are important for the study of literature and its effects,” he argued, “it follows that we need to understand them better.”[1] A corpus of book responses would also help us see “relationships between the responses and the works that they respond to, in terms of topics and narrative”; for example, “what characters and plot developments do reviewers respond to?”[2] Boot called for a sufficiently large, representative corpus that included many book genres, along with relevant contextual metadata meeting the standards of open data access and usability. Six years later, scholarship by James F. English, Allison Hegel, Andrew Piper & Richard Jean So, Dan Sinykin, and Jordan Sellers & Ted Underwood (among others) has shown the range of insights that can be drawn from a corpus of book reviews and/or sets of book review corpora.[3] Simultaneously, prominent voices in digital humanities have called on the scholarly community “to consider the nature of ontological gaps and epistemological biases,” including the “infrastructures of knowledge-making” on which large-scale computational studies often depend.[4] This panel will present four papers at the forefront of cultural analytics methods and analysis of book reviews.

In “Modeling Bibliographical Information in Historic Book Reviews: Large Scale Applications with Proquest’s American Periodicals Series,” Matthew J. Lavin evaluates computational methods to classify items as likely book reviews, differentiate single-work reviews from multi-work reviews, and extract from reviews information about the book and author being reviewed.

“Book Reviews and the Consolidation of Genre” (Kent Chang, Yuerong Hu, Wenyi Shang, Aniruddha Sharma, Shubhangi Singhal, Ted Underwood, Jessica Witte, Peizhen Wu) addresses a common doubt about the significance of text analysis by showing that measurements of similarity between literary texts correlate with similarities between their reviews. It then uses this method to trace the consolidation of genre.

In “Reconstructing Consecration in US Literary History, 1965-2000,” Dan Sinykin applies social network analysis to book reviews to evaluate novels’ centrality in a network. He analyzes the publishers most or least likely to offer an author literary success, the impact of gender and race on reception, and the changing literary value of genre.

In “The Crowdsourced ‘Classics’ and the Revealing Limits of Goodreads Data,” Melanie Walsh and Maria Antoniak analyze more than 100,000 Goodreads reviews of the most popular “classics” in order to reveal the widespread use of “classic” as a colloquial literary critical term that is related to but distinct from the “canon.” Through the lens of these reviews, the authors investigate how online readers and institutions both old and new collaboratively construct the highly lucrative category of the “classics.”

Collectively, these four papers offer new perspectives on corpus development, modeling the book review as an object of study, and analyzing large sets of book reviews using natural language processing, machine learning, and social network analysis methods.


1. Boot, Peter. “The Desirability of a Corpus of Online Book Responses.” Proceedings of the Workshop on Computational Linguistics for Literature. ACL, 2013, 32.

2. Ibid.

3. See, for example, James F. English et al., “Mining Goodreads: Literary Reception Studies at Scale”; Allison Hegel, “Social Reading in the Digital Age” (Ph.D. dissertation, University of California, Los Angeles, 2018); Andrew Piper and Richard Jean So, “Women Write About Family, Men Write About War,” The New Republic, April 8, 2016; Dan N. Sinykin, “The Conglomerate Era: Publishing, Authorship, and Literary Form, 1965–2007,” Contemporary Literature 58, no. 4 (2017): 462–91; and Ted Underwood and Jordan Sellers, “The Longue Durée of Literary Prestige,” Modern Language Quarterly 77, no. 3 (September 2016): 321–44.

4. Katherine Bode, “Why You Can’t Model Away Bias,” preprint, forthcoming in Modern Language Quarterly, 2; Bonnie Mak, “Archaeology of a Digitization,” Journal of the Association for Information Science and Technology 65, no. 8 (2014): 1519, qtd. in Bode, 2.

Matthew J. Lavin, University of Pittsburgh, United States of America; Kent Chang, Carnegie Mellon University; Yuerong Hu, University of Illinois; Wenyi Shang, University of Illinois; Aniruddha Sharma, University of Illinois; Shubhangi Singhal, University of Illinois; Ted Underwood, University of Illinois; Jessica Witte, University of Illinois; Peizhen Wu, University of Illinois; Dan Sinykin, Emory University; Melanie Walsh, Cornell University; and Maria Antoniak, Cornell University
