Foundations of Distant Reading. Historical Roots, Conceptual Development and Theoretical Assumptions around Computational Approaches to Literary Texts

1. Abstract


The term 'distant reading' resonates across DH: book titles play on it (Distant Horizons, Underwood 2019) and it has been adapted to new fields ('Distant Viewing', Arnold and Tilton 2019). It spurs alternative formulations ('Scalable Reading', Mueller 2012) and is present in mainstream media ("What is Distant Reading?", Schulz 2011). It is a popular and integrating term, but it can take on very specific meanings as well.[1]

However, the semantic content carried over in each case of adoption or adaptation is often unclear. Recent debates, like the special issue of PMLA (On Franco Moretti’s Distant Reading 2017) or the paper by Nan Z. Da (Da 2019) and the reactions to it, have challenged some of the assumptions of 'distant reading'. The polysemy of the term may also have contributed to misunderstandings in these debates.

Therefore, our aim is to recover the historicity of the term 'distant reading', first introduced by Franco Moretti (2000) in his discussion of world literature as a system. We do so by delineating how its meaning has changed over time and by reconstructing some of the key theoretical assumptions it carries as a term, a concept and a practice.

Historical roots

The pre-history of the concept now covered by the term 'distant reading' reaches back to the 15th century, when a rhetorical topos of "too many books" appeared (see Blair 2011). The solution lay in excerpts and encyclopedias, based on the principles of compilation and summarization. The goal was to provide access to the essence of all relevant books without requiring readers to consult every one of them. Of course, quantitative approaches to literary texts appeared before the advent of computing (e.g. Mendenhall 1887), and computational approaches had diversified before the term 'distant reading' appeared (e.g. Ellegård 1962, Mosteller and Wallace 1963, Burrows 1987; see Hockey 2000).

Conceptual Development

When Franco Moretti coined the term 'distant reading' in 2000, he used it with a meaning reminiscent of the compilatory origins of the concept, similar to "second-hand reading": using research literature, metadata or other short-cuts like titles and subtitles instead of reading the full text. From this starting point, and in parallel with more computational and more quantitative practices, Distant Reading has evolved to designate any computational, but especially quantitative, method of literary text analysis - so much so that the term now 'self-evidently implies computation' (Goldstone 2017, 637; see also Underwood 2017 and Bode 2017).

Theoretical Assumptions

A fundamental assumption of the earlier concept of 'distant reading' was that because metadata or secondary literature are created by humans who have read the full texts, they can stand in for the full text. A second assumption was that the bird's-eye view provides insight into the longue durée and into literature as a system (Oberhelman 2015). A fundamental assumption of current Distant Reading research is that useful (even if imperfect) formal and quantifiable textual features can be used as indicators or proxies for relevant literary phenomena, hence the centrality of modeling (see McCarty 2005; Flanders and Jannidis 2019) in Distant Reading research practice. A final assumption is that, despite the broadening meaning of the term “literature” (decanonization), literary texts have a specific way of functioning that requires the adaptation of methods to this domain.


We hope that by more usefully contextualizing the development of the strategic term 'distant reading', we can help avoid misunderstandings in current debates about computational approaches in humanistic inquiry.


Arnold, Taylor, and Lauren Tilton. 2019. “Distant Viewing: Analysing Large Visual Corpora.” Digital Scholarship in the Humanities.

Blair, Ann M. 2011. Too Much to Know: Managing Scholarly Information before the Modern Age. New Haven: Yale University Press.

Bode, Katherine. 2017. “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for Data-Rich Literary History.” Modern Language Quarterly 78 (1): 77–106.

Burrows, John. 1987. Computation into Criticism: A Study of Jane Austen’s Novels and an Experiment in Method. Oxford: Clarendon Press.

Da, Nan Z. 2019. “The Computational Case Against Computational Literary Studies.” Critical Inquiry.

Ellegård, Alvar. 1962. A Statistical Method for Determining Authorship: The Junius Letters, 1769-1772. Gothenburg: University of Gothenburg.

Flanders, Julia, and Fotis Jannidis, eds. 2019. “The Shape of Data in the Digital Humanities: Modeling Texts and Text-Based Resources.” Digital Research in the Arts and Humanities. London & New York: Routledge.

Goldstone, Andrew. 2017. “The Doxa of Reading.” PMLA.

Hockey, Susan. 2000. Electronic Texts in the Humanities: Principles and Practice. Oxford: Oxford University Press.

McCarty, Willard. 2005. Humanities Computing. New York: Palgrave Macmillan.

Mendenhall, Thomas C. 1887. “The Characteristic Curves of Composition.” Science ns9 (2145): 237–46.

Moretti, Franco. 2000. “Conjectures on World Literature.” New Left Review, no. 1.

Mosteller, Frederick, and David L. Wallace. 1963. “Inference in an Authorship Problem.” Journal of the American Statistical Association 58 (302): 275–309.

Mueller, Martin. 2012. “Scalable Reading.” 2012.

Oberhelman, David D. 2015. “Distant Reading, Computational Stylistics, and Corpus Linguistics: The Critical Theory of Digital Humanities for Literature Subject Librarians.” In Digital Humanities in the Library: Challenges and Opportunities for Subject Specialists. Chicago, Illinois: Association of College and Research Libraries.

“On Franco Moretti’s Distant Reading.” 2017. Publications of the Modern Language Association (PMLA) 132 (3): 613–89.

Schulz, Kathryn. 2011. “The Mechanic Muse - What Is Distant Reading?” The New York Times.

Underwood, Ted. 2017. “A Genealogy of Distant Reading.” Digital Humanities Quarterly 11 (2).

Underwood, Ted. 2019. Distant Horizons: Digital Evidence and Literary Change. Chicago: The University of Chicago Press.


[1] This contribution has emerged from work in the COST Action "Distant Reading for European Literary History" (CA16204), a pan-European, collaborative networking project launched in 2017.

Christof Schöch, University of Trier, Germany; Maciej Eder, Institute for Polish Language, Poland; Rosario Arias, University of Málaga, Spain; Pieter Francois, University of Oxford, UK; and Antonija Primorac, University of Rijeka, Croatia
