Computational access to a library’s digital collections

1. Abstract

Born-digital and digitized resources enable researchers in the digital humanities (DH) to apply computational methods to a variety of research topics (Klingenstein, Hitchcock, & DeDeo, 2014; Nanni, Dietz, & Ponzetto, 2017). Tutorials and workshops aimed at DH researchers teach how to apply these methods with tools such as R or Python (Unsworth, 2009; Mäkelä, 2019; Mullen, 2018; The Programming Historian, n.d.; The Digital Humanities Summer Institute: Technologies East 2020, n.d.). In addition, the initiative Always Already Computational: Collections as Data aims to develop a strategic direction and to guide libraries and cultural heritage institutions in providing collections as data, so that researchers can leverage computational methods (Padilla, Allen, Frost, Potvin, Russey Roke, & Varner, 2019).

In this lightning talk, I will share my journey toward making the National Aboriginal Health Organization (NAHO) web archive, stored as a WARC (Web ARChive format) file, computationally accessible. The University of Ottawa Library initiated a web archiving project to preserve the entire NAHO web content, which is currently accessible only via the Wayback Machine. There is no easy way to extract data from the NAHO collection, which prevents researchers from applying computational methods with research tools such as R, Python, or the Archives Unleashed Toolkit. Using the NAHO WARC file as an example, I will also discuss the processes, challenges, and resources involved in providing computational access to library collections.

This lightning talk is aimed at DH researchers and librarians but is open to all. Attendees will learn why making digital collections computationally accessible matters and what the process involves.


Unsworth, J. (2009). Computational methods in humanities research. Retrieved from

Klingenstein, S., Hitchcock, T., & DeDeo, S. (2014). The civilizing process in London’s Old Bailey. Proceedings of the National Academy of Sciences, 111(26), 9419-9424. doi: 10.1073/pnas.1405984111

Maemura, E., Becker, C., & Milligan, I. (2016, December). Understanding computational web archives research methods using research objects. In 2016 IEEE International Conference on Big Data (Big Data) (pp. 3250-3259). IEEE. doi: 10.1109/BigData.2016.7840982

Mäkelä, E. (2019). Introduction to methods for digital humanities. Retrieved from

Mullen, L. A. (2018). Computational historical thinking: With applications in R. Retrieved from

Nanni, F., Dietz, L., & Ponzetto, S. P. (2017). Toward a computational history of universities: Evaluating text mining methods for interdisciplinarity detection from PhD dissertation abstracts. Digital Scholarship in the Humanities, 33(3), 612-620. doi: 10.1093/llc/fqx062

Padilla, T., Allen, L., Frost, H., Potvin, S., Russey Roke, E., & Varner, S. (2019). Final Report --- Always Already Computational: Collections as Data. (project site:

The Digital Humanities Summer Institute: Technologies East 2020. (n.d.). Retrieved from

The Programming Historian. (n.d.). Retrieved from

Weingart, S., & Jorgensen, J. (2012). Computational analysis of the body in European fairy tales. Literary and Linguistic Computing, 28(3), 404-416. doi: 10.1093/llc/fqs015

Yoo Young Lee, University of Ottawa Library, Canada
