Inferring book relationships at the trillion-word scale

1. Abstract

Large digital libraries like the HathiTrust Digital Library (HTDL) provide texts of historical, cultural, or literary significance at unprecedented scales. However, their size and the consortial approach to building them can confound computational attempts to model the collection, owing to issues such as uneven duplication and incomplete metadata. This paper presents the technical workflow of a project seeking to address those challenges.

Peter Organisciak, University of Denver, United States of America, and Benjamin M. Schmidt, New York University, United States of America
