Conceptual Modeling of Similarities and Duplication in Large-Scale Digital Libraries

1. Abstract

Large-scale digital libraries, such as the HathiTrust Digital Library (HTDL) and the Internet Archive, have emerged consortially, collecting works from institutions around the world. This has led to uneven duplication: some works recur many times in the collections, while others have only one copy. The Massive Text Lab at the University of Denver is researching levels of ‘sameness’ and duplication of works within these digital libraries through massive-scale analysis. We will provide an overview of the issue and intricacies of duplication, describe the solutions the project is pursuing, discuss applications to modern cataloging standards, and consider the value that our work provides in framing material relationships for future humanities scholarship.

Maggie Ryan (margaret.ryan@du.edu), University of Denver, United States of America; Lindsay Gypin, University of Denver, United States of America; Krystyna K. Matusiak, University of Denver, United States of America; Benjamin M. Schmidt, New York University, United States of America; and Peter Organisciak, University of Denver, United States of America
