Large-scale digital libraries, such as the HathiTrust Digital Library (HTDL) and the Internet Archive, have emerged consortially, collecting works from institutions around the world. This has led to uneven duplication: some works recur many times in the collections, while others have only one copy. The Massive Text Lab at the University of Denver is researching levels of ‘sameness’ and duplication of works within these digital libraries through massive-scale analysis. We will discuss applications to modern cataloging standards and provide an overview of the issues and intricacies of duplication, the solutions the project is pursuing, and the value that our work provides in framing material relationships for future humanities scholarship.