<?xml version="1.0" encoding="UTF-8"?><TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader><fileDesc><titleStmt><title type="full"><title type="main">Inferring book relationships at the trillion-word scale</title><title type="sub"/></title></titleStmt><author><persName><surname>Organisciak</surname><forename>Peter</forename></persName><affiliation>University of Denver, United States of America</affiliation><email>peter.organisciak@du.edu</email></author><author><persName><surname>Schmidt</surname><forename>Benjamin M.</forename></persName><affiliation>New York University, United States of America</affiliation><email>bs145@nyu.edu</email></author><editionStmt><edition><date>43760</date></edition></editionStmt><publicationStmt><publisher>Name, Institution</publisher><address><addrLine>Street</addrLine><addrLine>City</addrLine><addrLine>Country</addrLine><addrLine>Name</addrLine></address></publicationStmt><sourceDesc><p>Converted from an OASIS Open Document</p></sourceDesc></fileDesc><encodingDesc><appInfo><application ident="DHCONVALIDATOR" version="1.22"><label>DHConvalidator</label></application></appInfo></encodingDesc><profileDesc><textClass><keywords scheme="ConfTool" n="category"><term>Paper</term></keywords><keywords scheme="ConfTool" n="subcategory"><term>Short Presentation</term></keywords><keywords scheme="ConfTool" n="keywords"><term>text mining</term><term>large-scale digital libraries</term></keywords><keywords scheme="ConfTool" n="topics"><term>Global</term><term>English</term><term>20th Century</term><term>Contemporary</term><term>cultural analytics</term><term>text mining and analysis</term><term>Library &amp; information science</term></keywords></textClass></profileDesc></teiHeader><text><body><p>Large digital libraries like the HathiTrust Digital Library (HTDL) provide texts of historical, cultural, or literary significance at unprecedented scales. 
However, their size and the consortial approach to building them can confound computational attempts to model the collection, owing to issues such as uneven duplication and incomplete metadata. This paper presents the technical workflow of a project that seeks to address those challenges.</p></body></text></TEI>