Beside and Beyond: Visualising the Paradata and Metadata of Digitised Historical Newspapers with SKOS and LOD

1. Abstract

Over the past thirty years, national libraries, universities and commercial publishers around the world have preserved and made available hundreds of millions of pages of historical newspapers through mass digitisation, currently releasing over one million new pages per month. These have become vital resources not only for humanities researchers but also for journalists, politicians, schools, and the general public.[1] However, research conducted by the multinational Oceanic Exchanges: Tracing Global Information Networks in Historical Newspaper Repositories, 1840-1914 project has shown that the collections created by digitisation programmes have not always accurately represented the cultural histories of these publications and their producers, especially their relationships with peoples and institutions in other regions;[2] the very creation of national collections obscures the reality that global news exchanges were central to the nineteenth-century press.[3] The role and repercussions of the individual are equally visible in the digitisation process. On the one hand, individuals from a variety of backgrounds, and with a diverse range of remits, create internal policies and processes that shape how newspapers are stored digitally. On the other, individual end users, with different interests and competencies, must work across diverse collections in order to build as full a picture as possible of global information networks, leading to undocumented irregularities and inconsistencies in their research samples.

This poster will explore our efforts to bridge the interoperability gap between those creating the authoritative, standardised metadata for these collections and the end users attempting to create historical and other narratives through the use of these materials. It has two objectives:

  1. To clarify the decision-making processes behind the structure and creation of digital archives, specifically selection and metadata;
  2. To offer historically informed guidelines and models for archives and end users, to enhance the quality of engagement and development going forward.

Our work aims to enable researchers from a variety of disciplinary backgrounds to break through the barriers between siloed collections, and to provide historically informed principles for archivists and digitisers to consider when implementing their metadata standards.[4] The poster will present visualisations of existing metadata standards for digitised newspapers mapped to researcher-focused ontologies, based on supplementary information and context (the paratext)[5] and developed by the Oceanic Exchanges project using the Simple Knowledge Organization System (SKOS). It will demonstrate how the structures of digital and physical newspaper archives can be explored and manipulated to better inform humanities research questions using semantic web technologies.[6] These visualisations will link to a further resource developed by the Oceanic Exchanges project: The Atlas of Digitised Newspapers, which provides a comparative history of key digitised newspaper collections, including their selection processes, digital infrastructures (including XPaths and technical definitions) and licensing requirements. Together, the ontology visualisations and the Atlas will provide a thoroughly annotated controlled vocabulary designed to be used across disciplines, inside and outside the academy, and will form the basis for further digital humanities research using digitised historical collections.[7]
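
As an illustration only, the kind of mapping described above, in which one researcher-facing term is linked to the place where each collection records it, might be sketched as follows. This is not the project's ontology: the namespace, the term, the collection names and the XPaths are all invented for the example, and the use of skos:note to hold an XPath is an assumption rather than a documented modelling choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List

SKOS = "http://www.w3.org/2004/02/skos/core#"
EX = "http://example.org/newspaper-vocab/"  # hypothetical namespace, not the project's


@dataclass
class Concept:
    """A researcher-facing term plus where each collection stores it."""
    slug: str
    pref_label: str
    definition: str
    # collection name -> XPath for the field (all values here are invented)
    xpaths: Dict[str, str] = field(default_factory=dict)


def escape(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def to_turtle(concepts: List[Concept]) -> str:
    """Serialise the concepts as SKOS in Turtle syntax."""
    out = [f"@prefix skos: <{SKOS}> .", f"@prefix ex: <{EX}> .", ""]
    for c in concepts:
        props = [
            "a skos:Concept",
            f'skos:prefLabel "{escape(c.pref_label)}"@en',
            f'skos:definition "{escape(c.definition)}"@en',
        ]
        # Assumption: one skos:note per collection, holding its XPath.
        for collection, xpath in sorted(c.xpaths.items()):
            props.append(f'skos:note "{escape(collection)}: {escape(xpath)}"')
        out.append(f"ex:{c.slug} " + " ;\n    ".join(props) + " .")
        out.append("")
    return "\n".join(out)


if __name__ == "__main__":
    issue_date = Concept(
        slug="issueDate",
        pref_label="Issue date",
        definition="The date printed on the newspaper issue.",
        xpaths={
            "Collection A": "//mods:dateIssued",  # hypothetical
            "Collection B": "//issue/date",       # hypothetical
        },
    )
    print(to_turtle([issue_date]))
```

Keeping the collection-specific paths as annotations on a single concept means a researcher can begin from the term they understand and work outwards to each archive's encoding, rather than learning every schema first.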

[1] Paul Gooding, Historic Newspapers in the Digital Age: "Search All About It!" (Basingstoke: Palgrave, 2017): 172; James Mussell, The Nineteenth-Century Press in the Digital Age (Basingstoke: Palgrave, 2012): 1.

[2] In conjunction with partners at institutions including Northeastern University, Universität Stuttgart, Universidad Nacional Autónoma de México, Universiteit Utrecht, Turun Yliopisto and University College London; see

[3] M. H. Beals, "Transnational Exchanges" in David Finkelstein, ed., The Edinburgh History of the British and Irish Press, vol. 2 (Edinburgh: Edinburgh University Press, 2020): 240–60.

[4] Based on ongoing work with Chronicling America, National Digital Newspaper Archive of Mexico, British Library 19th Century Newspapers, Times Digital Archive, Delpher, Europeana, National Library of Finland, Trove and Papers Past.

[5] See Gérard Genette, Paratexts: Thresholds of Interpretation (Cambridge: Cambridge University Press, 2009), and Drew Baker, Anna Bentkowska-Kafel, and Hugh Denard eds. Paradata and Transparency in Virtual Heritage (Farnham: Ashgate, 2012).

[6] METS/ALTO is the most common choice; ALTO (see was developed by the METAe to be used alongside METS, and provides layout information. METS provides the structure of the newspaper (see Other structures include MPEG-21/DIDL (see ISO/IEC, "Information Technology – Multimedia framework [MPEG-21] – Part 2: Digital Item Declaration", ISO/IEC 21000-2:2005 [October 2005]), a METS/MODS combination (see Morgan Cundiff, "Library of Congress: Using and to Create XML Standards-based Digital Library Applications" [2007], METS/PREMIS (see, Jisc’s own adapted Dublin Core (see, Gale legacy XML, the XML used by Gale on their text-mining drives, JSON (see, and two different APIs (Trove and Digital New Zealand).

[7] See Patricia Harpring, Introduction to Controlled Vocabularies: Terminology for Art, Architecture, and Other Cultural Works Online Edition (Los Angeles: Getty Publications, 2010):

Emily Bell, Loughborough University, United Kingdom, and M. H. Beals, Loughborough University, United Kingdom

