Historical Newspaper Content Mining: Revisiting the impresso Project’s Challenges in Text and Image Processing, Design and Historical Scholarship

1. Abstract

impresso. Media Monitoring of the Past is an interdisciplinary research project in which a team of computational linguists, designers and historians collaborate on the datafication of a multilingual corpus of digitised historical newspapers. The primary goals of the project are to improve text mining tools for historical text, to enrich historical newspapers with (semi-)automatically generated data, and to integrate such data into historical research workflows by means of a newly developed user interface. In this paper, we discuss our efforts to overcome the challenges inherent in this endeavour and to integrate text mining and data visualisation applications into general historical research practices, which are characterised by search operations and by the need to create topical collections.

Maud Ehrmann (maud.ehrmann@epfl.ch), École polytechnique fédérale de Lausanne (EPFL), Switzerland, Estelle Bunout, Luxembourg Centre for Contemporary and Digital History (C2DH), Luxembourg, Simon Clematide, Universität Zürich, Switzerland, Marten Düring, Luxembourg Centre for Contemporary and Digital History (C2DH), Luxembourg, Andreas Fickers, Luxembourg Centre for Contemporary and Digital History (C2DH), Luxembourg, Roman Kalyakin, Luxembourg Centre for Contemporary and Digital History (C2DH), Luxembourg, Frédéric Kaplan, École polytechnique fédérale de Lausanne (EPFL), Switzerland, Matteo Romanello, École polytechnique fédérale de Lausanne (EPFL), Switzerland, Paul Schroeder, Luxembourg Centre for Contemporary and Digital History (C2DH), Luxembourg, Philipp Ströbel, Universität Zürich, Switzerland, Thijs van Beek, Luxembourg Centre for Contemporary and Digital History (C2DH), Luxembourg, Martin Volk, Universität Zürich, Switzerland and Lars Wieneke, Luxembourg Centre for Contemporary and Digital History (C2DH), Luxembourg
