Analysis and Categorisation of Research Software in the Digital Humanities

Abstract

In the Digital Humanities (DH), research software is a major output of the scientific process alongside primary research data. However, the long-term sustainability and preservation of living systems, in contrast to the long-term archiving (LTA) of research data, is still a non-trivial and often institutionally segmented enterprise (Smithies et al., 2019). To counter this, the SustainLife project, run jointly by the Institute of Architecture of Application Systems (IAAS, University of Stuttgart) and the Data Center for the Humanities (DCH, University of Cologne), uses the Topology and Orchestration Specification for Cloud Applications (TOSCA) standard (OASIS, 2013 and 2019) and the open-source ecosystem OpenTOSCA (Breitenbücher et al., 2016). To realize our vision of a sustainable long-term archive for DH research software, we are building a repository of software components modeled in TOSCA as Node Types, and we use them to create application blueprints, i.e., Topology Templates in TOSCA, which describe the structure of an application.

To identify the most frequently used types of software components within the DH, we approach the vast field of DH research software from multiple perspectives (Neuefeind et al., 2018 and 2019): (1) We investigate multiple use cases in depth and extract the components used in the respective applications, from the operating system (OS) level up to the user interface (UI) layer. These components are then modeled as reusable Node Types in TOSCA, which can be used to describe the respective application in a TOSCA Topology Template. (2) Qualitative case studies aim to extract all components and technologies employed in selected applications. (3) Quantitative surveys, targeted at DH research software experts (i.e., researchers), collect community practices and demands. The results of both the case studies and the surveys are then used to identify key components and derive application stacks, i.e., Node Types and Topology Templates.
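The relationship between component types, application blueprints, and the identification of key components can be illustrated with a small Python sketch. It is not the TOSCA serialization and does not use any OpenTOSCA API; the classes `NodeType` and `TopologyTemplate`, the function `most_used`, and the two example applications are hypothetical stand-ins chosen for illustration only.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeType:
    """Reusable component type (simplified, hypothetical stand-in for a TOSCA Node Type)."""
    name: str
    layer: str  # e.g. "os", "web server", "runtime", "database"


@dataclass
class TopologyTemplate:
    """Application blueprint: component types plus 'hosted on' relations between them."""
    application: str
    nodes: list[NodeType] = field(default_factory=list)
    hosted_on: dict[str, str] = field(default_factory=dict)  # component name -> host name

    def stack_of(self, node_name: str) -> list[str]:
        """Follow 'hosted on' relations from a component down to the bottom of its stack."""
        stack = [node_name]
        while True:
            host = self.hosted_on.get(stack[-1])
            if host is None or host in stack:  # stop at the bottom or on a cycle
                break
            stack.append(host)
        return stack


def most_used(templates: list[TopologyTemplate]) -> list[tuple[str, int]]:
    """Count how many blueprints use each component type, most frequent first."""
    counts = Counter(node.name for template in templates for node in template.nodes)
    return counts.most_common()


# Invented example data, not taken from the project repository.
ubuntu = NodeType("Ubuntu", "os")
apache = NodeType("Apache HTTP Server", "web server")
php = NodeType("PHP", "runtime")
mysql = NodeType("MySQL", "database")

edition = TopologyTemplate(
    application="Example digital edition",
    nodes=[ubuntu, apache, php, mysql],
    hosted_on={"PHP": "Apache HTTP Server", "Apache HTTP Server": "Ubuntu", "MySQL": "Ubuntu"},
)
visualisation = TopologyTemplate(
    application="Example interactive visualisation",
    nodes=[ubuntu, apache],
    hosted_on={"Apache HTTP Server": "Ubuntu"},
)

print(edition.stack_of("PHP"))
print(most_used([edition, visualisation]))
```

In this sketch, components that recur across many blueprints surface at the top of the `most_used` result, which mirrors the idea of deriving key components from case studies and surveys.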

Our poster presents a “Categorisation of Research Software in the Digital Humanities” to stimulate discussion about the perspectives of DH research software in contrast to research data (Sahle and Kronenwett, 2013) and about how this vast landscape can be indexed so that researchers can find the most appropriate set of component types or a suitable application blueprint for their own software. In this context, we investigate the use of categorial keywords which, in combination with technological identifiers, e.g., programming languages and database types, provide a means of categorizing research software in the DH. The poster therefore presents selected DH research software as a basis for discussion. In doing so, we introduce component types and application stacks from our repository, e.g., an Apache web server and the application stack of a DH research software, respectively. Moreover, we present a set of technological identifiers and categorial keywords, such as digital editions, virtual research environments, and interactive visualisations (Wuttke et al., 2016), associated with these projects.
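How categorial keywords and technological identifiers might be combined into a searchable index can be sketched as follows. This is a minimal illustration under our own assumptions, not the project's actual index: the class `SoftwareEntry`, the function `find_entries`, and all catalogue entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareEntry:
    """One indexed research software (hypothetical record layout)."""
    name: str
    keywords: frozenset[str]      # categorial keywords, e.g. "digital edition"
    technologies: frozenset[str]  # technological identifiers, e.g. "PHP", "MySQL"


def find_entries(entries, keywords=(), technologies=()):
    """Return the entries that carry all requested keywords and all requested technologies."""
    wanted_keywords = frozenset(keywords)
    wanted_technologies = frozenset(technologies)
    return [
        entry
        for entry in entries
        if wanted_keywords <= entry.keywords and wanted_technologies <= entry.technologies
    ]


# Invented catalogue entries for illustration only.
catalogue = [
    SoftwareEntry(
        "Example edition",
        frozenset({"digital edition"}),
        frozenset({"PHP", "MySQL"}),
    ),
    SoftwareEntry(
        "Example VRE",
        frozenset({"virtual research environment"}),
        frozenset({"Java", "PostgreSQL"}),
    ),
    SoftwareEntry(
        "Example visualisation",
        frozenset({"interactive visualisation", "digital edition"}),
        frozenset({"JavaScript"}),
    ),
]

for entry in find_entries(catalogue, keywords=["digital edition"], technologies=["PHP"]):
    print(entry.name)
```

Requiring every requested keyword and identifier to match keeps the query narrow; a ranking by partial overlap would be an equally plausible design if broader suggestions are wanted.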

Acknowledgements

This poster is partially funded by the DFG-LIS project “SustainLife” (GEPRIS 379522012).

Authors

Lukas Harzenetter (lukas.harzenetter@iaas.uni-stuttgart.de), Institute of Architecture of Application Systems, University of Stuttgart, Germany
Johanna Barzen, Institute of Architecture of Application Systems, University of Stuttgart, Germany
Frank Leymann, Institute of Architecture of Application Systems, University of Stuttgart, Germany
Brigitte Mathiak, Data Center for the Humanities, University of Cologne, Germany
Philip Schildkamp, Data Center for the Humanities, University of Cologne, Germany
Claes Neuefeind, Cologne Center for eHumanities, University of Cologne, Germany
Uwe Breitenbücher, University of Stuttgart, Germany
