Leveraging Google Sheets and GitHub for Data Curation on the Princeton Ethiopian Miracles of Mary Project

1. Abstract

The creation and curation of humanities datasets is an important scholarly activity that requires labor and expertise, and results in a research output that furthers scholarship (Elswit and Bench). As more scholars become interested in building datasets, we need better and simpler solutions for managing and publishing data. Many humanities datasets make sense as tabular or relational data, but not every scholar or project team has the skills, resources, or desire to create and manage a relational database.

As a possible solution to the gap between researchers' skills and technical requirements, we will demonstrate the tools we are prototyping to support data curation in The Princeton Ethiopian Miracles of Mary Project. These tools add lightweight infrastructure around Google Sheets and GitHub using generalizable scripts. Our approach is to model and structure the data as if implementing it in a relational database, but with the goal of creating a set of related sheets in a single Google Sheets spreadsheet, using data validation to link the sheets and avoid redundant data entry (Belcher et al.). We will show a Google Apps Script project that can create a new spreadsheet with configured sheets, fields, and data validation based on a JSON data structure. We will also demonstrate a script that generates a regular, automatic export of the Google Sheets data and commits it to a GitHub repository. The result is a versioned copy of the data that is available for querying, visualization, automated validation, and interface prototyping and publication using static site technology and minimal computing principles, and that can eventually be deposited for publication.
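To make the idea of linked sheets and automated validation concrete, the following is a minimal sketch in Python, not the project's actual code. It assumes a hypothetical schema in which a field can declare a "lookup" into a column of another sheet, and it checks exported CSV data for values that do not appear in the linked column. The sheet names, field names, and the functions `lookup_rules`, `find_broken_links`, and `read_csv` are all illustrative assumptions.

```python
import csv
import io

# Hypothetical schema: sheet and field names are illustrative only,
# not the project's actual data model.
SCHEMA = {
    "Manuscript": {"fields": {"ID": {}, "Name": {}}},
    "Story Instance": {
        "fields": {
            # Values in this field must match a Name in the Manuscript sheet.
            "Manuscript": {"lookup": ("Manuscript", "Name")},
            "Order": {},
        }
    },
}


def lookup_rules(schema):
    """Return (sheet, field, target_sheet, target_field) for each linked field."""
    rules = []
    for sheet, config in schema.items():
        for field, options in config["fields"].items():
            if "lookup" in options:
                target_sheet, target_field = options["lookup"]
                rules.append((sheet, field, target_sheet, target_field))
    return rules


def find_broken_links(schema, tables):
    """List (sheet, row number, field, value) for values missing from the linked column."""
    problems = []
    for sheet, field, target_sheet, target_field in lookup_rules(schema):
        allowed = {row[target_field] for row in tables[target_sheet]}
        # Start counting at 2 because row 1 of the export is the header row.
        for number, row in enumerate(tables[sheet], start=2):
            value = row[field]
            if value and value not in allowed:
                problems.append((sheet, number, field, value))
    return problems


def read_csv(text):
    """Parse CSV text into a list of dictionaries keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))
```

A check of this kind could run against each exported copy of the data, for example: `find_broken_links(SCHEMA, {"Manuscript": read_csv(manuscript_csv), "Story Instance": read_csv(story_csv)})`, where the two CSV strings are assumed to come from the export. The same schema that drives the check could, in principle, also drive the spreadsheet setup, so that the rules are declared once.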

Works Cited

Belcher, Wendy Laura, Rebecca Sutton Koeser, Rebecca Munson, Gissoo Doroudian, and Meredith Martin. “CDH Project Charter – Princeton Ethiopian Miracles of Mary 2019-20.” Center for Digital Humanities at Princeton, August 2, 2019. https://doi.org/10.5281/zenodo.3359178.

Elswit, Kate, and Harmony Bench. “Datasets Are Research.” Dunham’s Data, September 5, 2019. https://www.dunhamsdata.org/blog/datasets-are-research.

Princeton-CDH/pemm-scripts. Python. Center for Digital Humanities at Princeton, 2020. https://github.com/Princeton-CDH/pemm-scripts.

Princeton-CDH/pemm-data. Center for Digital Humanities at Princeton, 2020. https://github.com/Princeton-CDH/pemm-data.

Rebecca Sutton Koeser (rebecca.s.koeser@princeton.edu), Princeton University, United States of America, Nick Budak (nbudak@princeton.edu), Princeton University, United States of America and Rebecca Munson (rmunson@princeton.edu), Princeton University, United States of America
