"She Was Simply a Woman Who Was in Search of Variety" Supporting Complex Searches on The Willa Cather Archive with Orchid

1. Abstract

"She Was Simply a Woman Who Was In Search of Variety"

Supporting Complex Searches on The Willa Cather Archive with Orchid

Dussault, Jessica; Tunink, Greg; Dalziel, Karin

University of Nebraska-Lincoln

The Center for Digital Research in the Humanities (CDRH)1 is developing the open-source software, Orchid, to provide a browsing, searching, and viewing interface for documents stored in a center-wide API.2 Orchid was built not only to accommodate new projects at the CDRH, but also as a tool to allow rapid upgrading of older sites using TEI-XML3 documents as part of a mission to address the technical debt of dozens of aging sites. In late 2018, the first production site powered by Orchid, The Complete Letters of Willa Cather,4 was launched as a complement to the existing Willa Cather Archive (WCA) website.5 While The Complete Letters was now powered by Orchid, the rest of the WCA remained in Apache Cocoon,6 a framework it had used since the mid-2000s. In 2019, the CDRH challenged itself with migrating the entire WCA to an Orchid-powered application — a strong proof-of-concept that Orchid was equally suited to the task of hosting complex existing projects as well as new sites.

The first obstacle to migrating the WCA site was that Orchid had not been built for the complexity of sites such as the WCA. The site draws from overlapping sets of documents with differing displays and behavior, so the new application required both a site-wide search in addition to the existing Complete Letters search functionality. In order to address this challenge, Orchid logic was altered to support multiple sections of websites with customizable browsing options, search defaults, interfaces, and internationalization.7 This was engineered to re-use as much of the existing Orchid code and require as little additional configuration and code in the application as possible.

Figure 1: Screenshots of the site-wide WCA search (left) and The Complete Letters search (right). Differences include the ability to search annotations separately, different filters, a custom display, and a search by letter identifier.

This multi-section functionality is predicted to be utilized by complex sites slated for upgrade from older technologies, such as The Walt Whitman Archive.8 Perhaps most exciting of all, the multi-section functionality of Orchid brings us closer to a long-term goal of the CDRH to offer simple interfaces for engaging with older documents from dozens of projects available as sub-sites in a single, centralized application.

Footnotes

1 https://cdrh.unl.edu

2 Source code and documentation available at https://github.com/cdrh/orchid

3 The majority of the CDRH's projects involve text documents marked up in TEI-XML. For more information about this encoding method, see https://tei-c.org/

4 https://cather.unl.edu/letters

5 https://cather.unl.edu

6 https://cocoon.apache.org/

7 These pull requests on the Orchid GitHub repository handled the bulk of the modifications required to create multi-section support: https://github.com/CDRH/orchid/pull/133, https://github.com/CDRH/orchid/pull/143

8 https://whitmanarchive.org/ The Walt Whitman Archive is also powered primarily with Apache Cocoon and needs to be migrated to a newer, actively-supported technology

Sources

Apache Software Foundation (2013). "The Apache Cocoon Project." Apache Software Foundation. https://cocoon.apache.org/

Cather, Willa (1894 October 9). "Amusements." Nebraska State Journal. https://cather.unl.edu/writings/journalism/j00071

Center for Digital Research in the Humanities (2019). "Center for Digital Research in the Humanities." University of Nebraska-Lincoln. https://cdrh.unl.edu/

Center for Digital Research in the Humanities (2019). "Orchid." GitHub. https://github.com/cdrh/orchid

Dalziel, Karin; Dussault, Jessica; Tunink, Greg (2018). "Legacy No Longer: Designing Sustainable Systems for Website Development." DH2018. https://dh2018.adho.org/en/legacy-no-longer-designing-sustainable-systems-for-website-development/

Text Encoding Initiative (2019). "TEI: Text Encoding Initiative", Text Encoding Initiative. https://tei-c.org/

Walt Whitman Archive, The (2019). "The Walt Whitman Archive." Center for Digital Research in the Humanities. https://whitmanarchive.org/

Willa Cather Archive, The (2019). "The Complete Letters of Willa Cather." University of Nebraska-Lincoln. https://cather.unl.edu/letters/

Willa Cather Archive, The (2019). "The Willa Cather Archive." Center for Digital Research in the Humanities. https://cather.unl.edu/

Jessica Dussault (jdussault@unl.edu), University of Nebraska-Lincoln, United States of America, Tunink Greg , University of Nebraska-Lincoln, United States of America and Dalziel Karin , University of Nebraska-Lincoln, United States of America

Theme: Lux by Bootswatch.