Datasets of Criminal Faces Within and Under Facial Recognition Software (FRS) From a Digital Humanities Perspective

1. Abstract

This paper responds to the intersectional problematics of facial databases within contemporary facial recognition software and computer vision machine learning by highlighting our ongoing project “This Criminal Does Not Exist.” Beginning with the MEDS database, our project applies a Convolutional Generative Adversarial Network to produce synthetic faces. Aesthetically, the generated portraits resemble the “composite portraits” of different races that eugenicist Francis Galton deployed in the 19th century. The project is a data visualization project: using machine learning techniques, we have been able to surface the “most common” type of face within the dataset. That the generated portraits are primarily of African American males speaks to the types of faces over-represented in these virtual spaces. Further, as a data visualization, “This Criminal Does Not Exist” is indicative of contemporary State applications of FRS, bringing to light the clear biases inherent in the dataset, biases further perpetuated through algorithms trained on these types of datasets.

This response is made from a digital humanities perspective that combines principles of ethical data annotation and classification with critical making. In particular, this paper addresses how digital humanities can contribute potential solutions to the ethics of studying and surfacing problematic databases. More specifically, drawing from the impacts of 19th-century pseudo-sciences like eugenics, phrenology, physiognomy, and signaletics, our project “This Criminal Does Not Exist” signals another potential set of tactics and research-creation paths that educate the public about the nature of problematic facial datasets while also producing arguments about the ethical implications of such databases and their in-built classification practices. Further, this paper explores how digital humanities scholars can provide a public, critical engagement with such databases that is grounded in humanized narrative and that does not further replicate and/or ingrain the intersectional and carceral biases of the databases.

Our research begins by recounting how the contemporary study of, and fears surrounding, FRS have largely focused on large-scale corporate- and state-led surveillance apparatuses and their impacts on users’ data privacy. This work, exemplified by scholars like Ann Cavoukian and her framework of Privacy by Design, is undeniably useful; similarly, research by surveillance studies theorists like David Lyon and Gary Marx has contributed greatly to advocating for the responsible building and application of technologies like FRS. The initial scholarship into the problematic construction of FRS has been driven, in large part, by a wealth of research and reporting about the known inherent biases of the technology. As the Georgetown Law Center on Privacy & Technology’s report “The Perpetual Line-up” insists, “face recognition may be least accurate for those it is most likely to affect: African Americans.” The technology’s consistent optimization, in construction and application, for white male faces is especially troubling as it moves from surveillance, national security, and law enforcement tactics into the ubiquitous, and far more normalized, activities of intervening in job interviews, monitoring low-income housing, and granting bank loans. These last three FRS tasks are examples of what Safiya Noble, in her text Algorithms of Oppression, calls “technological redlining,” which she explains is the use of algorithms and big data to “reinforce oppressive social relationships and enact new modes of racial profiling.”

Given this, how might digital humanities scholars make the contents of these databases public and available for wider scrutiny and potential regulation while not replicating the dangerous practices that initially led to the construction and implementation of such data? One effective example is artist Trevor Paglen’s collaboration with scholar Kate Crawford, titled ImageNet Roulette. The project trains an app on the images labeled in the “person” category of the massive ImageNet database. The result surfaces how “ImageNet contains a number of problematic, offensive, and bizarre categories. Hence, the results ImageNet Roulette returns often draw upon those categories. That is by design: we want to shed light on what happens when technical systems are trained using problematic training data.” Their accompanying essay, “Excavating AI: The Politics of Images in Machine Learning Training Sets,” expands on this by labelling their own work an “archeology of datasets”: “we have been digging through the material layers, cataloguing the principles and values by which something was constructed, and analyzing what normative patterns of life were assumed, supported, and reproduced. By excavating the construction of these training sets and their underlying structures, many unquestioned assumptions are revealed.” Digital humanities scholars are extremely well suited to take up similar archeological projects, in FRS or other AI- and machine learning-aided environments, as the discipline’s focus on ethics, digital tools, and humanities-based close-reading techniques equips scholars to take up the urgent problems of FRS’s everyday applications.

Works Cited

Buolamwini, Joy. “Incoding - In the Beginning.” Medium. May 16, 2016. para. 6. Accessed 11 June 2020.

Buolamwini, Joy and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” Conference on Fairness, Accountability, and Transparency. Proceedings of Machine Learning Research 81:1–15, 2018.

Cavoukian, Ann. “The Seven Foundational Principles.” 2017. Accessed 11 June 2020.

Crawford, Kate and Trevor Paglen. “Excavating AI: The Politics of Images in Machine Learning Training Sets.” September 19, 2019. para. 37. Accessed 11 June 2020.

Galton, Francis. “Composite Portraits.” Journal of the Anthropological Institute. 1870. 132-144; Hereditary Genius: An Inquiry into its Laws and Consequences. Macmillan and Co, 1869.

Garvie, Clare, Alvaro M. Bedoya and Jonathan Frankle. “The Perpetual Line-Up.” Georgetown Law Center on Privacy and Technology. 2016, para. 23. Accessed 11 June 2020.

Lyon, David. Surveillance as Social Sorting: Privacy, Risk, and Digital Discrimination. Routledge, 2003; Identifying Citizens: ID Cards as Surveillance. Polity, 2009.

Marx, Gary. Windows into the Soul: Surveillance and Society in an Age of High Technology. University of Chicago Press, 2016.

Noble, Safiya U. Algorithms of Oppression: How Search Engines Reinforce Racism. New York University Press, 2018.

Aaron Tucker, York University and Ryerson University, Canada, and Kieran Ramnarine, Ryerson University, Canada
