Deep Learning based Attribute Representation in Ancient Vase Paintings

1. Abstract

The understanding of iconography and visual narration in ancient imagery, e.g. in Attic vase paintings of the fifth century B.C., is one of the main foci of Classical Archaeology. To depict the situations and actions of a narrative and to characterise its protagonists, ancient Greek artists made use of a broad variety of often similar image elements [1]. The interaction and meaningful relationship of the protagonists are depicted with significant postures and gestures (schemata) that illustrate key aspects of the storyline [2, 3]. These schemes are not restricted to a particular iconography, so visual links arise between different images. Being familiar with these relationships, the ancient viewer could recognise the specific narrative and understand the meaning of the image. For example, the scheme of leading the bride in Attic vase paintings is characterised by a significant leading gesture (χεῖρ’ ἐπὶ καρπῷ: hand on wrist / hand on hand) that relates bride and bridegroom in non-mythological wedding scenes, thereby expressing a hierarchy between the two figures, who take active and passive roles. Both protagonists are connected in a communicative way defined by meaningful postures and gestures.
Besides these descriptive schemes, there are narrative elements, such as the setting, the specific attributes of gods, or other objects, that contribute to the viewer’s understanding of the narrative context. These characteristic attributes are needed to identify a depiction as a specific mythological scene. Scenes of Helen and Menelaos, for instance, in which Menelaos is dressed as a warrior and threatens her with weapons, make use of the leading-the-bride scheme to show that Menelaos leads Helen home from Troy like a bride. Only by combining the formal schemes with the narrative elements is it possible to identify the depiction as a particular mythological story and to understand its cultural meaning.
We therefore need to recognise the narrative elements of the pictorial stories and to decode their formal key structures by comparing a large number of ancient images and discovering visual similarities among them. This requires understanding complex semantic relationships and cultural references through a mixed-initiative interaction between researcher and machine, with computer vision involved as a fourth Bildwissenschaft [4].
We approach this problem using state-of-the-art object detection algorithms from computer vision [5, 6]. We present a novel approach that uses convolutional neural networks to map the attributes occurring in Classical Archaeology into a deep representational space. This representation maps the inherent variations in how any particular attribute is depicted into a common space. We further develop a novel method that uses this representation space, together with the narrative context, to better understand the scene depicted in an artwork. We also analyse the model’s performance on abstract classes, in which multiple similar attributes are combined into a single abstract class. This approach has important applications in Classical Archaeology, including the retrieval and contextual understanding of artworks [7].
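The abstract does not specify the network or the class grouping. As a minimal, hedged sketch of two ideas it names, the Python code below assumes that attribute embeddings have already been produced by some CNN. It shows (a) nearest-neighbour lookup in the shared space by cosine similarity and (b) evaluation after merging similar fine-grained attributes into abstract classes. The labels, the `ABSTRACT_CLASS` grouping, the toy 3-dimensional vectors and all function names are hypothetical illustrations, not the authors' data or implementation.

```python
import math

# HYPOTHETICAL grouping of fine-grained attribute labels into abstract
# classes; illustrative only, not the class list used in the paper.
ABSTRACT_CLASS = {
    "spear": "weapon",
    "sword": "weapon",
    "shield": "weapon",
    "sceptre": "staff",
    "thyrsos": "staff",
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_label(query, gallery):
    """Return the label of the gallery embedding most similar to `query`.

    `gallery` is a list of (label, embedding) pairs. The embeddings are
    assumed to come from some CNN feature extractor, which is not shown.
    """
    best_label, _ = max(gallery, key=lambda item: cosine(query, item[1]))
    return best_label


def accuracy(predicted, true, to_abstract=False):
    """Fraction of correct predictions, optionally after mapping every
    fine-grained label to its (hypothetical) abstract class."""
    if to_abstract:
        predicted = [ABSTRACT_CLASS[p] for p in predicted]
        true = [ABSTRACT_CLASS[t] for t in true]
    correct = sum(p == t for p, t in zip(predicted, true))
    return correct / len(true)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" standing in for CNN features.
    gallery = [
        ("spear", [1.0, 0.1, 0.0]),
        ("sword", [0.9, 0.3, 0.0]),
        ("sceptre", [0.0, 0.2, 1.0]),
    ]
    queries = [([0.95, 0.25, 0.0], "spear"), ([0.1, 0.1, 0.9], "sceptre")]

    predicted = [nearest_label(q, gallery) for q, _ in queries]
    true = [label for _, label in queries]
    print("predicted:", predicted)
    print("fine-grained accuracy:", accuracy(predicted, true))
    print("abstract-class accuracy:", accuracy(predicted, true, to_abstract=True))
```

With these toy vectors, the spear query lies closest to the sword embedding. Fine-grained accuracy is therefore 0.5, while merging both labels into the hypothetical "weapon" class raises accuracy to 1.0. This illustrates how combining visually similar attributes into abstract classes can change the measured performance.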

References:

[1] Giuliani, L., 2003. Bild und Mythos. Geschichte der Bilderzählung in der griechischen Kunst. Munich: Beck.
[2] McNiven, T. J., 1982. Gestures in Attic Vase Painting: Use and Meaning, 550–450 B.C. PhD dissertation. Ann Arbor: University of Michigan.
[3] Catoni, M. L., 2008. La comunicazione non verbale nella Grecia antica: gli schemata nella danza, nell'arte, nella vita. Turin: Bollati Boringhieri.
[4] Panofsky, E., 2006. Ikonographie und Ikonologie: Bildinterpretation nach dem Dreistufenmodell. Cologne: DuMont.
[5] Lin, T.Y., Goyal, P., Girshick, R., He, K. and Dollár, P., 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2980–2988).
[6] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A., 2016. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779–788).
[7] Garcia, N. and Vogiatzis, G., 2019. How to read paintings: Semantic art understanding with multi-modal retrieval. Lecture Notes in Computer Science, vol. 11130 (pp. 676–691). https://doi.org/10.1007/978-3-030-11012-3_52