The HyperThesau project was initiated by a multidisciplinary team consisting of two research laboratories in archaeology and computer science, a digital library, two archaeological museums and a private company. The project has two main objectives: 1) the design and implementation of an integrated platform to host, search, share and analyze archaeological data; 2) the design of a domain-specific thesaurus that takes the whole archaeological data lifecycle into account.
Archaeological data come in many different types (documents, photos, drawings, sensor data, etc.). The description of an archaeological object also varies with users, usages and time. This variety of archaeological data raises many scientific challenges: storing heterogeneous data in a centralized repository, guaranteeing data quality, cleaning and transforming the data to make them interoperable, finding and accessing data efficiently, and cross-analyzing the data. To overcome these challenges, we exploit the concept of a data lake