Integrating Historical Maps and Documents through Geocoding - Historical Big Data for the Japanese City of Edo

1. Abstract

1. Historical Big Data for the Urban Space

The city of Edo has been the capital of Japan and has been known as one of the largest cities in the world since the 17th century. To answer research questions on historical urban space, such as human activities and environmental effects, historical documents should be integrated by place, time, person and other entities to turn small facts into a collection of structured data for historical big data analysis [1]. Related work includes Pelagios [2], which studies historical gazetteers and georeferencing of old maps to reconstruct the geographic space, and European Time Machine [3], which aims at integrating historical entities to reconstruct the urban space in European cities. Our approach could also be called an Edo Time Machine.

2. Integration of Historical Sources through Geocoding

Toponyms appear in many variant forms, especially in historical documents that predate the standardization of the address system. Hence a location-based historical database requires a shared address system, or a standard gazetteer, for toponym-based integration. The major challenges in toponym-based integration are the variation and disambiguation of toponyms, and the question addressed in this paper is how machine-based geocoding can deal with these challenges.

3. Dataset

  1. Edo Map Dataset: The dataset covers place names extracted from “Edo Kiriezu” (Owariya version) [4], a pre-modern map of Edo published from 1849 through 1863 in the form of 32 sheets. It contains not only addresses but also POIs (Points of Interest) such as bridges and temples [5].
  2. Edo Shopping Dataset: The dataset covers shops and restaurants extracted from “Edo Kaimono Hitori Annai” [6], a pre-modern shopping guide published in 1824 that lists about 2600 shops and restaurants in Edo with their shop name, category, address and logo [7].

To create the datasets, we took advantage of IIIF (International Image Interoperability Framework), which enables interoperable image delivery in the humanities, and the IIIF Curation Platform (ICP) [8,9,10], an open source software suite developed by our group for creating collections of parts of images across organizations. As a result, we created a dataset of 6418 place names from 22 of the 32 sheets (Figure 1), and a dataset of 2454 shops from the whole book (Figure 2).
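As an illustration of how a part of an image can be addressed through IIIF, the following minimal sketch builds a region request URL following the IIIF Image API URI pattern (`{identifier}/{region}/{size}/{rotation}/{quality}.{format}`). The base URL, the identifier, and the function name `iiif_region_url` are hypothetical and are not taken from the datasets above; how ICP itself stores regions is not described here.

```python
from urllib.parse import quote


def iiif_region_url(base_url, identifier, x, y, w, h,
                    size="max", rotation=0, quality="default", fmt="jpg"):
    """Build an IIIF Image API URL for a rectangular region given in pixels.

    base_url and identifier are placeholders (assumptions); a real server
    publishes its own values in the image's info.json.
    """
    if w <= 0 or h <= 0:
        raise ValueError("region width and height must be positive")
    # The identifier must be percent-encoded, including any slashes.
    encoded_id = quote(identifier, safe="")
    region = f"{x},{y},{w},{h}"
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


if __name__ == "__main__":
    # Hypothetical server and sheet identifier, for illustration only.
    print(iiif_region_url("https://example.org/iiif", "edo/sheet 01",
                          100, 200, 300, 150))
```

A place name marked on a map sheet can then be referenced by the sheet's image identifier plus its bounding box, without copying the image itself.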

Figure 1: Edo Kiriezu, the sheet of Yotsuya area. Red markers show extracted place names (Total 335).

Figure 2: Edo Kaimono Hitori Annai. A search result for restaurants (Total 62).

4. Experimental Results

Table 1 shows the result of matching gazetteer entries against shop addresses (1034 unique addresses). In addition to exact match, we tested three other approaches: matching from the first character (forward match), matching from the last character (backward match), and matching a part of the address string (partial match). Table 1 shows that exact match was successful for about 21% of the addresses (212/1034). Among the 212 successful cases, 49 addresses need disambiguation within a sheet and 15 need disambiguation across sheets. Disambiguation within a sheet, however, is usually not a critical issue because, under the Japanese address system, which is block-based rather than street-based, it usually means that the candidates are multiple neighboring blocks. Future work includes georeferencing coordinates between the old maps and the present-day map, and analyzing the relationship between the geographic distribution of businesses and human activities in the urban space.

Table 1: Matching 1034 unique addresses in the shopping guide against place names in the gazetteer. Note that some categories are not mutually exclusive.
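The four matching strategies described above can be sketched as follows. This is a minimal illustration and not the implementation used in the experiment: it assumes that forward, backward and partial match mean that the gazetteer entry is a prefix, a suffix, or a substring of the address string, respectively, and the function names and the romanized sample data are hypothetical. Under these assumptions an exact match also counts as a forward, backward and partial match, which is one way the categories can fail to be mutually exclusive.

```python
from collections import defaultdict


def match_types(address, entry):
    """Return the set of match categories between an address and a gazetteer entry.

    Assumption: the gazetteer entry is compared against the address string
    as a whole (exact), as a prefix (forward), as a suffix (backward),
    or as a substring (partial).
    """
    kinds = set()
    if not address or not entry:
        return kinds
    if address == entry:
        kinds.add("exact")
    if address.startswith(entry):
        kinds.add("forward")
    if address.endswith(entry):
        kinds.add("backward")
    if entry in address:
        kinds.add("partial")
    return kinds


def match_address(address, gazetteer):
    """Collect gazetteer candidates per category.

    gazetteer is a list of (place_name, sheet_id) pairs; the same place
    name may occur several times, on one sheet or on several sheets.
    """
    result = defaultdict(list)
    for name, sheet in gazetteer:
        for kind in match_types(address, name):
            result[kind].append((name, sheet))
    return dict(result)


def needs_disambiguation(candidates):
    """Return (within_sheet, across_sheets) flags for a list of candidates."""
    sheets = [sheet for _, sheet in candidates]
    within_sheet = len(sheets) != len(set(sheets))  # same sheet hit twice
    across_sheets = len(set(sheets)) > 1            # more than one sheet hit
    return within_sheet, across_sheets


if __name__ == "__main__":
    # Hypothetical gazetteer entries, for illustration only.
    gazetteer = [
        ("Kojimachi", "sheet-01"),
        ("Kojimachi", "sheet-01"),
        ("Kojimachi", "sheet-03"),
        ("Yotsuya", "sheet-02"),
    ]
    hits = match_address("Kojimachi", gazetteer)
    print(hits.get("exact", []))
    print(needs_disambiguation(hits.get("exact", [])))
```

Keeping the sheet identifier with each candidate makes it possible to separate the two kinds of ambiguity reported above: several candidates on one sheet, and candidates spread over different sheets.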

References

[1]     Kitamoto, A., Ichino, M., Suzuki, C., Clanuwat, T. (2018). Historical Big Data: Reconstructing the Past through the Integrated Analysis of Historical Data. Eighth Conference of Japanese Association for Digital Humanities (JADH2018). pp. 67-69.

[2]     Isaksen, L., Simon, R., Barker, E. T., & de Soto Cañamares, P. (2014). Pelagios and the emerging graph of ancient world data. In Proceedings of the 2014 ACM conference on Web science, pp. 197-201.

[3]     Time Machine Europe, https://www.timemachine.eu/, Accessed on June 15, 2020.

[4]     Owariya (1849-1862), Edo Kiriezu, National Diet Library, https://www.ndl.go.jp/landmarks/edo/, Accessed on June 15, 2020.

[5]     Center for Open Data in the Humanities, Edo Maps Beta, http://codh.rois.ac.jp/edo-maps/, Accessed on June 15, 2020.

[6]     Nakagawa Gorozaemon (1824). Edo Kaimono Hitori Annai, Dataset of Pre-modern Japanese Text (photographed by National Institute of Japanese Literature, archived by Ajinomoto Foundation For Dietary Culture), doi: 10.20730/100249503.

[7]     Suzuki, C., Curation of Merchant in Edo, http://www.ch-suzuki.com/edoshop/finder/?lang=en, Accessed on June 15, 2020.

[8]     Kitamoto, A., Homma, J., Saier, T. (2018) IIIF Curation Platform: Next Generation IIIF Open Platform Supporting User-Driven Image Sharing. Proceedings of IPSJ SIG Computers and the Humanities Symposium 2018. pp. 327-334 (in Japanese).

[9]     Kitamoto, A. (2019) IIIF Curation Platform: Creating and Sharing Virtual Image Collection on a Global Scale. 2019 International Conference: Glocal Humanities in the Era of Hyperconnectivity. 6 pages.

[10] Center for Open Data in the Humanities, IIIF Curation Platform, http://codh.rois.ac.jp/icp/, Accessed on June 15, 2020.

Asanobu Kitamoto (kitamoto@nii.ac.jp), ROIS-DS Center for Open Data in the Humanities, Japan, National Institute of Informatics, Shoko Terao, AMANE LLC., Misato Horii, AMANE LLC., Hiroshi Horii, AMANE LLC. and Chikahiko Suzuki, ROIS-DS Center for Open Data in the Humanities, Japan, National Institute of Informatics
