Haiku Author Recognition

1. Abstract

Haiku Author Recognition

1. Introduction

Haiku is a Japanese poetic form renowned for its brevity and expressiveness. Haiku derives from renga/renku – collaborative collections of verses with a 3-line opening hokku verse in the form 5-7-5 on (equiv. syllable). Matsuo Basho made famous the stand-alone hokku form, preserving the 5-7-5 on structure. The name haiku was associated with this form of hokku during 19^th century.

Four haiku authors rise in prominence above all: Matsuo Basho (17^th century) is considered the “father” of haiku. Yosa Buson (18^th century) focused on haiku as an art rather than a reflection of reality. Buson combined hokku with painting, inventing haiga (verse-painting). Kobayashi Issa (18-19^th century) reinvented haiku through his depth of feeling and humanism. In the second half of the 19^th century, Masaoka Shiki critically re-evaluated the art of haiku (coining the term), braking away from the traditional 5-7-5 form, and popularizing the poetic style beyond Japan.

We present a study, which employs authorship attribution techniques to determine the distinctiveness of poetic styles in haiku, focusing on the poetry of Basho, Buson, Issa, and Shiki. There has been little work in the field of haiku attribution. A theoretical study of phonological complexity in haiku was presented in [1]. An approach to automatic evaluation of the quality of haiku was presented in [2]. An interesting work [3] deals with identifying unintended haiku in text. We approach haiku attribution as a classification problem: Given a set of attributed haikus, we train classifiers to recognize the writing style of each poet, and apply an ensemble of trained models to unattributed texts.

2. Our Haiku Corpus

The first step in creating our model was obtaining a haiku corpus. There are three approaches:

Use actual haikus written in hiragana (a form of Japanese alphabet)
Use Roman alphabet transcriptions (r?maji) of haikus.
Use English translations of haikus.

While using hiragana haikus is arguably the best option, our software lacks the capability to process hiragana text. English translations of haikus are readily available, but while research suggests that the authorial signal is stronger than the translators’ [4], we do not know if that applies to haiku. We opted to construct a corpus of r?maji transcribed haikus. This was difficult since most resources are either hiragana originals or translations. We obtained 723 haikus by Basho from [5], 842 haikus by Buson from [6], and 603 haikus by Issa from [7]. Finding transcribed Shiki haikus proved extremely challenging. Even though Shiki wrote over 24000 haikus, only a handful have been transcribed into r?maji. Failing to secure transcriptions, we downloaded the full set of 24000 hiragana haikus from [8]. We then used an online hiragana-to-r?maji transcription tool [9] to transcribe 967 randomly selected haikus by Shiki. Since many of the extracted haikus were organized alphabetically or by topic, we wrote Python code to randomly shuffle the order of the haikus for each author. A different program broke up the haikus into files of size 50 haikus each.

3. Attribution Methodology

Our attribution software is a based on JGAAP [10] and implements an ensemble of classifier/stylistic-feature pairs [11,12]. For this study, we limited the set of stylistic features to character-2/3/4/5-grams (CnG), word-2/3-grams (WnG), vowel-initiated words (VIW), and first-word-in-sentence (FWIS). The classifiers used were support vector machines with sequential minimal optimization (SMO) and multilayer perceptrons (MLP).

4. Results

We conducted several experiments, where we randomly chose one 50-haiku file for each author and removed it from the training set. We trained the classifiers on the remaining set of haikus using leave-one-out (L1O) validation. The results of the training for three sets of experiments are presented in Table 1:

Table 1: Training Accuracy for Basho, Buson, Issa, and Shiki

Next, we tested the authorship of the 50-haiku files that were left out of the training. The results of those experiments are presented in Table 2:

Table 2: Attribution Results for Basho, Buson, Issa, and Shiki

It is quite clear that even with a reduced set of stylistic features, the attribution is very strong and the author identification definitive. We conducted an additional set of experiments, where we used each of the trained models to test the authorship of five haikus by the 18^th century haiku poet Takarai Kikaku. The models were not trained on Kikaku, so, as expected, the results were split among two or more authors (Table 3):

Table 3: Attribution Results for Kikaku

Interestingly, Kikaku was a prominent student and disciple of Basho, yet none of the models makes that association. This is most likely due to the small number of Kikaku haikus tested.

5. Conclusion and Future Work

We presented results from haiku author identification experiments, which suggest that haiku authorship can be determined even with a limited set of stylistic features from r?maji-transcribed haikus. Our next efforts will be to experiment with a larger set of haiku authors, with English translations, and, possibly, with hiragana haikus. Among the questions we wish to answer are:

What is the minimal set of haikus sufficient to identify an author?
Is the authorial signal stronger than the translator’s for haiku translations?
Can prosodic features be used for haiku author identification?
Does the historical period affect the accuracy of attribution?

References:

[1] Hayata, K. (2018). “Phonological Complexity in the Japanese Short Poetry: Coexistence Between Nearest-Neighbor Correlations and Far-Reaching Anticorrelations”. Front. Phys. 6:31. doi: 10.3389/fphy.2018.00031

[2] Kikuchi, S. et al. (2016). “Quality estimation for Japanese Haiku poems using Neural Network”. IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, DOI: 10.1109/SSCI.2016.7850030

[3] Online resource: https://labs.ft.com/2016/07/finding-hidden-haiku/

[4] Hoover, D. (2019). “The Invisible Translator Revisited”. Digital Humanities Conference (DH2019). Utrecht, The Netherlands

[5] Barnhill, D.L. (2004). “Basho’s Haiku: Selected Poems of Matsuo Basho”. State University of New York Press, ISBN-13: 978-0791461662

[6] Persinger, A. (2013). "Foxfire: The Selected Poems of Yosa Buson, a Translation". Theses and Dissertations. Paper 748.

[7] Terebes, G. (2000). “A cup of tea with Isa”. Orpheusz Publishing House, Budapest, 2000, c. bilingual, corrected and expanded version of volume 2.

[8] Online resource: https://terebess.hu/english/haiku/shiki.html

[9] Online resource: http://www.romajidesu.com/translator

[10] Juola, P. 2009. JGAAP: A system for comparative evaluation of authorship attribution. Journal of the Chicago Colloquium on Digital Humanities and Computer Science, 1(1): 1-5.

[11] Petrovic, S., Berton, G., Campbell, S., Ivanov, L. 2015. Attribution of 18th Century Political Writings Using Machine Learning. Journal of Technologies in Society, volume 11, issue 3, pp. 1-13

[12] Petrovic, S., Berton, G., Schiaffino, R., Ivanov, L. 2016. Examining the Thomas Paine Corpus: Automated Computer Author Attribution Methodology Applied to Thomas Paine’s Writings. Chapter, New Directions in Thomas Paine Studies, Edition: 1, Publisher: Palgrave Macmillan US, Editors: S. Cleary I. Stabell, DOI: 10.1057/9781137589996