A comparative study of sentiment and topics in migration-related tweets

1. Abstract

The International Organization for Migration (IOM) recently reported that the world has more migrants than ever before “both numerically and proportionally” and that the number of environmental migrants alone could reach 1 billion by 2050 (2017, p. 2; 2014). The IOM has also argued that understanding how people think and feel about migration is essential to the development of policy that supports the safe passage and successful integration of migrants into their new communities (2017). The United States receives more migrants per year than any other country (United Nations, 2016). It is also a country where migration is a prominent topic in social, political and media discourse. This poster reports the findings of an exploratory study whose principal research questions are: 1) what do Twitter users in the United States talk about in their migrant- and migration-related tweets, 2) how do authors feel when they tweet about these subjects, and 3) how do their sentiments and topics of interest compare with those of Twitter users residing outside the United States? To approach these questions, we apply sentiment analysis, topic modeling and time series analysis to 111,785 English-language tweets containing the term “migrant” or “migration” collected from the Twitter API between January 21 and March 4, 2019. Our quantitative and computational methodology relied on descriptive statistics, the Python programming language, and associated libraries, such as the Natural Language Toolkit (NLTK) for text processing, Pandas for data manipulation, VADER for sentiment analysis, Gensim for topic modeling and Seaborn for data visualization. The visualizations show that authors in the United States focused more on politics whereas authors in other countries focused more on humanitarian issues. Tweets from U.S. authors were more negative than those authored by residents of other countries, except when the topics of the tweets involved children.
A time series plot revealed three sentiment spikes, one positive and two negative, over the course of the collection period. The negative spikes were partially explained by looking at word frequencies, visualized as word clouds, alongside news headlines for the dates in question. The positive spike was more difficult to explain because of the limitations of sentiment analysis and the negative nature of news reporting. Our findings cannot be generalized because the Twitter Search API does not generate a representative sample (González-Bailón et al., 2014). Future work could involve an improved sampling method that would permit the use of inferential statistical methods. Adding tweets in other languages to the dataset, or social network data to the analysis, may yield interesting new findings.
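As a rough illustration of the time series step, the sketch below averages per-tweet sentiment scores by day and flags days that sit far from the overall mean. It is a minimal sketch under stated assumptions, not the procedure used in the study: the function names, the z-score rule and its threshold are hypothetical, the scores are made-up toy values, and the abstract does not say how the spikes were identified beyond inspecting a plot. The scores are assumed to be compound scores in the range -1 to 1, such as those VADER produces.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def daily_mean_sentiment(scored_tweets):
    """Average sentiment scores per calendar day.

    scored_tweets: iterable of (date, compound_score) pairs. The score is
    assumed to lie in [-1, 1], as a VADER compound score does.
    """
    by_day = defaultdict(list)
    for day, score in scored_tweets:
        by_day[day].append(score)
    return {day: mean(scores) for day, scores in sorted(by_day.items())}


def flag_spikes(daily_means, z_threshold=1.5):
    """Label days whose mean is at least z_threshold standard deviations
    from the mean of all daily means (a hypothetical rule, chosen here
    only for illustration)."""
    values = list(daily_means.values())
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return {}  # every day has the same mean, so nothing stands out
    spikes = {}
    for day, value in daily_means.items():
        z = (value - center) / spread
        if abs(z) >= z_threshold:
            spikes[day] = "positive" if z > 0 else "negative"
    return spikes


# Made-up toy scores for illustration only; not data from the study.
toy_scores = [
    (date(2019, 1, 21), 0.05), (date(2019, 1, 21), -0.05),
    (date(2019, 1, 22), 0.02), (date(2019, 1, 22), -0.02),
    (date(2019, 1, 23), -0.70), (date(2019, 1, 23), -0.60),
    (date(2019, 1, 24), 0.01), (date(2019, 1, 24), -0.01),
    (date(2019, 1, 25), 0.00), (date(2019, 1, 25), 0.04),
]
daily = daily_mean_sentiment(toy_scores)
spikes = flag_spikes(daily)
```

With the toy values above, only January 23 is flagged, as a negative spike. A flagged date could then be checked against word frequencies and news headlines for that day, as the abstract describes.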

References

González-Bailón, S., Wang, N., Rivero, A., Borge-Holthoefer, J., & Moreno, Y. (2014). Assessing the bias in samples of large online networks. Social Networks, 38, 16-27. https://doi.org/10.1016/j.socnet.2014.01.004

International Organization for Migration. (2014). IOM Outlook on Migration, Environment and Climate Change. https://environmentalmigration.iom.int/iom-outlook-migration-environment-and-climate-change-1

International Organization for Migration. (2017). World Migration Report 2018. https://www.iom.int/wmr/world-migration-report-2018

United Nations, Department of Economic and Social Affairs, Population Division. (2016). International Migration Report 2015: Highlights (ST/ESA/SER.A/375). https://www.un.org/en/development/desa/population/migration/publications/migrationreport/docs/MigrationReport2015_Highlights.pdf