Unsupervised Rumor Detection on Twitter using Topic Homogeneity

1. Abstract

Rumor detection on twitter is a highly-studied topic. Our prior work showed that it is possible to identify rumors (stories about events that have no basis in fact) on twitter by observing the relationship between the number of tweets about an event and the number of topics within those tweets (Shepard et al., 2019). In that paper, we presented case studies, but could not measure the difference between rumors and facts. This paper builds on that work by presenting a formula for making that distinction: given a data set of tweet texts with time stamps, our method automatically discovers rumors in those tweets. This paper presents our method, which is completely unsupervised in contrast to most recent work on rumor detection. As a case study, it demonstrates the method’s effectiveness on a sample dataset of tweets from the 3/11 earthquake in Japan.


At least seven new methods have been proposed for rumor detection on twitter in the past two years, including (Wang et al., 2017; Singh et al., 2017; Chen et al., 2017; Yoshida and Aritsugi, 2019; Ma et al., 2018; Poddar et al., 2018; Lin et al., 2019). One major constraint of these approaches, however, is that all of them require supervised learning. Our major intervention is that our approach is unsupervised and language-independent. Our algorithm makes only the assumption that the text has been tokenized into words.


Our approach results from the observation that little new information emerges about rumors. For example, if a chemical spill is announced on twitter, people often retweet the announcement. People are also likely to be affected in some way, such as seeing emergency vehicles or evacuating the area. New information will be created as the event progresses. If the spill is just a rumor, however, people will have no new information to add, and so will simply retweet the original rumor. We call the amount of information surrounding an event its “topic diversity.” Calculating an event’s topic diversity allows us to classify an event as a rumor or a fact. A true event will have both high tweet counts and high topic diversity, while a rumor will have high tweet counts but low topic diversity.

The first step in calculating topic diversity is separating our data into what we call word-window tweet lists, or WWTLs. These are lists of tweets that mention a word, split into 30-minute windows. We generate WWTLs for every word in our dataset of tweets, after removing stopwords. Next, we compute each WWTL’s topic diversity by performing topic modeling on each WWTL and counting the number of topics produced. We use a novel method of topic modeling that determines the number of topics automatically based on maximal clique enumeration (MCE). To prepare a TTWL for MCE, we first generate a graph of all tweets. Each node represents one tweet, and edges are generated by computing the Jaccard similarity of each pair of tweets based on shared vocabulary. Then, we perform data polishing (Uno et al., 2015; Uno et al., 2017) and MCE using nysol_python (nysol, 2019; Nakamoto and Hamuro, 2018) to produce a clique count for each WWTL. A WWTL with a large number of cliques has high topic diversity.

Finally, we examine whether increases in tweet counts in WWTLs correlate with topic diversity. For each word, we find the WWTL with the maximum tweet count (Tmax). Then, we find the minimum number of non-zero tweets prior to that maximum (Tmin), and call that period between those windows a “keyword rise.” For each keyword rise, we find the number of communities at the beginning and the end of the keyword rise (CTmin and CTmax). We then compute what we call the topic homogeneity factor (THF), using the following formula:

THF = ((T_max - T_min) / (T_min)) / ( (C_T_max - C_T_min ) / (C_T_min) )

We find that keywords with a THF of 25 or greater are likely to be rumors.


To validate our method, we performed an experiment on a data set of around 200 million tweets sent in the three weeks after the 3/11 earthquake in Japan, gathered by the social media company Hottolink (https://www.hottolink.co.jp/english). We knew a number of rumors had spread during the disaster, but were later corrected in official news sources.

Our test focused on identifying keywords related to two rumors. The first was about a chemical leak at the Cosmo Oil plant. The second was a widely-retweeted message purporting to be the last words of a dying system administrator, referred to as the “Geek House Asakusa” tweet.

To compare the patterns of these known rumor keywords against words not associated with rumors, we arbitrarily selected 150 other words that we expected to exhibit a diverse set of frequencies and topic diversity. We included users’ twitter handles, government agencies, and locations in the affected area. We expected that twitter handles would be mentioned infrequently, and have low topic diversity, while government agencies and affected areas would be mentioned frequently and have high topic diversity. Given the variety among these words’ frequency and topic diversity, we expected that many of them would have either word frequency or topic diversity measures that could confuse our method.


Using our method, it was easy to differentiate words related to rumors from other words. The term “???”(“Cosmo”) had a THF of 39, and “?????”1 (“Cosmo Oil”) had a THF of 73. Similar results were obtained for “???” (“geek”), with a THF of 41.73. Of the 150 non-rumor keywords, we found that with a threshold of 25.0, only one word was incorrectly classified as a rumor. Given this success, we considered our method to be effective.


We have shown that our method can effectively detect keywords likely to be associated with rumors. While we acknowledge that it was tested on a dataset from the past and not on a live tweet stream, in future work, we plan to experiment on a live tweet stream. We also plan to experiment with political rumors likely to generate discussion even if they are not true, which would produce higher topic diversity. Overall, though, we found our method to be effective at detecting rumors in our dataset and anticipate it to be effective at discovering rumors in similar dataset.

Works Cited

Chen, Y.-C., Liu, Z.-Y. and Kao, H.-Y. (2017). IKM at SemEval-2017 Task 8: Convolutional neural networks for stance detection and rumor verification. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pp. 465–469.

Lin, X., Liao, X., Xu, T., Pian, W. and Wong, K.-F. (2019). Rumor Detection with Hierarchical Recurrent Convolutional Neural Network. In Tang, J., Kan, M.-Y., Zhao, D., Li, S. and Zan, H. (eds), Natural Language Processing and Chinese Computing. (Lecture Notes in Computer Science). Springer International Publishing, pp. 338–48.

Ma, J., Gao, W. and Wong, K.-F. (2018). Rumor detection on twitter with tree-structured recursive neural networks. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 1980–1989.

Nakamoto, M. and Hamuro, Y. (2018). NYSOL: Preprocessing Tool for Big Data in Python. Fukuoka, Japan https://www.ieice.org/publications/conference-FIT-DVDs/FIT2018/data/pdf/B-015.pdf (accessed 12 October 2019).

nysol (2019). Nysol/Nysol_python. C https://github.com/nysol/nysol_python (accessed 12 October 2019).

Poddar, L., Hsu, W., Lee, M. L. and Subramaniyam, S. (2018). Predicting Stances in Twitter Conversations for Detecting Veracity of Rumors: A Neural Approach. 2018 IEEE 30th International Conference on Tools with Artificial Intelligence (ICTAI). IEEE, pp. 65–72.

Shepard, D., Hashimoto, T., Shin, K., Uno, T. and Kuboyama, T. (2019). The Narrow Scopes of Fake News: Detecting Fake News Using Topic Diversity Measures. Utrecht (accessed 12 October 2019).

Singh, V., Narayan, S., Akhtar, M. S., Ekbal, A. and Bhattacharyya, P. (2017). IITP at SemEval-2017 Task 8: A supervised approach for rumour evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pp. 497–501.

Uno, T., Maegawa, H., Nakahara, T., Hamuro, Y., Yoshinaka, R. and Tatsuta, M. (2015). Micro-Clustering: Finding Small Clusters in Large Diversity. ArXiv:1507.03067 [Cs] http://arxiv.org/abs/1507.03067 (accessed 13 October 2019).

Uno, T., Maegawa, H., Nakahara, T., Hamuro, Y., Yoshinaka, R. and Tatsuta, M. (2017). Micro-clustering by data polishing. 2017 IEEE International Conference on Big Data (Big Data). pp. 1012–18 doi:10.1109/BigData.2017.8258024.

Wang, F., Lan, M. and Wu, Y. (2017). ECNU at SemEval-2017 Task 8: Rumour evaluation using effective features and supervised ensemble models. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017). pp. 491–496.

Yoshida, Z. and Aritsugi, M. (2019). Rumor Detection in Twitter with Social Graph Structures. Third International Congress on Information and Communication Technology. Springer, pp. 589–598.

David Lawrence Shepard (shepard.david@gmail.com), UCLA, United States of America, Takako Hashimoto , Chiba University of Commerce, Japan, Kilho Shin , University of Hyogo, Japan, Takeaki Uno , National Institute of Informatics, Japan and Tetsuji Kuboyama , Gakushuin University, Japan

Theme: Lux by Bootswatch.