Stopifu: Supporting Task-Specific Stoplisting for Topic Models

1. Abstract

Probabilistic topic modeling is a promising and increasingly popular method of text analysis, affording the identification of patterns of change within tremendously large corpora of documents. However, though tools for exploring and analyzing topic models are increasingly common, the process of building a topic model remains something of an art, given the challenges of pre-processing and model training. To help make one stage of pre-processing more transparent, we have created Stopifu, a web-based tool designed to give researchers more direct control of stopword removal and help them anticipate the effect that excluding different words will have on their analysis. We present our design for this tool, along with our categorization of different types of stopwords that motivated its design.

Malcolm Mitchell (mitchellm@carleton.edu), Carleton College, United States of America and Eric Carlson Alexander, Carleton College, United States of America
