In-browser topic modeling

David Mimno

adapted by Mura Nava

Instructions:

When you open the page it will load a file containing documents and a file containing stopwords. The default is a corpus of Applied Linguistics abstracts from 2000-2016. Reference: Lei, L., & Liu, D. (2018). Research Trends in Applied Linguistics from 2005 to 2016: A Bibliometric Analysis and Its Implications. Applied Linguistics.

All words have initially been assigned randomly to topics. Click the "Run 50 iterations" button to start training. The iteration count will increase as the algorithm passes through the dataset multiple times.
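The page does not say which training algorithm runs behind the button. A common choice for this kind of "assign randomly, then sweep the dataset repeatedly" procedure is collapsed Gibbs sampling for LDA, so the sketch below assumes that. The function name `train_lda` and the hyperparameters `alpha` and `beta` are illustrative, not taken from the page.

```python
import random

def train_lda(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Sketch of collapsed Gibbs sampling for LDA (assumed algorithm).

    docs is a list of token lists with stopwords already removed.
    Returns per-document topic counts, per-word topic counts, and topic totals.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    word_topic = {w: [0] * num_topics for w in vocab}
    topic_total = [0] * num_topics

    # Step 1: assign every word token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            word_topic[w][z] += 1
            topic_total[z] += 1
        assignments.append(zs)

    # Step 2: each iteration is one full pass over every token.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                word_topic[w][z] -= 1
                topic_total[z] -= 1

                # Resample a topic in proportion to how much this document
                # uses the topic and how much the topic uses this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (word_topic[w][k] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = z
                doc_topic[d][z] += 1
                word_topic[w][z] += 1
                topic_total[z] += 1

    return doc_topic, word_topic, topic_total
```

Calling `train_lda(docs, num_topics=2, iterations=50)` corresponds to one click of the button under these assumptions. Calling it with a larger `iterations` value corresponds to running more passes.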

The topics on the right side of the page should now look more coherent, with related words grouped together. Run more iterations if you would like -- there's probably still a lot of room for improvement after only 50 iterations.

Once you're satisfied with the model, you can click on a topic from the list on the right to sort documents in descending order by their use of that topic. Proportions are weighted so that longer documents will come first. You can also explore correlations between topics by clicking the "Topic Correlations" tab. This view shows a force-directed layout with connections between topics that have correlations above a certain threshold.
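The page does not give the exact formulas behind the document ranking or the correlation view, so the sketch below shows one plausible way to get both behaviours. It assumes a smoothing constant added to each document's length, which favours longer documents, and Pearson correlation between topic proportions. The names `rank_documents`, `correlated_topic_pairs`, `smoothing`, and `threshold` are hypothetical.

```python
def rank_documents(doc_topic_counts, topic, smoothing=10.0):
    """Return document indices sorted by smoothed share of `topic`, descending.

    Adding `smoothing` to the length penalises short documents, so of two
    documents with the same raw proportion the longer one ranks first.
    """
    def score(d):
        counts = doc_topic_counts[d]
        return counts[topic] / (sum(counts) + smoothing)

    return sorted(range(len(doc_topic_counts)), key=score, reverse=True)


def correlated_topic_pairs(doc_topic_counts, threshold=0.3):
    """Return topic pairs (i, j) whose Pearson correlation exceeds `threshold`.

    These pairs would be the links drawn in a force-directed layout.
    """
    num_topics = len(doc_topic_counts[0])
    # Convert counts to per-document proportions, then take one column per topic.
    props = [[c / max(sum(counts), 1) for c in counts] for counts in doc_topic_counts]
    columns = list(zip(*props))

    def pearson(x, y):
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        if var_x == 0 or var_y == 0:
            return 0.0
        return cov / (var_x * var_y) ** 0.5

    return [
        (i, j)
        for i in range(num_topics)
        for j in range(i + 1, num_topics)
        if pearson(columns[i], columns[j]) > threshold
    ]
```

With counts `[[1, 0], [10, 0], [5, 5]]`, the first two documents are both entirely about topic 0, but the ten-word document ranks above the one-word document because the smoothing term matters less for it.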

The page works best in Chrome. Safari and Firefox work too, but may be considerably slower.