Text Mining and Comparative Visual Analytics on Large Collection of Speeches to Trace Socio-Political Issues

Published in 2019 IEEE 9th International Advance Computing Conference (IACC 2019), 2019

Received the Lead Paper Award for Session 6 (NLP).


Abstract

We present an experimental study that applies Latent Dirichlet Allocation (LDA) and Comparative Visual Analytics to trace the socio-political issues highlighted in a large corpus of political speech transcripts. We built web scrapers to collect over 500 speech transcripts, then analyzed this large body of text to derive insights from it. Using the LDA topic modelling algorithm, we discovered latent “topics”, referred to as issues in this paper, in the speech transcripts and visualized them with pyLDAvis, an interactive tool for visualizing the results of an LDA model. In addition to LDA, we generated graphical visualizations such as Lexical Dispersion Plots and Topic Bar Plots using Python's Matplotlib library. For the comparative analytics, we generated visual graphs for speeches by two different candidates and juxtaposed them to compare and interpret their discourse. Linguists have performed Political Discourse Analysis (PDA) manually, but analyzing such a large volume of speeches by hand is time-consuming and extremely complex. Our experiment, which focuses on identifying socio-political issues within speech transcripts using NLP-based text analytics, shows that this approach is a beneficial technique for PDA.
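To make the topic-discovery and comparison steps concrete, here is a minimal, dependency-free sketch of an LDA pipeline. It assumes collapsed Gibbs sampling as the inference method; the abstract does not state which LDA implementation, preprocessing, or hyperparameters the study used, and the plotting with pyLDAvis and Matplotlib is omitted here. The toy speeches, the `STOPWORDS` set, and the helpers `tokenize`, `lda_gibbs`, `top_words`, and `topic_proportions` are all hypothetical, chosen only for illustration. The last step averages each candidate's topic shares, which is the kind of data a side-by-side topic bar plot would display.

```python
import random

# Hypothetical stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "and", "a", "of", "to", "we", "our", "for", "in", "will"}


def tokenize(text):
    """Lowercase, strip punctuation, drop non-alphabetic tokens and stopwords."""
    tokens = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [t for t in tokens if t.isalpha() and t not in STOPWORDS]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling (hypothetical helper).

    Returns (ndk, nkw, vocab): document-topic counts, topic-word counts,
    and the sorted vocabulary that indexes the columns of nkw.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # times word v is assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_i = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return ndk, nkw, vocab


def top_words(nkw, vocab, n=3):
    """Highest-count words per topic: what a reader inspects to name an 'issue'."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in nkw]


def topic_proportions(ndk, alpha=0.1):
    """Smoothed per-document topic distribution (theta_d)."""
    K = len(ndk[0])
    return [[(c + alpha) / (sum(row) + K * alpha) for c in row] for row in ndk]


# Hypothetical toy transcripts standing in for the scraped speeches.
speeches = [
    ("candidate_a", "Jobs and wages: we will grow the economy and cut taxes."),
    ("candidate_a", "Our economy needs jobs, lower taxes and higher wages."),
    ("candidate_b", "Health care for all: insurance and hospitals for every family."),
    ("candidate_b", "We will expand health insurance and fund hospitals."),
]
docs = [tokenize(text) for _, text in speeches]
ndk, nkw, vocab = lda_gibbs(docs, num_topics=2)
theta = topic_proportions(ndk)

# Average topic share per candidate, i.e. the bars of a comparative topic bar plot.
by_candidate = {}
for (name, _), row in zip(speeches, theta):
    by_candidate.setdefault(name, []).append(row)
avg_share = {name: [sum(col) / len(rows) for col in zip(*rows)]
             for name, rows in by_candidate.items()}

print(top_words(nkw, vocab))
print(avg_share)
```

On a corpus this small, which topic captures which issue can vary with the seed. In practice a library implementation, such as gensim's `LdaModel` or scikit-learn's `LatentDirichletAllocation`, would likely replace `lda_gibbs`, and its fitted model could then be prepared for interactive inspection in pyLDAvis.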