Scalable stance detection with automated topic discovery

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Mcode

SCI3115

Language

en

Pages

49

Abstract

Given the vast amounts of data available and the breadth of opinions expressed within it, there is a need for automated analysis. Such analysis can be used to understand customer opinions, gauge public support for political initiatives, or find bias in news coverage. Current systems that try to understand the opinions within text are often limited to sentiment analysis, which classifies a text's tone as positive or negative. Stance detection is more powerful: it classifies a text's opinion towards a specific statement, which allows for more in-depth insights into subjects of interest. However, stance detection remains an often overlooked field and suffers from several issues that hinder its use. Firstly, many systems do not generalise well beyond their training data, which limits their use on broader datasets. Secondly, the statements towards which an opinion is directed can be hard to determine when no manually created list exists: when working with large amounts of text, one may not know which topics it contains, or even how many there are. To address these shortcomings, we present a system that obtains a more generalisable understanding of text and automatically discovers the main topics in a dataset of texts. We additionally focus on computational performance and experiment with uncertainty quantification. The system consists of two main parts that can be used independently. The first discovers the relevant topics from raw text, while the second extracts the stance of a text towards a list of provided topics. We introduce a new metric to validate the performance of topic discovery systems that cluster documents and name the clusters. It allows labelled data to be used without requiring exact label matches. The stance detection system is scored on SemEval 2016 Task 6A so that it can be compared with other state-of-the-art systems, including GPT.
Results for both the topic discovery and stance detection systems are promising, with our stance detection scoring up to 77.2% ± 0.9 on SemEval, compared with 72% for the second-best published result.

Supervisor

Garg, Vikas

Thesis advisor

Kihlbaum, Jacob
