Topic modelling of Finnish Internet discussion forums as a tool for trend identification and marketing applications

Loading...
Thumbnail Image

URL

Journal Title

Journal ISSN

Volume Title

Perustieteiden korkeakoulu | Master's thesis

Date

2019-03-12

Department

Major/Subject

Systems and Operations Research

Mcode

SCI3055

Degree programme

Master’s Programme in Mathematics and Operations Research

Language

en

Pages

78+35

Series

Abstract

The increasing availability of public discussion text data on the Internet motivates to study methods to identify current themes and trends. Being able to extract and summarize relevant information from public data in real time gives rise to competitive advantage and applications in the marketing actions of a company. This thesis presents a method of topic modelling and trend identification to extract information from Finnish Internet discussion forums. The development of text analytics, and especially topic modelling techniques, is reviewed and suitable methods are identified from the literature. The Latent Dirichlet Allocation topic model and the Dynamic Topic Model are applied in finding underlying topics from the Internet discussion forum data. The discussion data collection with web scarping and text data preprocessing methods are presented. Trends are identified with a method derived from outlier detection. Real world events, such as the news about Finnish army vegetarian meal day and the Helsinki summit of presidents Trump and Putin, were identified in an unsupervised manner. Applications for marketing are considered, e.g. automatic search engine advert keyword generation and website content recommendation. Future prospects for further improving the developed topical trend identification method are proposed. This includes the use of more complex topic models, extensive framework for tuning trend identification parameters and studying the use of more domain specific text data sources such as blogs, social media feeds or customer feedback.

Description

Supervisor

Ilmonen, Pauliina

Thesis advisor

Koski, Mikko

Keywords

topic modelling, social media, natural language processing, text analytics, trend identification, digital marketing

Other note

Citation