A business news event detection algorithm with an application to the forest industry

School of Business | Master's thesis

Date

2021

Degree programme

Information and Service Management (ISM)

Language

en

Pages

83+2

Abstract

The forest industry generates billions of euros and employs millions of workers. However, it lacks a type of business intelligence that other industries already use, namely the extraction of knowledge from online articles. Despite many studies on this subject, none addresses the forest industry, because no usable dataset has been available. This thesis proposes an event detection algorithm for online articles that can be applied to both general business news and forest industry news. To that end, three research questions are examined: first, how to create a robust dataset that includes forest industry news; second, whether it is feasible to build an event detection algorithm that recognizes and classifies both general business and forest industry news; and third, which model performs best for that algorithm. The algorithm is built with machine learning methods, in particular natural language processing. The proposed solution comprises contextualized word embeddings and a classification model. The word embeddings are created with BERT, a state-of-the-art language model from Google. To tune model performance, one approach is implemented to address the class imbalance problem. The evaluation shows that the proposed solution delivers strong results, which indicates promising practical applications in the forest industry: companies in the industry could gain an aspect of business intelligence already employed in other industries. This thesis is the first to empirically examine the links between online news articles, event detection, and the forest industry. Its contributions are twofold. First, it provides an annotated dataset for use with different machine learning methods. Second, it complements the literature on the feasibility of an event detection algorithm applicable to both business and forest industry news.
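
The abstract states that one approach is used to address class imbalance but does not name it. Purely as an illustration, and not as the thesis's method, the sketch below shows one common option: weighting each event class inversely to its frequency, using the same heuristic as scikit-learn's "balanced" class weights. The function name `balanced_class_weights` and the event labels are hypothetical and are not taken from the thesis dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes receive larger weights, so a classifier trained with
    these weights is penalized more for misclassifying them.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, imbalanced event labels (assumed for illustration only).
labels = ["merger"] * 2 + ["investment"] * 8
weights = balanced_class_weights(labels)
print(weights)
```

Weights of this kind could then be passed to a classifier's loss function, so that errors on rare event types count for more than errors on frequent ones.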

Thesis advisor

Malo, Pekka

Keywords

forest industry news, business news, natural language processing, machine learning, classification, event detection
