A business news event detection algorithm with an application to the forest industry
School of Business | Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Information and Service Management (ISM)
Abstract: The forest industry is an important industry that generates billions of euros and employs millions of workers. However, it lacks a particular type of business intelligence enjoyed by other industries, namely the extraction of knowledge from online articles. Despite many studies on this subject, no relevant study exists for the forest industry due to the lack of a usable dataset. This thesis proposes an event detection algorithm for online articles that can be applied to both general business news and forest industry news. To that end, three research questions are examined. The first concerns the creation of a robust dataset that includes forest industry news. The second concerns the feasibility of building an event detection algorithm that recognizes and classifies both general business news and forest industry news. The third concerns the proposal of an optimally performing model for that algorithm. The algorithm is built with machine learning methods, particularly natural language processing. The proposed solution comprises contextualized word embeddings and a classification model. The word embeddings are created with BERT, a state-of-the-art language model from Google. For model performance tuning, one approach is implemented to address the class imbalance problem. The evaluation shows that the proposed solution delivers a strong result, which indicates promising practical applications in the forest industry. Companies in the industry could potentially enjoy an aspect of business intelligence that is already employed in other industries. This thesis is the first to empirically examine the links between online news articles, event detection, and the forest industry. The thesis's contributions are twofold. First, it provides an annotated dataset for use with different machine learning methods. Second, it complements the literature on the feasibility of an event detection algorithm applicable to both general business news and forest industry news.
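The abstract does not say which approach is used to address class imbalance. As an illustration only, the sketch below shows one common option, inverse-frequency class weighting, in which rarer event classes receive larger weights during classifier training. The function name `balanced_class_weights` and the example labels are hypothetical and are not taken from the thesis or its dataset.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return the weight n_samples / (n_classes * class_count) for each class.

    Assumption: inverse-frequency weighting is used here only as an example
    of handling class imbalance; the thesis may use a different approach.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical event labels for ten news articles (not the thesis dataset).
labels = ["no_event"] * 8 + ["merger"] + ["investment"]
weights = balanced_class_weights(labels)

# Rare classes end up with larger weights than the dominant class.
for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

Such weights would typically be passed to the loss function of the classification model so that errors on rare event types count more than errors on the majority class.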
Thesis advisor: Malo, Pekka
Keywords: forest industry news, business news, natural language processing, machine learning, classification, event detection