A text-based approach to industry classification
School of Business
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Authors
Date
2018
Department
Major/Subject
Mcode
Degree programme
Information and Service Management (ISM)
Language
en
Pages
52 + 2
Series
Abstract
Industry classification schemes are an important topic in academic research because they are used to group companies into smaller sets that share similar characteristics. Although many studies in economics, accounting and finance depend heavily on these schemes, existing ones have significant limitations, mainly because their static nature leaves them unable to adapt to constant innovation and technological development. The objective of this thesis is to propose an automated, text-based industry classification scheme that can reflect constant changes in industry scope.
This thesis approaches the research problem by answering two research questions. First, it studies whether an industry classification scheme can be built from word-embedding vectors extracted from news articles. Second, it identifies the benefits of a text-based industry classification scheme in comparison with existing classification schemes. To identify these benefits, both qualitative and quantitative assessments of performance are conducted. The scheme is constructed from word-embedding vectors generated from news articles using the Word2Vec algorithm. Word2Vec is a recently developed text-mining tool that captures the relationships between words and expresses them in a quantifiable format.
The key findings of this thesis are twofold. First, it is technically possible to build an automated, text-based industry classification scheme by using word-embedding vectors. Two methods of building the scheme are proposed. Second, the proposed text-based scheme performs well in classifying companies into relevant business categories. In addition, the cluster-based scheme exhibits better performance in grouping companies into financially homogeneous groups when parameters are optimized.
The results suggest that a text-based industry classification scheme can serve as an alternative to existing industry classification schemes if its parameters are optimized for the intended use. The usefulness of the scheme is expected to increase as innovation and technological development accelerate.
Description
Thesis advisor
Malo, Pekka
Vilkkumaa, Eeva
Keywords
industry classification, cluster analysis, text mining, Word2Vec