A text-based approach to industry classification

dc.contributor Aalto University en
dc.contributor Aalto-yliopisto fi
dc.contributor.advisor Malo, Pekka
dc.contributor.advisor Vilkkumaa, Eeva
dc.contributor.author Kee, Taeyoung
dc.date.accessioned 2019-01-18T17:00:14Z
dc.date.available 2019-01-18T17:00:14Z
dc.date.issued 2018
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/36125
dc.description.abstract Industry classification schemes are a critical topic in academic research because they are used to group companies that share similar characteristics. Although many studies in economics, accounting and finance depend heavily on these schemes, existing schemes have significant limitations, mainly because they are static and therefore cannot adapt to constant innovation and technological development. The objective of this thesis is to propose an automated, text-based industry classification scheme that can reflect ongoing changes in industry scope. The thesis addresses this problem through two research questions. First, it studies whether an industry classification scheme can be built from word-embedding vectors extracted from news articles. Second, it identifies the benefits of a text-based industry classification scheme compared with existing schemes; to identify these benefits, the scheme's performance is assessed both qualitatively and quantitatively. The scheme is constructed from word-embedding vectors generated from news articles with the Word2Vec algorithm, a relatively recent text-mining technique that captures relationships between words and expresses them in a quantifiable form. The key findings of this thesis are twofold. First, it is technically possible to build an automated, text-based industry classification scheme from word-embedding vectors; two methods of building the scheme are proposed. Second, the proposed text-based scheme performs well in classifying companies into relevant business categories. In addition, the cluster-based scheme performs better at grouping companies into financially homogeneous groups when its parameters are optimized.
The results suggest that a text-based industry classification scheme can serve as an alternative to existing schemes if its parameters are optimized for its intended use. The scheme's usefulness is expected to grow as innovation and technological development accelerate. en
dc.format.extent 52 + 2
dc.format.mimetype application/pdf en
dc.language.iso en en
dc.title A text-based approach to industry classification en
dc.type G2 Master's thesis (pro gradu, diplomityö) en
dc.contributor.school Kauppakorkeakoulu fi
dc.contributor.school School of Business en
dc.contributor.department Department of Information and Service Management en
dc.subject.keyword industry classification en
dc.subject.keyword cluster analysis en
dc.subject.keyword text mining en
dc.subject.keyword Word2Vec en
dc.identifier.urn URN:NBN:fi:aalto-201901181303
dc.type.ontasot Master's thesis en
dc.programme Information and Service Management (ISM) en
dc.subject.helecon information economy en
dc.subject.helecon companies en
dc.subject.helecon industries en
dc.subject.helecon classification en
dc.subject.helecon automation en
dc.ethesisid 17309
dc.location P1 I fi
local.aalto.electroniconly yes
local.aalto.openaccess yes
