Title: | Machine Learning Methods for Classification of Unstructured Data |
Author(s): | Sayfullina, Luiza |
Date: | 2019 |
Language: | en |
Pages: | 80 + app. 85 |
Department: | Department of Computer Science (Tietotekniikan laitos) |
ISBN: | 978-952-60-8675-0 (electronic) 978-952-60-8674-3 (printed) |
Series: | Aalto University publication series DOCTORAL DISSERTATIONS, 146/2019 |
ISSN: | 1799-4942 (electronic) 1799-4934 (printed) 1799-4934 (ISSN-L) |
Supervising professor(s): | Kannala, Juho, Prof., Aalto University, Department of Computer Science, Finland; Karhunen, Juha, Prof., Aalto University, Department of Computer Science, Finland |
Thesis advisor(s): | Eirola, Emil, Dr., SILO AI, Finland |
Subject: | Computer science |
Keywords: | machine learning, natural language processing, neural networks, android malware, soft skills, job recommender systems, text classification, occupational segregation |
Archive | yes |
Abstract: Natural language processing is a field that studies the automatic computational processing of human languages. Although natural language is symbolic and full of rules and ontologies, state-of-the-art approaches are typically based on statistical machine learning. With the invention of word embeddings, researchers have managed to circumvent the problem of a sparse feature space and to take into account word semantics learned from large corpora. For artificial strings, e.g. in source code, the use of embeddings is limited by the extremely large vocabulary. This dissertation covers two applications that use both embedding-based and bag-of-words approaches: one concerns industrial-scale Android malware classification, the other the extraction of soft skills and their impact on occupational gender segregation. The data in both applications are unstructured: Android applications consist of sets of files that are mostly unstructured or semi-structured, while the job postings used for soft skill analysis are free text with no clearly defined structure.
Parts:
[Publication 1]: Luiza Sayfullina, Emil Eirola, Dmitry Komashinsky, Paolo Palumbo, Yoan Miche, Amaury Lendasse, Juha Karhunen. Efficient Detection of Zero-day Android Malware Using Normalized Bernoulli Naive Bayes. In International Conference on Trust, Security and Privacy in Computing and Communications, 198–205, August 2015. DOI: 10.1109/Trustcom.2015.375
[Publication 2]: Luiza Sayfullina, Emil Eirola, Dmitry Komashinsky, Paolo Palumbo. Android malware detection: Building Useful Representations. In IEEE 15th International Conference on Machine Learning and Applications, 201–206, December 2016. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201612165986. DOI: 10.1109/ICMLA.2016.0041
[Publication 3]: Paolo Palumbo, Luiza Sayfullina, Dmitry Komashinsky, Emil Eirola, Juha Karhunen. Pragmatic Android Malware Detection. Computers and Security, 689–701, July 2017. DOI: 10.1016/j.cose.2017.07.013
[Publication 4]: Luiza Sayfullina, Eric Malmi, YiPing Liao, Alex Jung. Domain Adaptation for Resume Classification Using Convolutional Neural Networks. In The 6th International Conference on Analysis of Images, Social Networks, and Texts, 82–93, December 2017. DOI: 10.1007/978-3-319-73013-4_8
[Publication 5]: Federica Calanca, Luiza Sayfullina, Lara Minkus, Claudia Wagner, Eric Malmi. Responsible team players wanted: An analysis of soft skill requirements in job advertisements. EPJ Data Science, p. 13, April 2019. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201906033403. DOI: 10.1140/epjds/s13688-019-0190-z
[Publication 6]: Luiza Sayfullina, Eric Malmi, Juho Kannala. Learning Representations for Soft Skills Matching. In The 7th International Conference on Analysis of Images, Social Networks, and Texts, p. 12, December 2018.
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.