Knowledge Management: Document Similarity Based Recommendation

School of Science | Master's thesis
Date
2021-01-25
Major/Subject
Data Science
Mcode
SCI3095
Degree programme
Master's Programme in ICT Innovation
Language
en
Pages
104 + 7
Abstract
Futurice designs and develops digital services and products. It is an innovative organization that invests its expertise and experience in creating and promoting a knowledge management system powered by data and artificial intelligence, with the aim of making its internal processes robust and dynamic. Within this vision, a project called Exponential AI focuses on connecting internal knowledge and developing a platform for querying related and similar materials. This thesis is part of that project. The study was based on recommendation systems, which can offer a new exploratory viewpoint for investigating related, unexplored, and untapped materials (documents). Interviews were conducted with sales representatives about the challenges and barriers they face in the current process. The most critical need was found to be a search mechanism for proposal creation, since similar material was hard to access and recycle within the organization. A literature study was therefore conducted on approaches that have been used to structure knowledge management and to apply document similarity methods to textual data. Several methods were tested for measuring text similarity, including K-nearest neighbors, association rule mining, clustering, and word embeddings (Word2Vec, Doc2Vec). In this thesis, a recommendation system was developed based on document similarity between proposals. The features for the model were extracted from the documents, and they support matching these documents on the basis of content, context, and industry domain. The comparative analysis showed that word embedding models performed better in general, and that the Word2Vec and Doc2Vec models in particular outperformed the other techniques. Finally, the results from both methods were combined and ranked according to the date and similarity score of each document to provide qualitative recommendations.
A group of sales representatives was asked to test the recommendations and give feedback on their usefulness and results. According to the responses, the results are promising, and the recommendation system has the potential to resolve challenges such as material recycling, exploration, and time management in the current sales process.
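The final ranking step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract does not say how the Word2Vec and Doc2Vec results were combined, so the sketch assumes a simple mean of the two cosine similarities, with the document date used only to break ties. The `Proposal` class, the `recommend` function, and the two-dimensional vectors are hypothetical stand-ins for embeddings that a trained model would produce.

```python
from dataclasses import dataclass
from datetime import date
from math import sqrt


@dataclass
class Proposal:
    """Hypothetical record: one proposal with precomputed embeddings."""
    name: str
    created: date
    w2v: list  # assumed: averaged Word2Vec word vectors for the document
    d2v: list  # assumed: Doc2Vec vector for the document


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(query, candidates, top_n=3):
    """Rank candidates by mean similarity over both models, newest first on ties.

    Averaging the two scores is an assumption made for this sketch only.
    """
    scored = []
    for doc in candidates:
        score = (cosine(query.w2v, doc.w2v) + cosine(query.d2v, doc.d2v)) / 2
        scored.append((doc, score))
    # Sort by score, then by creation date, both descending.
    scored.sort(key=lambda pair: (pair[1], pair[0].created), reverse=True)
    return [(doc.name, round(score, 3)) for doc, score in scored[:top_n]]


# Toy data: tiny made-up vectors, not real embeddings.
query = Proposal("new-proposal", date(2021, 1, 1), [1.0, 0.0], [0.0, 1.0])
candidates = [
    Proposal("retail-2019", date(2019, 5, 1), [1.0, 0.0], [0.0, 1.0]),
    Proposal("retail-2020", date(2020, 5, 1), [1.0, 0.0], [0.0, 1.0]),
    Proposal("energy-2020", date(2020, 3, 1), [0.0, 1.0], [1.0, 0.0]),
]
ranking = recommend(query, candidates)
```

With this toy data the two retail proposals tie on similarity, so the more recent one is listed first. A weighted sum or a recency decay would be equally plausible ways to combine score and date.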
Supervisor
Laaksonen, Jorma
Thesis advisor
Ajanki, Antti
Väänänen, Riikka
Keywords
document similarity, data mining, natural language processing, text analysis, document clustering, user research