Event Discovery from Social Media Feeds
School of Science |
Master's thesis
Author
Date
2021-12-13
Department
Major/Subject
Computer Science
Mcode
SCI3042
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
vii + 56
Series
Abstract
Unexpected events occur from time to time. Some are planned with a predefined time and location, e.g., a concert, whereas others are unplanned and happen suddenly, e.g., a strike. Such gatherings can place a heavy load on telecommunication networks, and the load might cause problems in network accessibility. To predict such events, tweets can be used as an information source. In this thesis, Twint was used to scrape tweets from Twitter. Different techniques were used to annotate the tweet data, and a technique called Named Entity Recognition (NER) was used to train a classification model with a limited amount of training data. Training a model from scratch can be both time- and resource-consuming, so transfer learning was used with the spaCy library. A number of experiments were performed to find a model that correctly classifies concert as the type of an event, based on the context of the tweet. A key observation from the results is that the model trained with manually annotated data performs better than the models trained with rule-based annotation or with a combination of rule-based and manual annotation. The analysis in this thesis shows that it is possible to use NER to extract events from social media feeds based on the context of the tweet. The thesis suggests that, with domain expertise, the approach used to annotate data and to fine-tune a Transformer-based model can be applied and extended to multiple NLP problems to create domain-specific applications.
Description
Supervisor
Laaksonen, Jorma
Thesis advisor
Sakko, Arto
Keywords
natural language processing, named entity recognition, event extraction, deep learning, transfer learning, tweets