Sentiment Analysis of Twitter Data for a Tourism Recommender System in Bangladesh

School of Science | Master's thesis
Cloud Computing and Services
Degree programme
Master's Programme in ICT Innovation
The exponentially expanding Digital Universe is generating huge amounts of data containing valuable information. The tourism industry, one of the fastest-growing economic sectors, can benefit from the myriad of digital data that travelers generate in every phase of their travel: planning, booking, traveling, feedback, etc. One application of tourism-related data is to provide personalized destination recommendations. The primary objective of this research is to facilitate the business development of a tourism recommendation system for Bangladesh called “JatraLog”. Sentiment-based recommendation is one of the features that will be employed in the recommendation system. This thesis addresses two research goals: first, to study Sentiment Analysis as a tourism recommendation tool, and second, to investigate Twitter as a potential source of valuable tourism-related data for providing recommendations for different countries, specifically Bangladesh. Sentiment Analysis can be defined as a Text Classification problem in which a document or text is classified as positive or negative, and in some cases into a third group, neutral. For this thesis, two sets of tourism-related English-language tweets were collected from Twitter using keywords. The first set contains only the tweet text; the second set contains geo-location and timestamp along with the tweets. The collected tweets were then automatically labeled as positive or negative depending on whether they contained positive or negative emoticons, respectively. After labeling, 90% of the tweets from the first set were used to train a Naive Bayes Sentiment Classifier, and the remaining 10% were used to test its accuracy, which was found to be approximately 86.5%. The second set was used to retrieve the statistical information required to address the second research goal, i.e. investigating Twitter as a potential source of sentiment data for a destination recommendation system.
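The pipeline described in the abstract (emoticon-based labeling, a 90/10 split, and a Naive Bayes classifier) can be illustrated with a minimal sketch. This is not the thesis implementation, which the keywords suggest was built with Scala and Spark; it is a dependency-free Python illustration. The emoticon lists, the whitespace tokenizer, the Laplace smoothing, and all function and class names are assumptions made for this example, and the toy tweets are invented.

```python
import math
import random
from collections import Counter

# Assumed emoticon lists; the abstract does not enumerate the ones used.
POSITIVE = (":)", ":-)", ":D", "=)")
NEGATIVE = (":(", ":-(", ":'(")


def label_by_emoticon(tweet):
    """Return 'pos' or 'neg' based on emoticons, or None if absent or mixed."""
    has_pos = any(e in tweet for e in POSITIVE)
    has_neg = any(e in tweet for e in NEGATIVE)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def tokenize(tweet):
    """Remove the labeling emoticons, lowercase, and split on whitespace."""
    for e in POSITIVE + NEGATIVE:
        tweet = tweet.replace(e, " ")
    return tweet.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.word_counts[c].update(tokens)
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict(self, tokens):
        def log_score(c):
            # log prior + sum of smoothed log likelihoods of known words
            score = math.log(self.doc_counts[c] / self.n_docs)
            denom = self.totals[c] + len(self.vocab)
            for w in tokens:
                if w in self.vocab:
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            return score

        return max(self.classes, key=log_score)


def train_test_split(items, train_fraction=0.9, seed=0):
    """Shuffle deterministically and split into train and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]


def accuracy(model, test):
    """Fraction of (tokens, label) pairs that the model predicts correctly."""
    correct = sum(model.predict(tokens) == label for tokens, label in test)
    return correct / len(test)


if __name__ == "__main__":
    # Invented toy tweets, for illustration only.
    tweets = [
        "Loved the beach today :)", "Amazing tea gardens :D",
        "Great boat trip :-)", "Lovely sunset at the coast =)",
        "Friendly guides and great food :)", "Flight delayed again :(",
        "Terrible hotel room :-(", "Lost my luggage :'(",
        "Awful traffic all day :(", "Dirty room and rude staff :(",
    ]
    labeled = [(tokenize(t), label_by_emoticon(t)) for t in tweets]
    labeled = [(toks, lab) for toks, lab in labeled if lab is not None]
    train, test = train_test_split(labeled, train_fraction=0.9)
    model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
    print("held-out accuracy on toy data:", accuracy(model, test))
```

The emoticons are stripped before training so that the classifier learns from the words rather than from the very signal used to generate the labels. The accuracy printed for this toy data says nothing about the 86.5% figure reported in the thesis.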
Heljanko, Keijo
Thesis advisor
Heljanko, Keijo
big data, sentiment analysis, Twitter, tourism, Scala, Spark