
Reducing Sparsity in Sentiment Analysis Data using Novel Dimensionality Reduction Approaches



dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.advisor Miche, Yoan
dc.contributor.author Sayfullina, Luiza
dc.date.accessioned 2014-11-11T12:03:56Z
dc.date.available 2014-11-11T12:03:56Z
dc.date.issued 2014-11-03
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/14446
dc.description.abstract No aspect of our mental life is more important to the quality and meaning of our existence than emotions and sentiments. Recently, researchers have introduced many Machine Learning approaches to analyse sentiment in public blogs, social networks, etc. Because textual datasets are sparse and high-dimensional, Feature Selection is needed before classifiers are applied. The scope of my thesis is Dimensionality Reduction techniques for Polarity Classification, that is, predicting one of two opposite sentiments. The greatest challenge for Text Classification problems in general is data sparsity. This is especially true for the Bag-of-words model, where a document is represented by the number of occurrences of each term in the vocabulary. Hence it can be hard for a classifier to learn the relationships between all the words in the initial vocabulary when the training set is not large enough. In this thesis I investigate possible steps to decrease the sparsity: setting the vocabulary, using sentiment dictionaries, and choosing the data representation and Dimensionality Reduction methods and their underlying strategies. I describe fast and intuitive unsupervised and supervised tf-idf scores for Feature Ranking. In addition, a Word Clustering algorithm for merging words with very close semantic meaning is introduced. By clustering semantically close words we reduce the feature space with minimal loss of information compared to Feature Selection, where features are simply omitted. The Polarity Classification problem is investigated on two datasets, SemEval 2013 Twitter Sentiment Analysis and KDD Project Excitement Prediction, using Extreme Learning Machine. The best performance on both datasets was achieved by using the proposed Word Clustering and the supervised tf-idf score with 20 times fewer features than the original vocabulary size. en
dc.format.extent 71
dc.language.iso en en
dc.title Reducing Sparsity in Sentiment Analysis Data using Novel Dimensionality Reduction Approaches en
dc.type G2 Pro gradu, diplomityö en
dc.contributor.school Perustieteiden korkeakoulu fi
dc.subject.keyword sentiment analysis en
dc.subject.keyword tf-idf en
dc.subject.keyword word clustering en
dc.subject.keyword sparsity en
dc.identifier.urn URN:NBN:fi:aalto-201411123023
dc.programme.major Machine Learning and Data Mining fi
dc.programme.mcode SCI3015 fi
dc.type.ontasot Master's thesis en
dc.type.ontasot Diplomityö fi
dc.contributor.supervisor Karhunen, Juha
dc.programme Master’s Programme in Machine Learning and Data Mining (Macadamia) fi
dc.ethesisid Aalto 2015
dc.location P1
local.aalto.openaccess no
local.aalto.digifolder Aalto_06809
dc.rights.accesslevel closedAccess
local.aalto.idinssi 50049
dc.type.publication masterThesis
dc.type.okm G2 Pro gradu, diplomityö
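
The Bag-of-words representation and tf-idf Feature Ranking named in the abstract can be illustrated with a minimal sketch. This is not the thesis's method: the record does not give the author's unsupervised or supervised tf-idf scores, so the code below assumes the textbook form tf * log(N / df) summed over documents, whitespace tokenisation, and hypothetical function names (`bag_of_words`, `sparsity`, `tfidf_ranking`) chosen only for illustration.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) for whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in docs]
    for i, toks in enumerate(tokenised):
        for term, count in Counter(toks).items():
            matrix[i][index[term]] = count
    return vocab, matrix


def sparsity(matrix):
    """Fraction of zero entries in the count matrix."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total


def tfidf_ranking(vocab, matrix):
    """Rank terms by summed textbook tf-idf (assumed form), highest first."""
    n_docs = len(matrix)
    scores = {}
    for j, term in enumerate(vocab):
        column = [row[j] for row in matrix]
        df = sum(1 for value in column if value > 0)  # document frequency
        idf = math.log(n_docs / df)
        scores[term] = sum(column) * idf
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))


if __name__ == "__main__":
    docs = ["good movie", "bad movie", "good good plot"]
    vocab, matrix = bag_of_words(docs)
    print(vocab)
    print(matrix)
    print(sparsity(matrix))
    print(tfidf_ranking(vocab, matrix))
```

Keeping only the top-ranked terms from such a ranking is one simple form of Feature Selection; the Word Clustering approach described in the abstract instead merges semantically close words rather than discarding them.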

Files in this item


There are no open access files associated with this item.
