Semantic Content Filtering and Sentiment Analysis for Financial News

School of Business | Doctoral thesis (article-based)
Aalto University publication series DOCTORAL DISSERTATIONS, 221/2016
Today we seldom suffer from a lack of information; on the contrary, we often suffer from too much of it. As a consequence, important information might go unnoticed, which is harmful for individuals, companies, and the economy as a whole. To alleviate this situation, this dissertation develops tools for analyzing financial news. The thesis consists of an introductory part and six research essays, which cover three different aspects of the problem. The first two essays cover the data mining and document filtering aspects. In Essay 1, the Wiki-SR method is presented. This approach uses Wikipedia to calculate the relatedness between two concepts, which enhances search queries by implicitly expanding them. The essay also introduces a framework that allows for multiple models in order to improve document modeling. Essay 2 presents a modified Wilks' lambda technique for finding the concepts that best describe a specific document. Although the proposed approach is lightweight, it is still very efficient. The second group of essays focuses on sentiment analysis. Essay 3 presents an approach that parses sentences and detects any words that might change the polarity of a sentiment-bearing word. This approach shows a significant improvement in the accuracy of the analysis. The result was verified with our manually annotated sentiment corpus. A more advanced sentiment corpus was published in Essay 4. This new dual-layer corpus is annotated on both the document and sentence level. As it also allows multiple sentiment-bearing entities in the same sentence, more advanced techniques can be developed. Both corpora are publicly available, and they alleviate the current lack of method evaluation sets in the financial domain. The last two essays put this research in context. Essay 5 studies the research done in the field of sentiment analysis over the last decade. When the keywords given by authors and publishers are compared and the wording of titles and abstracts is analyzed, four distinct areas of interest emerge. Two of them are related to techniques used for sentiment analysis (sentiment classification and sentiment lexicon), and two are common domains of the analysis (reviews and social media). Essay 6 describes the steps needed for a computational approach to financial news analysis as well as commonly used tools and resources.
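As a rough illustration of the idea summarized for Essay 3 (a nearby word can change the polarity of a sentiment-bearing word), the following Python sketch flips the sign of a lexicon word when a "shifter" word occurs within a small window around it. The word lists, the window size, and the function name are assumptions made purely for illustration; the thesis describes a sentence-parsing approach, which this toy window heuristic does not reproduce.

```python
# Hypothetical toy lexicons for illustration only; not taken from the thesis.
POLARITY = {"profit": 1, "growth": 1, "loss": -1, "debt": -1}
SHIFTERS = {"decreased", "fell", "declined", "not", "no"}


def sentence_polarity(sentence, window=2):
    """Sum the polarities of lexicon words in a sentence.

    A word's sign is flipped when any shifter word appears within
    `window` tokens before or after it (a crude stand-in for parsing).
    """
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    score = 0
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        value = POLARITY[tok]
        neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if any(n in SHIFTERS for n in neighbours):
            value = -value  # the shifter reverses the word's polarity
        score += value
    return score
```

Under these assumptions, "The loss decreased sharply." scores positive even though "loss" alone is negative, which is the kind of case a plain word-count lexicon would get wrong.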
Supervising professor
Malo, Pekka, Assoc. Prof., Aalto University, Department of Information and Service Economy, Finland
Thesis advisor
Wallenius, Jyrki, Prof., Aalto University, Finland
Korhonen, Pekka, Prof., Aalto University, Finland
Keywords
data mining, document filtering, text analysis, sentiment detection, sentiment corpora
Other note
  • [Publication 1]: Pekka Malo, Pyry Siitari, Oskar Ahlgren, Jyrki Wallenius, and Pekka Korhonen. Semantic Content Filtering with Wikipedia and Ontologies. Proceedings of the IEEE International Conference on Data Mining Workshops (SADM 2010), December 2010, Sydney, Australia.
  • [Publication 2]: Oskar Ahlgren, Pekka Malo, Ankur Sinha, Jyrki Wallenius, and Pekka Korhonen. A Dimensionality Reduction Approach to Semantic Document Classification. Proceedings of the 2nd Workshop on Semantic Personalized Information Management: Retrieval and Recommendation (SPIM 2011) in conjunction with the 10th International Semantic Web Conference (ISWC 2011), October 2011, Bonn, Germany.
  • [Publication 3]: Pekka Malo, Ankur Sinha, Pyry Siitari, Oskar Ahlgren, and Iivari Lappalainen. Learning the Roles of Directional Expressions and Domain Concepts in Financial News Analysis. Proceedings of the IEEE International Conference on Data Mining Workshops (SENTIRE 2013), December 2013, Dallas, U.S.A.
  • [Publication 4]: Pyry Takala, Pekka Malo, Ankur Sinha, and Oskar Ahlgren. Gold-standard for Topic-Specific Sentiments in Economic Texts. Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), May 2014, Reykjavik, Iceland.
  • [Publication 5]: Oskar Ahlgren. Research On Sentiment Analysis: The First Decade. Forthcoming.
    DOI: 10.1109/ICDMW.2016.0131
  • [Publication 6]: Oskar Ahlgren, Bikesh Upreti, Pekka Malo, and Ankur Sinha. Knowledge-driven Approaches for Financial News Analysis. Unpublished.