Comparing Talks, Realities and Concerns on Climate Change: An analysis of Textual, Numerical and Categorical Data

Master's thesis
Several efforts are being made by the international community to mitigate the effects of climate change, with the UNFCCC being one of the largest international bodies addressing this issue. The 2010-2011 conference of the UNFCCC took place in Cancun, Mexico, where the representatives of different countries expressed their concerns in their talks. In this work, we conducted a text mining analysis of the talks using the Self-Organizing Map (SOM) algorithm, with some statistical techniques for data preparation. We also present a SOM-based analysis of the textual data combined with certain statistical information regarding the 143 countries and international organizations that participated in the conference. The data preparation process included the use of OCR, machine translation and approximate string matching. In the analysis, we assumed that the collection of terms serves as a relevant set of features that reflect the content of the talks. The aims of the analysis were two-fold. First, we investigated whether it is possible to find similar content patterns in the talks based on the text itself. Second, we associated contextual information with the text-based information in order to see what the relationship is between the talks and the realities in each of the countries. The contextual information comprises various numerical and categorical variables that reflect concrete, real situations in the countries. The basic hypothesis of our work is that there is a detectable but complex relationship between the content of the talks and known facts such as GDP, literacy rate, death rate, and commitment to international treaties such as the Kyoto Protocol.
Honkela, Timo
text mining, data mining, SOM, UNFCCC, climate change