The problem of time series analysis and incomplete data: Real-world applications

School of Science | Master's thesis

Date

2013

Major/Subject

Informaatiotekniikka (Information Technology)

Mcode

T-115

Language

en

Pages

53

Abstract

Almost any data collection contains outlying series and missing values. The risk of obtaining incomplete, hard-to-process data increases especially when the data set is large or collected manually. The presence of missing values should not be underestimated. Besides carrying important information, missing values are often correlated with other values. Furthermore, once the missing values have been predicted, the completed data can be analysed and used for forecasting. In any data analysis, it is essential to study the properties of the data carefully. Data analysis arises in every field, e.g. sociology, finance, environmental studies and science, wherever there are issues to be studied and explored. Social networks have always been a rich topic to explore. Being highly dynamic objects, they require deep and careful investigation. Moreover, because online data combines a small number of samples with a large number of variables, it calls for additional methods to highlight and uncover its interesting parts. The proposed methodology, based on a modified Forward-Backward algorithm, aims to analyse social networks represented as time series data sets. Pressing issues related to climate and the economy are under constant, in-depth study. Since these topics are of particular interest, the thesis performs imputation of missing values on real-world data sets from climatology and finance. These applications show the variety and importance of predicting missing values. A large number of methods exist for imputing missing values. A number of promising algorithms are investigated and compared across the different data sets: the EOF, the Ensemble of SOMs and the Mixture of Gaussians.

Supervisor

Simula, Olli

Thesis advisor

Lendasse, Amaury

Keywords

time series prediction, social networks, variable selection, forward-backward algorithm, missing values, water temperature data, imputations, ensemble of SOMs, EOF, mixture of Gaussians
