Missing Values Estimation: The Pyhäjärvi Case. In application to long-term time series prediction

School of Science | Master's thesis

Date

2013

Major/Subject

Information Technology (Informaatiotekniikka)

Mcode

T-115

Language

English

Pages

79

Abstract

Environmental modelling and prediction have been of human interest since ancient times. Despite all technological advances, contemporary agriculture and food production still depend largely on favourable ecological conditions. However, climate change and the consequences of human activity may degrade the biological systems we are accustomed to using and enjoying. One example is Lake Pyhäjärvi, a large lake in south-western Finland that plays an important role in local agriculture and the fishing industry. The lake suffers from eutrophication: the excessive growth of lake plants and the death of animals due to lack of oxygen. The cause is an excessive load of nutrients, especially phosphorus, entering the lake from nearby agricultural fields. With the support of local people and businesses, the Pyhäjärvi Institute was established to develop measures for preserving the lake's ecology. This thesis was written in collaboration with researchers from the Pyhäjärvi Institute and is devoted to modelling phosphorus concentration in the springs of Pyhäjärvi. Phosphorus modelling and prediction help to plan preservation measures and to better understand the ecology of the lake.

The thesis consists of two parts. The first part addresses the time series prediction problem. Phosphorus concentration is naturally modelled as a time series, but the problem is studied in general terms, so the results apply to time series from any domain. It is shown that the combination of the Optimally-Pruned Extreme Learning Machine (OP-ELM) and the DirRec prediction strategy outperforms the linear model widely used in practice. Ensemble methods can further improve accuracy, sometimes significantly.

The second part contains practical work with the Pyhäjärvi dataset. Time series prediction methods cannot be applied to it directly because the data contain many missing values, which must therefore be filled in first. Several methods for estimating the missing phosphorus values are studied: a regression approach, a missing-values approach, and their combination. The best model combinations and the best input variables are selected, and imputation is performed for three locations.
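The abstract names the DirRec strategy for long-term (multi-step) prediction without describing it. As commonly described, DirRec trains a separate model for each horizon step k, as the Direct strategy does, but model k also receives the k-1 values between the last observation and its target as extra inputs, as in the Recursive strategy; at forecast time those extra inputs are the earlier models' own predictions. The sketch below is illustrative only and rests on stated assumptions: it substitutes an ordinary least-squares linear model for the OP-ELM used in the thesis, uses a fixed lag window instead of input variable selection, and the function names (`dirrec_fit`, `dirrec_predict`, `fit_linear`, `predict_linear`) are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept.

    Assumption: a stand-in for OP-ELM, used only to keep the sketch short.
    """
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, x):
    """Apply a fitted linear model to one input vector."""
    return float(np.dot(coef[:-1], x) + coef[-1])


def dirrec_fit(series, n_lags, horizon):
    """Train one model per horizon step k = 1..horizon.

    Model k regresses y[t+k] on the window y[t-n_lags+1 .. t+k-1]:
    the n_lags base lags plus the k-1 values between t and t+k.
    During training, these extra inputs are observed values.
    """
    y = np.asarray(series, dtype=float)
    models = []
    for k in range(1, horizon + 1):
        width = n_lags + k - 1  # the input window grows by one per step
        X = [y[end - width:end] for end in range(width, len(y))]
        target = [y[end] for end in range(width, len(y))]
        models.append(fit_linear(np.array(X), np.array(target)))
    return models


def dirrec_predict(models, history, n_lags):
    """Forecast len(models) steps ahead from the observed history.

    At step k, the previous k-1 forecasts are appended to the last
    n_lags observations and the result is fed to model k.
    """
    window = list(np.asarray(history, dtype=float)[-n_lags:])
    forecasts = []
    for coef in models:
        yhat = predict_linear(coef, np.array(window))
        forecasts.append(yhat)
        window.append(yhat)  # recursive part: feed the prediction forward
    return forecasts


if __name__ == "__main__":
    # Toy usage on a noiseless sinusoid (not Pyhäjärvi data).
    s = np.sin(0.3 * np.arange(80))
    models = dirrec_fit(s[:70], n_lags=2, horizon=5)
    print(dirrec_predict(models, s[:70], n_lags=2))
```

The Direct strategy would drop the `window.append(yhat)` feedback and give every model the same inputs. The Recursive strategy would reuse a single one-step model at every step. DirRec sits between the two, which is why it needs one model per step whose input dimension grows with k.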

Supervisor

Simula, Olli

Thesis advisor

Lendasse, Amaury
Ventelä, Anne-Mari

Keywords

Pyhäjärvi, long-term, time series, prediction, missing values, imputation, regression, direct, DirRec, recursive
