Missing Values Estimation: The Pyhäjärvi Case. In application to long-term time series prediction
School of Science | Master's thesis
Date: 2013
Major/Subject: Informaatiotekniikka (Information Technology)
Mcode: T-115
Language: English
Pages: 79
Abstract
Environmental modelling and prediction have been of human interest since ancient times. Despite all technological advances, contemporary agriculture and food production still depend largely on favourable ecological conditions. However, climate change and the consequences of human activity may degrade the biological systems we have long used and enjoyed. One example is Lake Pyhäjärvi, a large lake in south-west Finland that plays an important role in local agriculture and the fishing industry. The lake suffers from eutrophication: abundant growth of aquatic plants and the death of animals from lack of oxygen. The cause is an excessive load of nutrients, especially phosphorus, entering the lake from nearby agricultural fields. With the support of local people and businesses, the Pyhäjärvi Institute was established to develop measures that preserve the lake's ecology.
This thesis was written in collaboration with researchers from the Pyhäjärvi Institute and is devoted to modelling phosphorus concentration in the springs of Pyhäjärvi. Phosphorus modelling and prediction help to plan preservation measures and to better understand the ecology of the lake. The thesis consists of two parts. The first part addresses the time series prediction problem. Phosphorus concentration is naturally modelled as a time series, but the problem is studied in general terms, so the results can be applied to time series from any domain. It is shown that combining the Optimally-Pruned Extreme Learning Machine (OP-ELM) with the DirRec prediction strategy outperforms the linear model widely used in practice. Ensemble methods can improve accuracy further, sometimes significantly. The second part is practical work with the Pyhäjärvi dataset. Time series prediction methods cannot be applied to it directly, because the data contain many missing values; these must first be filled in.
Several methods for estimating missing phosphorus values are studied in this part: a regression approach, a missing values approach and their combination are evaluated. The best model combinations and the best variables are selected, and imputation is performed for three locations.
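The abstract does not spell out the regression approach to imputation. One common form fits a model on time steps where phosphorus and the chosen explanatory variables are all observed, then predicts phosphorus at steps where it is missing. The sketch below is hypothetical. `regression_impute` is an illustrative name, and plain linear least squares is an assumed model choice that is not necessarily the one used in the thesis. The missing values approach and the combined method are not reproduced here.

```python
import numpy as np


def regression_impute(target, predictors):
    """Fill NaNs in `target` using a linear least-squares fit on `predictors`.

    Hypothetical sketch: only rows where the target and every predictor are
    observed are used for fitting; rows with a missing target but complete
    predictors are filled; rows with a missing predictor stay NaN.
    """
    target = np.asarray(target, dtype=float)
    X = np.asarray(predictors, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single predictor as one column
    complete_x = ~np.isnan(X).any(axis=1)
    train = complete_x & ~np.isnan(target)
    fill = complete_x & np.isnan(target)

    # Linear model with an intercept column, fitted on fully observed rows.
    A = np.column_stack([X[train], np.ones(train.sum())])
    coef, *_ = np.linalg.lstsq(A, target[train], rcond=None)

    out = target.copy()  # leave the caller's array untouched
    out[fill] = X[fill] @ coef[:-1] + coef[-1]
    return out


# Example with made-up numbers (not Pyhäjärvi data).
print(regression_impute([2.0, 5.0, np.nan, 11.0], [0.0, 1.0, 2.0, 3.0]))
```

In practice the predictors would be other variables measured at the same location and time. Their own gaps limit which missing phosphorus values this step can fill.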
Supervisor: Simula, Olli
Thesis advisors: Lendasse, Amaury; Ventelä, Anne-Mari
Keywords: Pyhäjärvi, long-term, time series, prediction, missing values, imputation, regression, direct, DirRec, recursive
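The abstract names the DirRec strategy without detailing it. As it is commonly described, DirRec mixes the direct strategy (a separate model for each horizon h) with the recursive one (predictions fed back as inputs). The model for horizon h takes the last `lags` observations plus the values at horizons 1, ..., h-1, and at forecast time those extra inputs are the earlier predictions. The sketch below is hypothetical. `dirrec_forecast`, `fit_linear` and `predict_linear` are illustrative names, and an ordinary least-squares linear model stands in for OP-ELM, which it does not reproduce.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept (assumed stand-in for OP-ELM)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, x):
    """Evaluate the fitted linear model on one input vector x."""
    return float(np.dot(x, coef[:-1]) + coef[-1])


def dirrec_forecast(series, lags, horizon):
    """Forecast `horizon` steps ahead with a DirRec-style strategy (sketch).

    The model for step h uses the `lags` most recent values plus the h - 1
    values that follow them, so its input grows by one at every step.
    """
    series = np.asarray(series, dtype=float)
    n = len(series)
    models = []
    for h in range(1, horizon + 1):
        # Training pairs from observed data: inputs y[t-lags+1 .. t+h-1], target y[t+h].
        rows = [series[t - lags + 1 : t + h] for t in range(lags - 1, n - h)]
        targets = [series[t + h] for t in range(lags - 1, n - h)]
        models.append(fit_linear(np.array(rows), np.array(targets)))

    # Forecast: start from the last `lags` observations and append each
    # prediction so it becomes an extra input for the next horizon's model.
    inputs = list(series[-lags:])
    forecasts = []
    for coef in models:
        y_hat = predict_linear(coef, np.array(inputs))
        forecasts.append(y_hat)
        inputs.append(y_hat)
    return forecasts


# Example on a synthetic random walk (illustrative data only).
rng = np.random.default_rng(0)
demo = np.cumsum(rng.normal(size=60))
print(dirrec_forecast(demo, lags=4, horizon=5))
```

Replacing `fit_linear`/`predict_linear` with a nonlinear regressor such as an ELM leaves the strategy itself unchanged. Only the per-horizon model differs.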