Learning Methods for Variable Selection and Time Series Prediction

School of Science | Doctoral thesis (article-based) | Defence date: 2014-10-31
114 + app. 108
Aalto University publication series DOCTORAL DISSERTATIONS, 138/2014
In recent years, machine learning methods have become increasingly popular for modelling many different phenomena: financial markets, spatio-temporal data sets, pattern recognition, speech and image processing, recommender systems and many others. This strong interest stems from the practical success of these methods and from the increasingly easy acquisition and storage of, and access to, data. In this thesis, two general problems in machine learning are discussed and several solutions are offered.

The first problem is variable selection, an approach for automatically selecting the most relevant features in the data. Two key components of variable selection are the search criterion and the search algorithm. The thesis focuses on the Delta test as the search criterion, while several solutions are offered for the search algorithm, such as the Genetic Algorithm and Tabu Search. Furthermore, the selection procedure is extended to the more general cases of variable scaling and projection, as well as their combination. Finally, some of the proposed solutions have been developed for parallel architectures, which makes the whole variable selection procedure usable on data sets with a high number of features.

The second problem tackled in the thesis is time series prediction, which arises in many fields of science and industry. In simple terms, time series prediction involves estimating future values of a series of measurements of the phenomenon of interest. The number of values to be estimated can be small, which is short-term prediction, or run into several hundreds, which is long-term prediction. Two models have been developed for this task. One is based on a recently popular type of neural network called the Extreme Learning Machine, while the other combines Generative Topographic Mapping and Relevance Learning, modified for regression tasks. Finally, both problems are tackled together for real-world time series from a biological domain.
Inference in biological time series is difficult because of the very small number of available samples, the irregular sampling frequency and the irregular spatial coverage of the areas of interest. Nevertheless, more stable estimation of model parameters is possible by combining global climate indicators and regional measurements in a multifactor approach.
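The Delta test, used in the thesis as the variable selection criterion, estimates the noise variance that remains when the output is predicted from a candidate subset of inputs; a lower value suggests a more relevant subset. The sketch below is a minimal NumPy illustration, assuming the common first-nearest-neighbour form delta = 1/(2M) * sum_i (y_NN(i) - y_i)^2 with Euclidean distance. The greedy `forward_selection` helper is a hypothetical, simpler stand-in for the Genetic Algorithm and Tabu Search studied in the thesis, and the synthetic data are invented for illustration only.

```python
import numpy as np


def delta_test(X, y):
    """Delta test: delta = 1/(2M) * sum_i (y[NN(i)] - y[i])**2, where NN(i) is
    the nearest neighbour of X[i] in Euclidean distance, excluding X[i] itself."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sq = np.sum(X ** 2, axis=1)
    # Full M x M matrix of squared distances: O(M^2) memory, fine for small M.
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(d2, np.inf)  # a point may not be its own neighbour
    nn = np.argmin(d2, axis=1)
    return float(np.mean((y[nn] - y) ** 2) / 2.0)


def forward_selection(X, y):
    """Greedy forward search (hypothetical stand-in for GA / Tabu Search):
    repeatedly add the variable that lowers the Delta test the most and stop
    as soon as no remaining variable lowers it further."""
    remaining = list(range(X.shape[1]))
    selected, best = [], np.inf
    while remaining:
        score, j = min((delta_test(X[:, selected + [k]], y), k) for k in remaining)
        if score >= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best


# Synthetic example: only variables 0 and 2 influence the output.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 5))
y = np.sin(3.0 * X[:, 0]) + X[:, 2] ** 2 + 0.05 * rng.normal(size=300)
selected, delta = forward_selection(X, y)
print("selected variables:", selected, "Delta test:", round(delta, 4))
```

The pairwise-distance matrix keeps the sketch dependency-free; for large data sets a k-d tree or an approximate nearest-neighbour search would be the natural replacement, which is also where parallel hardware becomes useful.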
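An Extreme Learning Machine draws its hidden-layer weights at random and fits only the linear output weights, typically by least squares. The sketch below is not the thesis's model (it has no combination layer and no input selection); it only illustrates a basic ELM on lagged-window regression, combined with a direct long-term strategy that trains one model per prediction horizon. The helper names `make_lagged`, `ELM` and `direct_forecast`, the tanh activation, the window length and the sine test series are all assumptions made for illustration.

```python
import numpy as np


def make_lagged(series, n_lags, horizon):
    """Rows are windows [s[t-n_lags+1], ..., s[t]]; targets are s[t+horizon]."""
    s = np.asarray(series, dtype=float)
    n = len(s) - n_lags - horizon + 1
    X = np.array([s[i:i + n_lags] for i in range(n)])
    y = s[n_lags + horizon - 1:]
    return X, y


class ELM:
    """Basic Extreme Learning Machine: a random, untrained tanh hidden layer
    followed by linear output weights fitted by least squares."""

    def __init__(self, n_hidden=30, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Input weights and biases are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Output weights: minimum-norm least-squares solution of H @ beta = y.
        self.beta, *_ = np.linalg.lstsq(self._hidden(X), y, rcond=None)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def direct_forecast(series, n_lags, max_horizon, n_hidden=30):
    """Direct strategy: a separate ELM for each horizon h = 1..max_horizon,
    each predicting s[t+h] from the last n_lags observed values."""
    s = np.asarray(series, dtype=float)
    last_window = s[-n_lags:].reshape(1, -1)
    preds = []
    for h in range(1, max_horizon + 1):
        X, y = make_lagged(s, n_lags, h)
        model = ELM(n_hidden=n_hidden, seed=h).fit(X, y)
        preds.append(model.predict(last_window)[0])
    return np.array(preds)


# Toy usage: forecast the last 20 points of a sine wave from the first 280.
s = np.sin(0.3 * np.arange(300))
forecast = direct_forecast(s[:280], n_lags=8, max_horizon=20)
print("max abs error:", np.max(np.abs(forecast - s[280:])))
```

The direct strategy avoids feeding predictions back as inputs, so errors do not accumulate over the horizon, at the cost of training one model per step ahead; a recursive strategy would train a single one-step model instead.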
Supervising professor
Karhunen, Juha, Prof., Aalto University, Department of Information and Computer Science, Finland
Thesis advisor
Lendasse, Amaury, Dr., Aalto University, Department of Information and Computer Science, Finland
Pouzols, Federico Montesino, Dr., University of Helsinki, Finland
variable selection/scaling/projection, time series prediction, environmental modelling, model structure selection
Other note
  • [Publication 1]: Dušan Sovilj, Antti Sorjamaa, Qi Yu, Yoan Miche, and Eric Séverin. OPELM and OP-KNN in Long-Term Prediction of Time Series using Projected Input Data. Neurocomputing, 73(10–12):1976–1986, June 2010.
    DOI: 10.1016/j.neucom.2009.11.033
  • [Publication 2]: Fernando Mateo, Dušan Sovilj, and Rafael Gadea. Approximate k-NN Delta Test Minimization Method using Genetic Algorithms: Application to Time Series. Neurocomputing, 73(10–12):2017–2029, June 2010.
    DOI: 10.1016/j.neucom.2009.11.032
  • [Publication 3]: Karin Junker, Dušan Sovilj, Ingrid Kröncke, and Joachim Dippner. Climate induced changes in benthic macrofauna – A non-linear model approach. Journal of Marine Systems, 96–97:90–94, August 2012.
    DOI: 10.1016/j.jmarsys.2012.02.005
  • [Publication 4]: Dušan Sovilj. Multistart Strategy Using Delta Test for Variable Selection. In International Conference on Artificial Neural Networks (ICANN 2011, Part II), pages 413–420, Lecture Notes in Computer Science volume 6792. Espoo, Finland, June 2011.
    DOI: 10.1007/978-3-642-21738-8_53
  • [Publication 5]: Andrej Gisbrecht, Dušan Sovilj, Barbara Hammer, and Amaury Lendasse. Relevance learning for time series inspection. In European Symposium on Artificial Neural Networks (ESANN 2012), pages 489–494, Computational Intelligence and Machine Learning. Bruges, Belgium, April 2012.
  • [Publication 6]: Dušan Sovilj, Amaury Lendasse, and Olli Simula. Extending Extreme Learning Machine with Combination Layer. In International Work-Conference on Artificial Neural Networks, pages 417–426, Lecture Notes in Computer Science volume 7902. Tenerife, Spain, June 2013.
  • [Publication 7]: Alberto Guillén, Mark van Heeswijk, Dušan Sovilj, M. G. Arenas, Héctor Pomares, and Ignacio Rojas. Variable Selection in a GPU Cluster using Delta Test. In International Work-Conference on Artificial Neural Networks, pages 393–400, Lecture Notes in Computer Science volume 6691. Málaga, Spain, June 2011.
    DOI: 10.1007/978-3-642-21501-8_49
  • [Publication 8]: Alberto Guillén, Dušan Sovilj, Mark van Heeswijk, Luis Javier Herrera, Amaury Lendasse, Héctor Pomares, and Ignacio Rojas. Evolutive Approaches for Variable Selection Using a Non-parametric Noise Estimator. Parallel Architectures & Bioinspired Algorithms, Studies in Computational Intelligence volume 415, pages 243–266, August 2012.
    DOI: 10.1007/978-3-642-28789-3_11