Analysis of differences between metabolic time series with hidden Markov models

Helsinki University of Technology | Master's thesis

Date

2007

Major/Subject

Information Technology

Mcode

T-61

Language

en

Pages

71

Abstract

In this thesis, a method was developed for finding and analyzing differences between sparse metabolic time series. In metabolic time series, the measurements are concentrations of chemical compounds produced by reactions in a living organism. Analyzing sparse metabolic time series is an important task in medicine and biology: the metabolome carries a great deal of information about the organism, for example about diseases or pathologies, yet frequent measurements are usually difficult and expensive to make. The most important characteristics of the data used in the study are that the time series are relatively short and sparse (that is, the interval between subsequent observations is considerably longer than the duration of most biochemical reactions in an organism), the measurements are corrupted by heavy noise, and the number of available time series is considerably smaller than the dimension of the measurements. The approach was designed primarily for metabolomic data, but it can also be applied to time series with similar characteristics in other fields.

The approach consists of four stages: preprocessing, designing the statistical model, finding differences, and analyzing their statistical significance. Hidden Markov models (HMMs) are employed to find differences between metabolic time series. An HMM is a statistical model in which the modeled system is assumed to be a Markov chain whose unknown ("hidden") states emit visible observations. The properties of the underlying process can be analyzed through the characteristics of the hidden states and their interrelationships.

The method was successfully applied to find and analyze differences between the metabolic time series of males and females of growing age, measured from blood plasma. Several time-dependent between-gender differences were identified, and justified suggestions were made about their origin and general structure.
Compared to methods that ignore the time-series structure, the HMM-based approach gives superior results and provides some completely new insights into between-gender differences; for instance, the progression of development can be investigated. HMMs also combine several advantages over other time-series modeling methods: they are computationally relatively light, they can produce relatively good results with a moderate amount of data, and they can be applied to sparse time series. The developed method is relatively easy to extend and generalize.
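The HMM idea summarized in the abstract (a hidden Markov chain whose states emit the observed measurements) can be illustrated with the standard forward algorithm, which computes the likelihood of an observed series under a given model. This is a minimal sketch, not the thesis's implementation: it assumes one-dimensional Gaussian emissions and already-known parameters, whereas real metabolic measurements are high-dimensional and the parameters would be estimated from data. All function and parameter names here are hypothetical.

```python
import math


def log_gaussian(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else -math.inf


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(obs, start, trans, means, variances):
    """Forward algorithm: log p(obs) for an HMM with K hidden states.

    Assumption (hypothetical model): univariate Gaussian emissions.
    start[k]               -- probability that the chain starts in state k
    trans[j][k]            -- probability of moving from state j to state k
    means[k], variances[k] -- Gaussian emission parameters of state k
    """
    n_states = len(start)
    # alpha[k] = log p(obs[0..t], hidden state at time t = k), starting at t = 0.
    alpha = [safe_log(start[k]) + log_gaussian(obs[0], means[k], variances[k])
             for k in range(n_states)]
    for x in obs[1:]:
        # Sum over the previous hidden state j, then emit x from state k.
        alpha = [
            logsumexp([alpha[j] + safe_log(trans[j][k]) for j in range(n_states)])
            + log_gaussian(x, means[k], variances[k])
            for k in range(n_states)
        ]
    # Marginalize over the hidden state at the final time step.
    return logsumexp(alpha)


if __name__ == "__main__":
    # Toy two-state model with made-up parameters (illustration only).
    series = [0.1, 0.3, 2.1, 1.9]
    ll = hmm_log_likelihood(series, start=[0.6, 0.4],
                            trans=[[0.8, 0.2], [0.3, 0.7]],
                            means=[0.0, 2.0], variances=[0.5, 0.5])
    print(f"log-likelihood: {ll:.3f}")
```

Evaluating the same series under two such models (for example, one per group being compared) yields log-likelihoods that can be set side by side. Each time step costs O(K²) for K hidden states, so the whole pass is linear in series length, consistent with the abstract's remark that HMMs are computationally relatively light.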

Supervisor

Kaski, Samuel

Thesis advisor

Nikkilä, Janne

Keywords

hidden Markov models, time series, metabolomics
