Probabilistic analysis of the human transcriptome with side information

Aalto-yliopiston teknillinen korkeakoulu | Doctoral thesis (article-based)
Online book (1673 KB, 92 pp.)
TKK dissertations in information science and technology, 19
Recent advances in high-throughput measurement technologies and efficient sharing of biomedical data through community databases have made it possible to investigate the complete collection of genetic material, the genome, which encodes the heritable genetic program of an organism. This has opened up new perspectives on the study of living organisms, with a profound impact on biological research. Functional genomics is a subdiscipline of molecular biology that investigates the functional organization of genetic information. This thesis develops computational strategies to investigate a key functional layer of the genome, the transcriptome. The time- and context-specific transcriptional activity of the genes regulates the function of living cells through protein synthesis. Efficient computational techniques are needed to extract useful information from high-dimensional genomic observations that are associated with high levels of complex variation. Statistical learning and probabilistic models provide the theoretical framework for combining statistical evidence across multiple observations and the wealth of background information in genomic data repositories.
This thesis addresses three key challenges in transcriptome analysis. First, new preprocessing techniques that utilize side information in genomic sequence databases and microarray collections are developed to improve the accuracy of high-throughput microarray measurements. Second, a novel exploratory approach is proposed to construct a global view of cell-biological network activation patterns and functional relatedness between tissues across the normal human body. Information in genomic interaction databases is used to derive constraints that help to focus the modeling on those parts of the data that are supported by known or potential interactions between the genes, and to scale up the analysis.
The third contribution is to develop novel approaches to model dependency between co-occurring measurement sources. The methods are used to study cancer mechanisms and transcriptome evolution; integrative analysis of the human transcriptome and other layers of genomic information allows the identification of functional mechanisms and interactions that could not be detected from the individual measurement sources alone. Open-source implementations of the key methodological contributions have been released to facilitate their further adoption by the research community.
Supervising professor
Kaski, Samuel, Prof.
Keywords
data integration, exploratory data analysis, functional genomics, probabilistic modeling, transcriptomics
  • [Publication 1]: Laura L. Elo, Leo Lahti, Heli Skottman, Minna Kyläniemi, Riitta Lahesmaa, and Tero Aittokallio. 2005. Integrating probe-level expression changes across generations of Affymetrix arrays. Nucleic Acids Research, volume 33, number 22, e193, 10 pages. © 2005 by authors.
  • [Publication 2]: Leo Lahti, Laura L. Elo, Tero Aittokallio, and Samuel Kaski. 2011. Probabilistic analysis of probe reliability in differential gene expression studies with short oligonucleotide arrays. IEEE/ACM Transactions on Computational Biology and Bioinformatics, volume 8, number 1, pages 217-225. © 2011 Institute of Electrical and Electronics Engineers (IEEE). By permission.
  • [Publication 3]: Leo Lahti, Juha E. A. Knuuttila, and Samuel Kaski. 2010. Global modeling of transcriptional responses in interaction networks. Bioinformatics, volume 26, number 21, pages 2713-2720. © 2010 by authors.
  • [Publication 4]: Leo Lahti, Samuel Myllykangas, Sakari Knuutila, and Samuel Kaski. 2009. Dependency detection with similarity constraints. In: Tülay Adali, Jocelyn Chanussot, Christian Jutten, and Jan Larsen (editors). Proceedings of the 19th IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2009). Grenoble, France. 1-4 September 2009. Piscataway, NJ, USA. IEEE. Pages 89-94. ISBN 978-1-4244-4947-7. © 2009 Institute of Electrical and Electronics Engineers (IEEE). By permission.
  • [Publication 5]: Janne Sinkkonen, Janne Nikkilä, Leo Lahti, and Samuel Kaski. 2004. Associative clustering. In: Jean-François Boulicaut, Floriana Esposito, Fosca Giannotti, and Dino Pedreschi (editors). Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Pisa, Italy. 20-24 September 2004. Berlin, Heidelberg, Germany. Springer. Lecture Notes in Computer Science, volume 3201, pages 396-406. ISBN 3-540-23105-6. © 2004 by authors and © 2004 Springer Science+Business Media. By permission.
  • [Publication 6]: Samuel Kaski, Janne Nikkilä, Janne Sinkkonen, Leo Lahti, Juha E. A. Knuuttila, and Christophe Roos. 2005. Associative clustering for exploring dependencies between functional genomics data sets. IEEE/ACM Transactions on Computational Biology and Bioinformatics: Special Issue on Machine Learning for Bioinformatics - Part 2, volume 2, number 3, pages 203-216. © 2005 Institute of Electrical and Electronics Engineers (IEEE). By permission.