Probabilistic Modelling of Multiresolution Biological Data
School of Science | Doctoral thesis (article-based) | Defence date: 2014-11-21
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Pages: 84 + app. 84
Aalto University publication series DOCTORAL DISSERTATIONS, 157/2014
Abstract: As measurement technology keeps improving, measurements accumulated over a period of time end up as a collection of data in different representations. However, most machine learning and data mining algorithms, in their standard form, are designed to operate on data in a single representation. This thesis proposes machine learning and data mining algorithms that analyze data in representations differing in resolution within a single analysis. The novel algorithms proposed for multiresolution data belong to the fields of probabilistic modelling and semantic data mining. First, three deterministic data transformation methods are proposed to transform data across different resolutions. After the transformation, the resulting data in the same resolution are integrated and modeled using mixture models. Second, similar mixture components in a mixture model are repeatedly merged, one by one, to generate a chain of mixture models. A new fast approximation of the KL-divergence is derived to determine the similarity of the mixture components. The chain of generated mixture models is useful for comparison, for example, in model selection. Third, mixture components in different resolutions are iteratively merged to model multiresolution data, generating a model in each modeled resolution that incorporates information from data in the other resolutions. Fourth, a single multiresolution mixture model with multiresolution mixture components is proposed, in which each mixture component independently has the capabilities of a Bayesian network. Finally, a three-part methodology consisting of clustering using mixture models, rule learning using semantic subgroup discovery, and pattern visualization using banded matrices is developed for comprehensive analysis of multiresolution data.
The multiresolution data analysis methods presented in this thesis improve performance in comparison with their single-resolution counterparts. Furthermore, the developed methods aim to make the results understandable to domain experts. Therefore, the developed methods are a useful addition to the analysis of chromosomal aberration patterns and to cancer research in general.
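The component-merging step summarized in the abstract can be illustrated with a minimal sketch. It assumes that each mixture component for 0-1 data is a product of independent Bernoulli distributions, uses the exact symmetric KL-divergence between two such components (not the fast approximation derived in the thesis), and merges the closest pair by weight-averaging their parameters. The function names and the merging rule are hypothetical choices made for illustration, not taken from the publications.

```python
import math


def bernoulli_kl(p, q, eps=1e-9):
    """Exact KL(p || q) between two product-of-Bernoulli components.

    p and q are lists of per-dimension probabilities of observing a 1.
    Values are clamped away from 0 and 1 to keep the logarithms finite.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        pi = min(max(pi, eps), 1 - eps)
        qi = min(max(qi, eps), 1 - eps)
        total += pi * math.log(pi / qi)
        total += (1 - pi) * math.log((1 - pi) / (1 - qi))
    return total


def symmetric_kl(p, q):
    """Symmetrized divergence, used here as the dissimilarity measure."""
    return bernoulli_kl(p, q) + bernoulli_kl(q, p)


def merge_closest(weights, thetas):
    """Merge the two most similar components (assumed rule: weighted mean)."""
    n = len(weights)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    i, j = min(pairs, key=lambda ij: symmetric_kl(thetas[ij[0]], thetas[ij[1]]))
    w = weights[i] + weights[j]
    merged = [(weights[i] * a + weights[j] * b) / w
              for a, b in zip(thetas[i], thetas[j])]
    keep = [k for k in range(n) if k not in (i, j)]
    return ([weights[k] for k in keep] + [w],
            [thetas[k] for k in keep] + [merged])


def merge_chain(weights, thetas):
    """Return the chain of models from the initial one down to one component."""
    chain = [(weights, thetas)]
    while len(weights) > 1:
        weights, thetas = merge_closest(weights, thetas)
        chain.append((weights, thetas))
    return chain


# Toy model: three components over two binary variables.
weights = [0.5, 0.3, 0.2]
thetas = [[0.9, 0.1], [0.88, 0.12], [0.1, 0.9]]
chain = merge_chain(weights, thetas)
```

Each entry of `chain` is a candidate model with one component fewer than the previous one, so the models could be compared, for instance by held-out likelihood, without retraining each size from scratch. In practice the merged models would probably be refined further (for example with a few EM iterations), which this sketch omits.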
Supervising professor: Kaski, Samuel, Prof., Aalto University, Department of Information and Computer Science, Finland
Thesis advisor: Hollmén, Jaakko, D.Sc. (Tech.), Aalto University, Department of Information and Computer Science, Finland
Keywords: mixture models, chromosomal aberrations, 0-1 data, model selection
[Publication 1]: Prem Raj Adhikari, Jaakko Hollmén. Patterns from Multiresolution 0–1 data. In Jilles Vreeken, Nikolaj Tatti, and Bart Goethals, Editors, UP ’10, ACM SIGKDD Workshop on Useful Patterns, Washington DC, ACM, New York, NY, USA, Pages 8–16, July 25, 2010,
DOI: 10.1145/1816112.1816115, July 2010.
[Publication 2]: Prem Raj Adhikari, Jaakko Hollmén. Fast Progressive Training of Mixture Models for Model Selection. Journal of Intelligent Information Systems, IN PRESS, Springer,
DOI: 10.1007/s10844-013-0282-3, Published Online: December 2013.
[Publication 3]: Prem Raj Adhikari, Jaakko Hollmén. Multiresolution Mixture Modeling using Merging of Mixture Components. In Proceedings of Fourth Asian Conference on Machine Learning (ACML 2012), Steven C.H. Hoi and Wray Buntine, Editors, Volume 25 of Journal of Machine Learning Research - Proceedings Track, pages 17–32, November 4–6, 2012, Singapore, URL: http://jmlr.csail.mit.edu/proceedings/papers/v25/adhikari12.html, November 2012.
[Publication 4]: Prem Raj Adhikari, Jaakko Hollmén. Mixture Models from Multiresolution 0–1 Data. In Proceedings of Sixteenth International Conference on Discovery Science (DS 2013), Johannes Fürnkranz, Eyke Hüllermeier, and Tomoyuki Higuchi, Editors, Volume 8140 of Lecture Notes in Computer Science, Springer–Verlag, Berlin Heidelberg, pages 1–16, October 6–9, 2013, Singapore.
DOI: 10.1007/978-3-642-40897-7_1, October 2013.
[Publication 5]: Prem Raj Adhikari, Anže Vavpetič, Jan Kralj, Nada Lavrač, Jaakko Hollmén. Explaining mixture models through semantic pattern mining and banded matrix visualization. In Proceedings of Seventeenth International Conference on Discovery Science (DS 2014), Sašo Džeroski, Panče Panov, Dragi Kocev, Ljupčo Todorovski, Editors, Volume 8777 of Lecture Notes in Computer Science, Springer International Publishing Switzerland 2014, pages 1-12, October 8–10, 2014, Bled, Slovenia.
DOI: 10.1007/978-3-319-11812-3_1, October 2014.