Probabilistic Modelling of Multiresolution Biological Data

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.advisorHollmén, Jaakko, D.Sc. (Tech.), Aalto University, Department of Information and Computer Science, Finland
dc.contributor.authorAdhikari, Prem Raj
dc.contributor.departmentTietojenkäsittelytieteen laitosfi
dc.contributor.departmentDepartment of Information and Computer Scienceen
dc.contributor.labParsimonious Modelling Research Groupen
dc.contributor.labPelkistetty mallintaminenfi
dc.contributor.schoolPerustieteiden korkeakoulufi
dc.contributor.schoolSchool of Scienceen
dc.contributor.supervisorKaski, Samuel, Prof., Aalto University, Department of Information and Computer Science, Finland
dc.date.accessioned2014-10-25T09:00:14Z
dc.date.available2014-10-25T09:00:14Z
dc.date.dateaccepted2014-08-14
dc.date.defence2014-11-21
dc.date.issued2014
dc.description.abstractAs measurement technology improves, measurements accumulated over time end up as collections of data in different representations. However, most machine learning and data mining algorithms, in their standard form, are designed to operate on data in a single representation. This thesis proposes machine learning and data mining algorithms that analyze data in different representations, with respect to resolution, within a single analysis. The novel algorithms proposed for analyzing multiresolution data belong to the fields of probabilistic modelling and semantic data mining. First, three deterministic data transformation methods are proposed to transform data across resolutions. After the transformation, the resulting data, now in the same resolution, are integrated and modeled using mixture models. Second, similar mixture components in a mixture model are merged one by one, repeatedly, to generate a chain of mixture models. A new fast approximation of the KL-divergence is derived to determine the similarity of the mixture components. The chain of generated mixture models is useful for comparison, for example, in model selection. Third, mixture components in different resolutions are iteratively merged to model multiresolution data, generating a model in each modeled resolution that incorporates information from the data in the other resolutions. Fourth, a single multiresolution mixture model with multiresolution mixture components is proposed, in which each mixture component independently has the capabilities of a Bayesian network. Finally, a three-part methodology consisting of clustering using mixture models, rule learning using semantic subgroup discovery, and pattern visualization using banded matrices is developed for comprehensive analysis of multiresolution data.
The multiresolution data analysis methods presented in this thesis perform better than their single-resolution counterparts. Furthermore, the developed methods aim to make the results understandable to domain experts. Therefore, the developed methods are a useful addition to the analysis of chromosomal aberration patterns and to cancer research in general.en
dc.format.extent84 + app. 84
dc.format.mimetypeapplication/pdfen
dc.identifier.isbn978-952-60-5904-4 (electronic)
dc.identifier.isbn978-952-60-5903-7 (printed)
dc.identifier.issn1799-4942 (electronic)
dc.identifier.issn1799-4934 (printed)
dc.identifier.issn1799-4934 (ISSN-L)
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/14302
dc.identifier.urnURN:ISBN:978-952-60-5904-4
dc.language.isoenen
dc.opnde Ridder, Jeroen, Asst. Prof., Delft University of Technology, Delft, The Netherlands
dc.publisherAalto Universityen
dc.publisherAalto-yliopistofi
dc.relation.haspart[Publication 1]: Prem Raj Adhikari, Jaakko Hollmén. Patterns from Multiresolution 0–1 data. In Jilles Vreeken, Nikolaj Tatti, and Bart Goethals, Editors, UP ’10, ACM SIGKDD Workshop on Useful Patterns, Washington DC, ACM, New York, NY, USA, Pages 8–16, July 25, 2010, DOI: 10.1145/1816112.1816115, July 2010.
dc.relation.haspart[Publication 2]: Prem Raj Adhikari, Jaakko Hollmén. Fast Progressive Training of Mixture Models for Model Selection. Journal of Intelligent Information Systems, IN PRESS, Springer, DOI: 10.1007/s10844-013-0282-3, Published Online: December 2013.
dc.relation.haspart[Publication 3]: Prem Raj Adhikari, Jaakko Hollmén. Multiresolution Mixture Modeling using Merging of Mixture Components. In Proceedings of Fourth Asian Conference on Machine Learning (ACML 2012), In Steven C.H. Hoi and Wray Buntine Editors, Volume 25 of Journal of Machine Learning Research—Proceedings Track, pages 17–32, November 4–6, 2012, Singapore, URL: http://jmlr.csail.mit.edu/proceedings/papers/v25/adhikari12.html, November 2012.
dc.relation.haspart[Publication 4]: Prem Raj Adhikari, Jaakko Hollmén. Mixture Models from Multiresolution 0–1 Data. In Proceedings of Sixteenth International Conference on Discovery Science (DS 2013), Johannes Fürnkranz, Eyke Hüllermeier, and Tomoyuki Higuchi, Editors, Volume 8140 of Lecture Notes in Computer Science, Springer–Verlag, Berlin Heidelberg, pages 1–16, October 6–9, 2013, Singapore. DOI: 10.1007/978-3-642-40897-7_1, October 2013.
dc.relation.haspart[Publication 5]: Prem Raj Adhikari, Anže Vavpetič, Jan Kralj, Nada Lavrač, Jaakko Hollmén. Explaining mixture models through semantic pattern mining and banded matrix visualization. In Proceedings of Seventeenth International Conference on Discovery Science (DS 2014), Sašo Džeroski, Panče Panov, Dragi Kocev, Ljupčo Todorovski, Editors, Volume 8777 of Lecture Notes in Computer Science, Springer International Publishing Switzerland 2014, pages 1-12, October 8–10, 2014, Bled, Slovenia. DOI: 10.1007/978-3-319-11812-3_1, October 2014.
dc.relation.ispartofseriesAalto University publication series DOCTORAL DISSERTATIONSen
dc.relation.ispartofseries157/2014
dc.revBohanec, Marko, Prof. Dr., Jožef Stefan Institute, Ljubljana, Slovenia
dc.revYli-Harja, Olli, Prof., Tampere University of Technology, Tampere, Finland
dc.subject.keywordmixture modelsen
dc.subject.keywordchromosomal aberrationsen
dc.subject.keyword0-1 dataen
dc.subject.keywordmodel selectionen
dc.subject.otherComputer scienceen
dc.titleProbabilistic Modelling of Multiresolution Biological Dataen
dc.typeG5 Artikkeliväitöskirjafi
dc.type.dcmitypetexten
dc.type.ontasotDoctoral dissertation (article-based)en
dc.type.ontasotVäitöskirja (artikkeli)fi
local.aalto.digiauthask
local.aalto.digifolderAalto_64707

Files

Original bundle

Name: isbn9789526059044.pdf
Size: 1.36 MB
Format: Adobe Portable Document Format