Modeling of mutual dependencies


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Klami, Arto
dc.date.accessioned 2012-08-20T12:45:04Z
dc.date.available 2012-08-20T12:45:04Z
dc.date.issued 2008
dc.identifier.isbn 978-951-22-9520-3
dc.identifier.isbn 978-951-22-9519-7 (printed)
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/4517
dc.description.abstract Data analysis means applying computational models to the analysis of large collections of data, such as video signals, text collections, or measurements of gene activities in human cells. Unsupervised or exploratory data analysis refers to a subtask of data analysis in which the goal is to find novel knowledge based only on the data. A central challenge in unsupervised data analysis is separating relevant and irrelevant information from each other. In this thesis, novel solutions for focusing on the more relevant findings are presented.

Measurement noise is one source of irrelevant information. If we have several measurements of the same objects, the noise can be suppressed by averaging over the measurements. Simple averaging is, however, only possible when the measurements share a common representation. In this thesis, we show how irrelevant information can be suppressed or ignored even in cases where the measurements come from different kinds of sensors or sources, such as video and audio recordings of the same scene.

For combining the measurements, we use the mutual dependencies between them. Measures of dependency, such as mutual information, characterize the commonalities between two sets of measurements. Two measurements can hence be combined to reduce irrelevant variation by finding new representations for the objects such that the representations are maximally dependent. The combination is optimal under the assumption that what is in common between the measurements is more relevant than information specific to any one of the sources.

Several practical models for the task are introduced. In particular, novel Bayesian generative models, including a Bayesian version of the classical method of canonical correlation analysis, are given. Bayesian modeling is an especially well-justified approach to learning from small data sets. Hence, generative models can be used to extract dependencies more reliably in, for example, medical applications, where obtaining a large number of samples is difficult. Novel non-Bayesian models are also presented: dependent component analysis finds linear projections which capture more general dependencies than earlier methods. Mutual dependencies can also be used for supervising traditional unsupervised learning methods. The learning metrics principle describes how a new distance metric, focusing on relevant information, can be derived based on the dependency between the measurements and a supervising signal. In this thesis, the approximations and optimization methods required for using the learning metrics principle are improved. en
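The core idea in the abstract, combining two views of the same objects by finding maximally dependent (maximally correlated) projections, is what classical canonical correlation analysis does. The following is a minimal illustrative sketch of classical (non-Bayesian) CCA on synthetic two-view data, not code from the thesis; the data setup and function name are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "views" of the same 500 objects: a shared latent signal z plus
# view-specific noise dimensions, mimicking e.g. paired audio/video features.
n = 500
z = rng.normal(size=(n, 1))                  # shared (relevant) variation
X = np.hstack([z, rng.normal(size=(n, 3))])  # view 1: shared dim + 3 noise dims
Y = np.hstack([-z, rng.normal(size=(n, 2))]) # view 2: shared dim + 2 noise dims

def cca_first_pair(X, Y, reg=1e-8):
    """First pair of canonical directions via the SVD of the
    whitened cross-covariance matrix (classical CCA)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Cxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)
    # Whitening transforms: Wx.T @ Cxx @ Wx = I (Cholesky of the inverse).
    Wx = np.linalg.cholesky(np.linalg.inv(Cxx))
    Wy = np.linalg.cholesky(np.linalg.inv(Cyy))
    # Singular values of the whitened cross-covariance are the
    # canonical correlations; singular vectors give the projections.
    U, s, Vt = np.linalg.svd(Wx.T @ Cxy @ Wy)
    a = Wx @ U[:, 0]   # projection for view 1
    b = Wy @ Vt[0, :]  # projection for view 2
    return a, b, s[0]  # s[0] is the first canonical correlation

a, b, rho = cca_first_pair(X, Y)
# rho is close to 1 here, since both views contain the latent z exactly:
# the shared variation is recovered, the noise dimensions are ignored.
```

The thesis builds on this classical formulation: its Bayesian variant replaces the point estimates of the projections with a generative model and posterior inference, which is what makes small-sample settings tractable.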
dc.format.extent Online book (839 KB, 69 pp.)
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher Teknillinen korkeakoulu en
dc.relation.haspart [Publication 1]: Arto Klami and Samuel Kaski. 2005. Non-parametric dependent components. In: Proceedings of the 30th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005). Philadelphia, PA, USA. 18-23 March 2005. Piscataway, NJ, IEEE, pages V-209 - V-212. © 2005 IEEE. By permission. en
dc.relation.haspart [Publication 2]: Abhishek Tripathi, Arto Klami, and Samuel Kaski. 2008. Simple integrative preprocessing preserves what is shared in data sources. BMC Bioinformatics, volume 9, 111. © 2008 by authors. en
dc.relation.haspart [Publication 3]: Arto Klami and Samuel Kaski. 2006. Generative models that discover dependencies between data sets. In: S. McLoone, T. Adali, J. Larsen, and M. Van Hulle (editors). Machine Learning for Signal Processing XVI. Piscataway, NJ, IEEE, pages 123-128. © 2006 IEEE. By permission. en
dc.relation.haspart [Publication 4]: Arto Klami and Samuel Kaski. 2008. Probabilistic approach to detecting dependencies between data sets. Neurocomputing, to appear. © 2008 by authors and © 2008 Elsevier Science. By permission. en
dc.relation.haspart [Publication 5]: Arto Klami and Samuel Kaski. 2007. Local dependent components. In: Zoubin Ghahramani (editor). Proceedings of the 24th International Conference on Machine Learning (ICML 2007). Corvallis, OR, USA. 20-24 June 2007. Madison, WI, Omnipress, pages 425-433. © 2007 by authors. en
dc.relation.haspart [Publication 6]: Jaakko Peltonen, Arto Klami, and Samuel Kaski. 2004. Improved learning of Riemannian metrics for exploratory analysis. Neural Networks, volume 17, numbers 8-9, pages 1087-1100. © 2004 Elsevier Science. By permission. en
dc.relation.haspart [Publication 7]: Samuel Kaski, Janne Sinkkonen, and Arto Klami. 2005. Discriminative clustering. Neurocomputing, volume 69, numbers 1-3, pages 18-41. © 2005 Elsevier Science. By permission. en
dc.relation.haspart [Errata file]: Errata of publication 6 en
dc.subject.other Computer science en
dc.title Modeling of mutual dependencies en
dc.type G5 Artikkeliväitöskirja fi
dc.contributor.department Tietojenkäsittelytieteen laitos fi
dc.subject.keyword canonical correlation analysis en
dc.subject.keyword clustering en
dc.subject.keyword data fusion en
dc.subject.keyword exploratory data analysis en
dc.subject.keyword probabilistic modeling en
dc.subject.keyword learning metrics en
dc.subject.keyword mutual dependency en
dc.subject.keyword mutual information en
dc.identifier.urn URN:ISBN:978-951-22-9520-3
dc.type.dcmitype text en
dc.type.ontasot Väitöskirja (artikkeli) fi
dc.type.ontasot Doctoral dissertation (article-based) en

