Data exploration with learning metrics

dc.contributor [fi]: Aalto-yliopisto
dc.contributor [en]: Aalto University
dc.contributor.author: Peltonen, Jaakko
dc.contributor.department [en]: Department of Computer Science and Engineering
dc.contributor.department [fi]: Tietotekniikan osasto
dc.contributor.lab [en]: Laboratory of Computer and Information Science
dc.contributor.lab [fi]: Informaatiotekniikan laboratorio
dc.date.accessioned: 2012-02-13T13:04:42Z
dc.date.available: 2012-02-13T13:04:42Z
dc.date.issued: 2004-11-17
dc.description.abstract [en]: A crucial problem in exploratory analysis of data is that it is difficult for computational methods to focus on interesting aspects of data. Traditional methods of unsupervised learning cannot differentiate between interesting and noninteresting variation, and hence may model, visualize, or cluster parts of data that are not interesting to the analyst. This wastes the computational power of the methods and may mislead the analyst. In this thesis, a principle called "learning metrics" is used to develop visualization and clustering methods that automatically focus on the interesting aspects, based on auxiliary labels supplied with the data samples. The principle yields non-Euclidean (Riemannian) metrics that are data-driven, widely applicable, versatile, invariant to many transformations, and in part invariant to noise. Learning metric methods are introduced for five tasks: nonlinear visualization by Self-Organizing Maps and Multidimensional Scaling, linear projection, and clustering of discrete data and multinomial distributions. The resulting methods either explicitly estimate distances in the Riemannian metric, or optimize a tailored cost function which is implicitly related to such a metric. The methods have rigorous theoretical relationships to information geometry and probabilistic modeling, and are empirically shown to yield good practical results in exploratory and information retrieval tasks.
dc.description.version [en]: reviewed
dc.format.extent: 103, [109]
dc.format.mimetype: application/pdf
dc.identifier.isbn: 951-22-7345-4
dc.identifier.issn: 1459-7020
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/2479
dc.identifier.urn: urn:nbn:fi:tkk-004420
dc.language.iso [en]: en
dc.publisher [en]: Helsinki University of Technology
dc.publisher [fi]: Teknillinen korkeakoulu
dc.relation.haspart: Samuel Kaski, Janne Sinkkonen, and Jaakko Peltonen, 2001. Bankruptcy analysis with self-organizing maps in learning metrics. IEEE Transactions on Neural Networks 12, number 4, pages 936-947. [article1.pdf] © 2001 IEEE. By permission.
dc.relation.haspart: Jaakko Peltonen, Arto Klami, and Samuel Kaski, 2002. Learning more accurate metrics for Self-Organizing Maps. In: José R. Dorronsoro (editor), Proceedings of the International Conference on Artificial Neural Networks (ICANN 2002). Madrid, Spain, 27-30 August 2002. Berlin, Springer-Verlag. Lecture Notes in Computer Science 2415, pages 999-1004. [article2.pdf] © 2002 Springer-Verlag. By permission.
dc.relation.haspart: Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski, 2002. Discriminative clustering of text documents. In: Lipo Wang, Jagath C. Rajapakse, Kunihiko Fukushima, Soo-Young Lee, and Xin Yao (editors), Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02). Singapore, 18-22 November 2002. Piscataway, NJ, IEEE, volume 4, pages 1956-1960. [article3.pdf] © 2002 IEEE. By permission.
dc.relation.haspart: Jarkko Venna, Samuel Kaski, and Jaakko Peltonen, 2003. Visualizations for assessing convergence and mixing of MCMC. In: Nada Lavrač, Dragan Gamberger, Ljupco Todorovski, and Hendrik Blockeel (editors), Proceedings of the 14th European Conference on Machine Learning (ECML 2003). Cavtat - Dubrovnik, Croatia, 22-26 September 2003. Berlin, Springer-Verlag. Lecture Notes in Artificial Intelligence 2837, pages 432-443. [article4.pdf] © 2003 Springer-Verlag. By permission.
dc.relation.haspart: Samuel Kaski and Jaakko Peltonen, 2003. Informative discriminant analysis. In: Tom Fawcett and Nina Mishra (editors), Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003). Washington DC, USA, 21-24 August 2003. Menlo Park, CA, AAAI Press, pages 329-336. [article5.pdf] © 2003 American Association for Artificial Intelligence (AAAI). By permission.
dc.relation.haspart: Jaakko Peltonen, Janne Sinkkonen, and Samuel Kaski, 2004. Sequential information bottleneck for finite data. In: Russ Greiner and Dale Schuurmans (editors), Proceedings of the Twenty-First International Conference on Machine Learning (ICML 2004). Banff, Canada, 4-8 July 2004. Madison, WI, Omnipress, pages 647-654. [article6.pdf] © 2004 by authors.
dc.relation.haspart: Jaakko Peltonen, Arto Klami, and Samuel Kaski. Improved learning of Riemannian metrics for exploratory analysis. Neural Networks, accepted for publication. [article7.pdf] © 2004 by authors and © 2004 Elsevier Science. By permission.
dc.relation.haspart: Jaakko Peltonen and Samuel Kaski. Discriminative components of data. IEEE Transactions on Neural Networks, accepted for publication. [article8.pdf] © 2004 IEEE. By permission.
dc.relation.ispartofseries [en]: Dissertations in computer and information science. Report D
dc.relation.ispartofseries [en]: 7
dc.subject.keyword [en]: clustering
dc.subject.keyword [en]: data mining
dc.subject.keyword [en]: exploratory data analysis
dc.subject.keyword [en]: learning metrics
dc.subject.keyword [en]: supervision
dc.subject.keyword [en]: visualization
dc.subject.other [en]: Computer science
dc.title [en]: Data exploration with learning metrics
dc.type [fi]: G5 Artikkeliväitöskirja
dc.type.dcmitype [en]: text
dc.type.ontasot [fi]: Väitöskirja (artikkeli)
dc.type.ontasot [en]: Doctoral dissertation (article-based)
local.aalto.digiauth: ask
local.aalto.digifolder: Aalto_67102

Files

Original bundle

Name: isbn9512273454.pdf; Size: 3.92 MB; Format: Adobe Portable Document Format
Name: article1.pdf; Size: 991.55 KB; Format: Adobe Portable Document Format
Name: article2.pdf; Size: 129.84 KB; Format: Adobe Portable Document Format
Name: article3.pdf; Size: 363.25 KB; Format: Adobe Portable Document Format
Name: article4.pdf; Size: 4.67 MB; Format: Adobe Portable Document Format
Name: article5.pdf; Size: 368.56 KB; Format: Adobe Portable Document Format
Name: article6.pdf; Size: 227.49 KB; Format: Adobe Portable Document Format
Name: article7.pdf; Size: 371.02 KB; Format: Adobe Portable Document Format
Name: article8.pdf; Size: 366.39 KB; Format: Adobe Portable Document Format