Learning metrics and discriminative clustering

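dc.contributor: Aalto-yliopisto [fi] (English: Aalto University)
dc.contributor: Aalto University [en]
dc.contributor.author: Sinkkonen, Janne
dc.contributor.department: Department of Computer Science and Engineering [en]
dc.contributor.department: Tietotekniikan osasto [fi] (English: Department of Computer Science and Engineering)
dc.contributor.lab: Laboratory of Computer and Information Science [en]
dc.contributor.lab: Informaatiotekniikan laboratorio [fi] (English: Laboratory of Computer and Information Science)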
dc.date.accessioned: 2012-02-10T09:06:47Z
dc.date.available: 2012-02-10T09:06:47Z
dc.date.issued: 2003-11-21
dc.description.abstract: In this work, methods have been developed to extract relevant information from large, multivariate data sets in a flexible, nonlinear way. The techniques are applicable especially in the initial, exploratory phase of data analysis, in cases where an explicit indicator of relevance is available as part of the data set. Unsupervised learning methods, popular in data exploration, often rely on a distance measure defined for data items. Selection of the distance measure, of which feature selection is a part, is therefore fundamentally important. The learning metrics principle is introduced to complement manual feature selection by enabling automatic modification of a distance measure on the basis of available relevance information. Two applications of the principle are developed. The first emphasizes relevant aspects of the data by directly modifying distances between data items, and can be used, for example, in information visualization with self-organizing maps. The other method, discriminative clustering, finds clusters that are internally homogeneous with respect to the interesting variation of the data. The techniques have been applied to text document analysis, gene expression clustering, and charting the bankruptcy sensitivity of companies. In the first, more straightforward approach, a new local metric of the data space uses the Fisher information matrix, a local approximation of the Kullback-Leibler divergence, to measure changes in the conditional distribution of the relevance-indicating data. Discriminative clustering, on the other hand, directly minimizes a Kullback-Leibler-based distortion measure within the clusters, or equivalently maximizes the mutual information between the clusters and the relevance indicator. A finite-data algorithm for discriminative clustering is also presented. It maximizes a partially marginalized posterior probability of the model and is asymptotically equivalent to maximizing mutual information. [en]
dc.description.version: reviewed [en]
dc.format.extent: 77, [86]
dc.format.mimetype: application/pdf
dc.identifier.isbn: 951-22-6797-7
dc.identifier.issn: 1459-7020
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/2136
dc.identifier.urn: urn:nbn:fi:tkk-001045
dc.language.iso: en [en]
dc.publisher: Helsinki University of Technology [en]
dc.publisher: Teknillinen korkeakoulu [fi] (English: Helsinki University of Technology)
dc.relation.haspart: Kaski S. and Sinkkonen J., 2000. Metrics that learn relevance. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN-2000). IEEE, Piscataway, NJ, Vol. 5, pages 547-552. [article1.pdf] © 2000 IEEE. By permission.
dc.relation.haspart: Sinkkonen J. and Kaski S., 2000. Clustering by similarity in an auxiliary space. In: Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2000). Springer-Verlag, London, pages 3-8. [article2.pdf] © 2000 Springer-Verlag. By permission.
dc.relation.haspart: Kaski S., Sinkkonen J. and Peltonen J., 2001. Bankruptcy analysis with self-organizing maps in learning metrics. IEEE Transactions on Neural Networks 12, No. 4, pages 936-947.
dc.relation.haspart: Kaski S. and Sinkkonen J., 2001. A topography-preserving latent variable model with learning metrics. In: Allinson N., Yin H., Allinson L. and Slack J. (editors), Advances in Self-Organizing Maps. Springer-Verlag, London, pages 224-229. [article4.pdf] © 2001 Springer-Verlag. By permission.
dc.relation.haspart: Sinkkonen J. and Kaski S., 2002. Clustering based on conditional distributions in an auxiliary space. Neural Computation 14, pages 217-239. [article5.pdf] © 2002 MIT Press. By permission.
dc.relation.haspart: Kaski S. and Sinkkonen J., Principle of learning metrics for exploratory data analysis. The Journal of VLSI Signal Processing – Systems for Signal, Image, and Video Technology: Special issue on Data Mining and Biomedical Applications of Neural Networks, forthcoming. [article6.pdf] © 2003 by authors and © 2003 Kluwer Academic Publishers. By permission.
dc.relation.haspart: Sinkkonen J., Kaski S. and Nikkilä J., 2002. Discriminative clustering: optimal contingency tables by learning metrics. In: Elomaa T., Mannila H. and Toivonen H. (editors), Proceedings of the 13th European Conference on Machine Learning (ECML'02). Springer-Verlag, London, pages 418-430. [article7.pdf] © 2002 Springer-Verlag. By permission.
dc.relation.haspart: Peltonen J., Sinkkonen J. and Kaski S., 2002. Discriminative clustering of text documents. In: Wang L., Rajapakse J. C., Fukushima K., Lee S.-Y. and Yao X. (editors), Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02). IEEE, Piscataway, NJ, Vol. 4, pages 1956-1960. [article8.pdf] © 2002 IEEE. By permission.
dc.relation.ispartofseries: Dissertations in computer and information science. Report D [en]
dc.relation.ispartofseries: 2 [en]
dc.subject.keyword: clustering [en]
dc.subject.keyword: discriminative clustering [en]
dc.subject.keyword: exploratory data analysis [en]
dc.subject.keyword: feature extraction [en]
dc.subject.keyword: information bottleneck [en]
dc.subject.keyword: information geometry [en]
dc.subject.keyword: learning metrics [en]
dc.subject.keyword: mutual information [en]
dc.subject.keyword: supervised unsupervised learning [en]
dc.subject.other: Computer science [en]
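dc.title: Learning metrics and discriminative clustering [en]
dc.type: G5 Artikkeliväitöskirja [fi] (English: G5 article-based dissertation)
dc.type.dcmitype: text [en]
dc.type.ontasot: Väitöskirja (artikkeli) [fi] (English: Doctoral dissertation, article-based)
dc.type.ontasot: Doctoral dissertation (article-based) [en]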
local.aalto.digiauth: ask
local.aalto.digifolder: Aalto_63819
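The abstract names two computable quantities: a local learning-metrics distance built from the Fisher information matrix of the conditional distribution p(c|x) of the relevance indicator c, and the mutual information between clusters and c, which discriminative clustering maximizes. The Python sketch below illustrates both under stated assumptions; it is not the thesis's implementation. It assumes, hypothetically, a softmax model p(c|x) = softmax(Wx + b), for which the Fisher matrix has the closed form J(x) = Wᵀ(diag(p) − ppᵀ)W, and it only evaluates the discriminative-clustering objective for fixed, hard nearest-prototype (Voronoi) clusters rather than optimizing the prototypes or implementing the finite-data posterior algorithm. All function and variable names are illustrative. The squared local distance dxᵀJ(x)dx matches twice the Kullback-Leibler divergence between p(c|x) and p(c|x+dx) to second order; the constant convention used in the thesis may differ.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def fisher_metric(x, W, b):
    """Fisher information matrix J(x) of p(c|x) with respect to x.

    Hypothetical model: p(c|x) = softmax(W x + b), W of shape (C, d).
    Since grad_x log p(c|x) = w_c - W^T p, summing the outer products
    weighted by p(c|x) gives J(x) = W^T (diag(p) - p p^T) W.
    """
    p = softmax(W @ x + b)
    cov = np.diag(p) - np.outer(p, p)
    return W.T @ cov @ W


def local_sq_distance(x, dx, W, b):
    """Squared learning-metrics length dx^T J(x) dx of a small step dx at x."""
    J = fisher_metric(x, W, b)
    return float(dx @ J @ dx)


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for strictly positive discrete p, q."""
    return float(np.sum(p * np.log(p / q)))


def contingency_table(X, labels, prototypes, n_classes):
    """Assign each row of X to its nearest prototype (Euclidean Voronoi cell)
    and cross-tabulate the cluster index against the auxiliary class label."""
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    clusters = d2.argmin(axis=1)
    counts = np.zeros((len(prototypes), n_classes))
    np.add.at(counts, (clusters, labels), 1)
    return counts


def mutual_information(counts):
    """Mutual information (in nats) between cluster and class, estimated
    from a contingency table counts[cluster, class]."""
    P = counts / counts.sum()
    expected = np.outer(P.sum(axis=1), P.sum(axis=0))
    mask = P > 0  # empty cells contribute nothing to the sum
    return float(np.sum(P[mask] * np.log(P[mask] / expected[mask])))


# Example: the class depends only on x[0], so x[1] is irrelevant.
W = np.array([[2.0, 0.0], [-2.0, 0.0]])
b = np.zeros(2)
x = np.zeros(2)
print(local_sq_distance(x, np.array([0.0, 0.1]), W, b))  # 0.0: irrelevant direction
print(local_sq_distance(x, np.array([0.1, 0.0]), W, b))  # about 0.04

# Example: two clusters that perfectly separate two classes.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = np.array([0, 0, 1, 1])
prototypes = np.array([[0.0, 0.0], [5.0, 5.0]])
counts = contingency_table(X, labels, prototypes, n_classes=2)
print(mutual_information(counts))  # about 0.693, i.e. log 2
```

In the first example a step along the irrelevant second coordinate has zero length in the learning metric, while a step of 0.1 along the first coordinate has squared length 0.04, close to twice the Kullback-Leibler divergence (about 0.0397) between the two conditional distributions. In the second, the clusters determine the class completely, so the mutual information reaches its maximum of log 2 for two equiprobable classes.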
Files
Original bundle (8 files, all in Adobe Portable Document Format):
isbn9512267977.pdf (588.36 KB)
article1.pdf (213.32 KB)
article2.pdf (105.57 KB)
article4.pdf (107.99 KB)
article5.pdf (257.49 KB)
article6.pdf (257.04 KB)
article7.pdf (193.82 KB)
article8.pdf (85.14 KB)