Learning metrics and discriminative clustering


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Sinkkonen, Janne
dc.date.accessioned 2012-02-10T09:06:47Z
dc.date.available 2012-02-10T09:06:47Z
dc.date.issued 2003-11-21
dc.identifier.isbn 951-22-6797-7
dc.identifier.issn 1459-7020
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/2136
dc.description.abstract This work develops methods for extracting relevant information from large, multivariate data sets in a flexible, nonlinear way. The techniques are especially applicable in the initial, exploratory phase of data analysis, in cases where an explicit indicator of relevance is available as part of the data set. Unsupervised learning methods, which are popular in data exploration, often rely on a distance measure defined for data items. Selection of the distance measure, part of which is feature selection, is therefore fundamentally important. The learning metrics principle is introduced to complement manual feature selection by enabling automatic modification of a distance measure on the basis of available relevance information. Two applications of the principle are developed. The first emphasizes relevant aspects of the data by directly modifying distances between data items, and is usable, for example, in information visualization with self-organizing maps. The other method, discriminative clustering, finds clusters that are internally homogeneous with respect to the interesting variation of the data. The techniques have been applied to text document analysis, gene expression clustering, and charting the bankruptcy sensitivity of companies. In the first, more straightforward approach, a new local metric of the data space measures changes in the conditional distribution of the relevance-indicating data by the Fisher information matrix, a local approximation of the Kullback-Leibler distance. Discriminative clustering, on the other hand, directly minimizes a Kullback-Leibler-based distortion measure within the clusters, or equivalently maximizes the mutual information between the clusters and the relevance indicator. A finite-data algorithm for discriminative clustering is also presented. It maximizes a partially marginalized posterior probability of the model and is asymptotically equivalent to maximizing mutual information. en
dc.format.extent 77, [86]
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher Helsinki University of Technology en
dc.publisher Teknillinen korkeakoulu fi
dc.relation.ispartofseries Dissertations in computer and information science. Report D en
dc.relation.ispartofseries 2 en
dc.relation.haspart Kaski S. and Sinkkonen J., 2000. Metrics that learn relevance. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN-2000). IEEE, Piscataway, NJ, Vol. 5, pages 547-552. [article1.pdf] © 2000 IEEE. By permission.
dc.relation.haspart Sinkkonen J. and Kaski S., 2000. Clustering by similarity in an auxiliary space. In: Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2000). Springer-Verlag, London, pages 3-8. [article2.pdf] © 2000 Springer-Verlag. By permission.
dc.relation.haspart Kaski S., Sinkkonen J. and Peltonen J., 2001. Bankruptcy analysis with self-organizing maps in learning metrics. IEEE Transactions on Neural Networks 12, No. 4, pages 936-947.
dc.relation.haspart Kaski S. and Sinkkonen J., 2001. A topography-preserving latent variable model with learning metrics. In: Allinson N., Yin H., Allinson L. and Slack J. (editors), Advances in Self-Organizing Maps. Springer-Verlag, London, pages 224-229. [article4.pdf] © 2001 Springer-Verlag. By permission.
dc.relation.haspart Sinkkonen J. and Kaski S., 2002. Clustering based on conditional distributions in an auxiliary space. Neural Computation 14, pages 217-239. [article5.pdf] © 2002 MIT Press. By permission.
dc.relation.haspart Kaski S. and Sinkkonen J., Principle of learning metrics for exploratory data analysis. The Journal of VLSI Signal Processing – Systems for Signal, Image, and Video Technology: Special issue on Data Mining and Biomedical Applications of Neural Networks, forthcoming. [article6.pdf] © 2003 by authors and © 2003 Kluwer Academic Publishers. By permission.
dc.relation.haspart Sinkkonen J., Kaski S. and Nikkilä J., 2002. Discriminative clustering: optimal contingency tables by learning metrics. In: Elomaa T., Mannila H. and Toivonen H. (editors), Proceedings of the 13th European Conference on Machine Learning (ECML'02). Springer-Verlag, London, pages 418-430. [article7.pdf] © 2002 Springer-Verlag. By permission.
dc.relation.haspart Peltonen J., Sinkkonen J. and Kaski S., 2002. Discriminative clustering of text documents. In: Wang L., Rajapakse J. C., Fukushima K., Lee S.-Y. and Yao X. (editors), Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02). IEEE, Piscataway, NJ, Vol. 4, pages 1956-1960. [article8.pdf] © 2002 IEEE. By permission.
dc.subject.other Computer science en
dc.title Learning metrics and discriminative clustering en
dc.type G5 Artikkeliväitöskirja fi
dc.description.version reviewed en
dc.contributor.department Department of Computer Science and Engineering en
dc.contributor.department Tietotekniikan osasto fi
dc.subject.keyword clustering en
dc.subject.keyword discriminative clustering en
dc.subject.keyword exploratory data analysis en
dc.subject.keyword feature extraction en
dc.subject.keyword information bottleneck en
dc.subject.keyword information geometry en
dc.subject.keyword learning metrics en
dc.subject.keyword mutual information en
dc.subject.keyword supervised unsupervised learning en
dc.identifier.urn urn:nbn:fi:tkk-001045
dc.type.dcmitype text en
dc.type.ontasot Väitöskirja (artikkeli) fi
dc.type.ontasot Doctoral dissertation (article-based) en
dc.contributor.lab Laboratory of Computer and Information Science en
dc.contributor.lab Informaatiotekniikan laboratorio fi
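
Illustrative note on the abstract: the abstract states that discriminative clustering maximizes the mutual information between the clusters and the relevance indicator. The sketch below is not taken from the dissertation; it only shows, under simple assumptions, how that quantity could be estimated from a cluster-by-class contingency table of counts. The function name mutual_information and the two example tables are hypothetical, and the plug-in estimate used here is a generic textbook estimator, not the finite-data algorithm mentioned in the abstract.

```python
import math

def mutual_information(counts):
    """Plug-in estimate of mutual information (in nats) from a contingency table.

    counts[i][j] is the number of items in cluster i whose relevance
    indicator (for example a class label) takes value j. Hypothetical helper,
    not the dissertation's algorithm.
    """
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]          # cluster sizes
    col_sums = [sum(col) for col in zip(*counts)]    # class sizes
    mi = 0.0
    for i, row in enumerate(counts):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                mi += (n / total) * math.log(n * total / (row_sums[i] * col_sums[j]))
    return mi

# Clusters that separate the two classes perfectly.
pure = [[10, 0], [0, 10]]
# Clusters that carry no information about the class.
mixed = [[5, 5], [5, 5]]

print(mutual_information(pure))
print(mutual_information(mixed))
```

In this toy setting the perfectly separating table attains log 2 nats, the maximum for two equally frequent classes, while the uninformative table gives zero, which matches the intuition that clusters homogeneous in the relevance indicator score highest.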

