Learning metrics and discriminative clustering

Doctoral thesis (article-based)
77, [86] pages
Dissertations in computer and information science. Report D, 2
In this work, methods have been developed to extract relevant information from large, multivariate data sets in a flexible, nonlinear way. The techniques are applicable especially at the initial, exploratory phase of data analysis, in cases where an explicit indicator of relevance is available as part of the data set. Unsupervised learning methods, popular in data exploration, often rely on a distance measure defined for data items. Selection of the distance measure, of which feature selection is a part, is therefore fundamentally important. The learning metrics principle is introduced to complement manual feature selection by enabling automatic modification of a distance measure on the basis of available relevance information.

Two applications of the principle are developed. The first emphasizes relevant aspects of the data by directly modifying distances between data items, and is usable, for example, in information visualization with self-organizing maps. The other method, discriminative clustering, finds clusters that are internally homogeneous with respect to the interesting variation of the data. The techniques have been applied to text document analysis, gene expression clustering, and charting the bankruptcy sensitivity of companies.

In the first, more straightforward approach, a new local metric of the data space measures changes in the conditional distribution of the relevance-indicating data by the Fisher information matrix, a local approximation of the Kullback-Leibler divergence. Discriminative clustering, on the other hand, directly minimizes a Kullback-Leibler-based distortion measure within the clusters, or equivalently maximizes the mutual information between the clusters and the relevance indicator. A finite-data algorithm for discriminative clustering is also presented; it maximizes a partially marginalized posterior probability of the model and is asymptotically equivalent to maximizing mutual information.
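The local-metric idea can be sketched in a few lines. The example below is a minimal illustration, not the thesis implementation: it assumes the conditional distribution p(c|x) of the relevance indicator follows a binary logistic model with hypothetical weights `w`, for which the Fisher information matrix reduces to p(1-p)·wwᵀ. Displacements that leave p(c|x) unchanged then contribute nothing to the local distance.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fisher_metric(x, w):
    """Fisher information matrix J(x) of p(c|x) for a binary logistic
    model p(c=1|x) = sigmoid(w.x).  For this model J(x) = p(1-p) w w^T,
    where p = p(c=1|x)."""
    p = sigmoid(np.dot(w, x))
    return p * (1.0 - p) * np.outer(w, w)

def local_distance_sq(x, dx, w):
    """Squared local learning-metric distance dx^T J(x) dx, a local
    approximation of the KL divergence between p(c|x) and p(c|x+dx)."""
    J = fisher_metric(x, w)
    return float(dx @ J @ dx)

# A displacement along w (relevant direction) has positive distance;
# an orthogonal displacement (irrelevant direction) has zero distance.
w = np.array([1.0, 0.0])
x = np.zeros(2)
d_relevant = local_distance_sq(x, np.array([0.1, 0.0]), w)    # 0.0025
d_irrelevant = local_distance_sq(x, np.array([0.0, 0.1]), w)  # 0.0
```

This illustrates the core effect of the principle: the metric stretches the data space along directions in which the relevance indicator changes and collapses directions in which it does not.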
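The discriminative clustering objective, the mutual information between cluster assignments and the auxiliary relevance indicator, can be evaluated empirically from their joint contingency table. The sketch below is an assumed, simplified illustration with hard assignments; it evaluates the objective only and is not the finite-data algorithm presented in the thesis.

```python
import numpy as np

def empirical_mutual_information(clusters, labels):
    """Empirical mutual information I(cluster; label) in nats, estimated
    from the joint contingency table of hard cluster assignments and
    auxiliary class labels."""
    ks, ki = np.unique(np.asarray(clusters), return_inverse=True)
    cs, ci = np.unique(np.asarray(labels), return_inverse=True)
    # Build the joint distribution over (cluster, label) cells.
    joint = np.zeros((len(ks), len(cs)))
    np.add.at(joint, (ki, ci), 1.0)
    joint /= joint.sum()
    pk = joint.sum(axis=1, keepdims=True)   # cluster marginal
    pc = joint.sum(axis=0, keepdims=True)   # label marginal
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pk @ pc)[nz])))
```

For a balanced binary indicator, clusters perfectly aligned with the labels score log 2 nats, while clusters independent of the labels score 0; a discriminative clustering procedure would adjust the cluster assignments (or cluster parameters) to increase this quantity.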
clustering, discriminative clustering, exploratory data analysis, feature extraction, information bottleneck, information geometry, learning metrics, mutual information, supervised learning, unsupervised learning
Other note
  • Kaski S. and Sinkkonen J., 2000. Metrics that learn relevance. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN-2000). IEEE, Piscataway, NJ, Vol. 5, pages 547-552. [article1.pdf] © 2000 IEEE. By permission.
  • Sinkkonen J. and Kaski S., 2000. Clustering by similarity in an auxiliary space. In: Proceedings of the Second International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2000). Springer-Verlag, London, pages 3-8. [article2.pdf] © 2000 Springer-Verlag. By permission.
  • Kaski S., Sinkkonen J. and Peltonen J., 2001. Bankruptcy analysis with self-organizing maps in learning metrics. IEEE Transactions on Neural Networks 12, No. 4, pages 936-947.
  • Kaski S. and Sinkkonen J., 2001. A topography-preserving latent variable model with learning metrics. In: Allinson N., Yin H., Allinson L. and Slack J. (editors), Advances in Self-Organizing Maps. Springer-Verlag, London, pages 224-229. [article4.pdf] © 2001 Springer-Verlag. By permission.
  • Sinkkonen J. and Kaski S., 2002. Clustering based on conditional distributions in an auxiliary space. Neural Computation 14, pages 217-239. [article5.pdf] © 2002 MIT Press. By permission.
  • Kaski S. and Sinkkonen J., Principle of learning metrics for exploratory data analysis. The Journal of VLSI Signal Processing – Systems for Signal, Image, and Video Technology: Special issue on Data Mining and Biomedical Applications of Neural Networks, forthcoming. [article6.pdf] © 2003 by authors and © 2003 Kluwer Academic Publishers. By permission.
  • Sinkkonen J., Kaski S. and Nikkilä J., 2002. Discriminative clustering: optimal contingency tables by learning metrics. In: Elomaa T., Mannila H. and Toivonen H. (editors), Proceedings of the 13th European Conference on Machine Learning (ECML'02). Springer-Verlag, London, pages 418-430. [article7.pdf] © 2002 Springer-Verlag. By permission.
  • Peltonen J., Sinkkonen J. and Kaski S., 2002. Discriminative clustering of text documents. In: Wang L., Rajapakse J. C., Fukushima K., Lee S.-Y. and Yao X. (editors), Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02). IEEE, Piscataway, NJ, Vol. 4, pages 1956-1960. [article8.pdf] © 2002 IEEE. By permission.