Advances in independent component analysis with applications to data mining
Doctoral thesis (article-based)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Dissertations in computer and information science. Report D, 4
Abstract

This thesis considers the problem of finding latent structure in high-dimensional data. The observed data are assumed to be generated by unknown latent variables and their interactions, and the task is to find these latent variables, and the way they interact, from the observed data alone. The latent variables are further assumed not to depend on each other but to act independently.

A popular method for solving this problem is independent component analysis (ICA), a statistical method for expressing a set of multidimensional observations as a combination of unknown latent variables that are statistically independent of each other. Starting from ICA, this thesis derives and presents several methods for estimating latent structure in different problem settings: an ICA algorithm for analyzing complex-valued signals; a way of using ICA in the context of regression; and an ICA-type algorithm for analyzing the topics in dynamically changing text data. In addition to ICA-type methods, two algorithms are given for estimating latent structure in binary-valued (0-1) data. Experimental results are given for all of the presented methods.

Another, partially overlapping problem considered in this thesis is dimensionality reduction. Empirical evidence is given that a computationally simple method, random projection, does not introduce severe distortions in the data. It is also proposed that random projection could be used as a preprocessing step prior to ICA, and experimental results are shown to support this claim. The thesis also contains several literature surveys on various aspects of finding latent structure in high-dimensional data.
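The two dimensionality-reduction claims above, that random projection roughly preserves distances and that it can be applied before ICA, can be illustrated with a minimal NumPy sketch. It is not the code or experimental setup of the thesis. The Gaussian projection matrix scaled by 1/sqrt(k), the real-valued symmetric fixed-point ICA update with g = tanh, the synthetic sources and noise level, and every function name (`random_projection`, `distance_ratios`, `whiten`, `sym_decorrelate`, `fastica`) are assumptions chosen for illustration. The complex-valued algorithm of the first included article is not reproduced here.

```python
import numpy as np


def random_projection(X, k, rng):
    """Project the columns of X (d x n) onto k dimensions.

    Assumed scaling: Gaussian entries with variance 1/k, so squared
    Euclidean norms are preserved in expectation.
    """
    R = rng.standard_normal((k, X.shape[0])) / np.sqrt(k)
    return R @ X


def distance_ratios(X, Y):
    """Ratios ||y_i - y_j|| / ||x_i - x_j|| over all column pairs i < j."""
    i, j = np.triu_indices(X.shape[1], k=1)
    dx = np.linalg.norm(X[:, i] - X[:, j], axis=0)
    dy = np.linalg.norm(Y[:, i] - Y[:, j], axis=0)
    return dy / dx


def whiten(X, m):
    """Center X and PCA-whiten it onto its m leading principal directions."""
    Xc = X - X.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(Xc @ Xc.T / Xc.shape[1])
    top = np.argsort(eigval)[::-1][:m]
    V = eigvec[:, top] / np.sqrt(eigval[top])  # divide each direction by its std
    return V.T @ Xc                            # result has identity covariance


def sym_decorrelate(W):
    """Symmetric orthogonalization W <- (W W^T)^(-1/2) W."""
    s, u = np.linalg.eigh(W @ W.T)
    return (u / np.sqrt(s)) @ u.T @ W


def fastica(Z, rng, max_iter=200, tol=1e-8):
    """Symmetric fixed-point ICA with g = tanh on whitened data Z (m x n)."""
    m, n = Z.shape
    W = sym_decorrelate(rng.standard_normal((m, m)))
    for _ in range(max_iter):
        G = np.tanh(W @ Z)
        # Row-wise update: w <- E{z g(w^T z)} - E{g'(w^T z)} w, g' = 1 - tanh^2
        W_new = sym_decorrelate(G @ Z.T / n - np.diag((1 - G**2).mean(axis=1)) @ W)
        # Converged when each new row is parallel (up to sign) to the old one
        converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
        W = W_new
        if converged:
            break
    return W @ Z


rng = np.random.default_rng(0)

# 1) Distance distortion of random projection on random high-dimensional points.
X = rng.standard_normal((2000, 50))                  # 50 points in 2000 dims
ratios = distance_ratios(X, random_projection(X, 500, rng))

# 2) Random projection as preprocessing before ICA on a synthetic noisy mixture.
n = 5000
S = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])  # independent sources
A = rng.standard_normal((100, 2))                              # unknown mixing matrix
X_mix = A @ S + 0.05 * rng.standard_normal((100, n))
Y = fastica(whiten(random_projection(X_mix, 20, rng), 2), rng)

# |correlation| between each true source (rows) and each estimate (columns)
C = np.abs(np.corrcoef(np.vstack([S, Y]))[:2, 2:])
print(ratios.min(), ratios.max(), C.max(axis=1))
```

The distance ratios should cluster around 1, and each source should have one estimate whose absolute correlation with it is close to 1. ICA recovers the sources only up to order, sign and scale, which is why the comparison uses absolute correlations and takes the best match per source.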
Keywords: independent component analysis, latent variable models, dimensionality reduction, data mining, complex valued signals, random projection, regression, topic identification, 0-1 data
- Bingham E. and Hyvärinen A., 2000. A fast fixed-point algorithm for independent component analysis of complex valued signals. International Journal of Neural Systems 10, No. 1, pages 1-8. © 2000 World Scientific Publishing Company. By permission.
- Bingham E. and Mannila H., 2001. Random projection in dimensionality reduction: applications to image and text data. In: Provost F. and Srikant R. (editors), Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001). San Francisco, CA, USA, 26-29 August 2001, pages 245-250. © 2001 Association for Computing Machinery (ACM). By permission.
- Hyvärinen A. and Bingham E., 2003. Connection between multilayer perceptrons and regression using independent component analysis. Neurocomputing 50, pages 211-222. © 2003 Elsevier Science. By permission.
- Bingham E., Kabán A. and Girolami M., 2003. Topic identification in dynamical text by complexity pursuit. Neural Processing Letters 17, No. 1, pages 69-83. © 2003 Kluwer Academic Publishers. By permission.
- Bingham E., Mannila H. and Seppänen J. K., 2002. Topics in 0-1 data. In: Hand D., Keim D. and Ng R. (editors), Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). Edmonton, Alberta, Canada, 23-26 July 2002, pages 450-455. © 2002 Association for Computing Machinery (ACM). By permission.
- Seppänen J. K., Bingham E. and Mannila H., 2003. A simple algorithm for topic identification in 0-1 data. In: Lavrač N., Gamberger D., Todorovski L. and Blockeel H. (editors), Knowledge Discovery in Databases: Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003). Cavtat-Dubrovnik, Croatia, 22-26 September 2003. Springer-Verlag, Berlin. Lecture Notes in Artificial Intelligence 2838, pages 423-434. © 2003 Springer-Verlag. By permission.