Advances in independent component analysis with applications to data mining

DC field | Value | Language
dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Bingham, Ella |
dc.contributor.department | Department of Computer Science and Engineering | en
dc.contributor.department | Tietotekniikan osasto | fi
dc.contributor.lab | Laboratory of Computer and Information Science | en
dc.contributor.lab | Informaatiotekniikan laboratorio | fi
dc.date.accessioned | 2012-02-10T09:08:24Z |
dc.date.available | 2012-02-10T09:08:24Z |
dc.date.issued | 2003-12-12 |
dc.description.abstract | This thesis considers the problem of finding latent structure in high-dimensional data. The observed data are assumed to be generated by unknown latent variables and their interactions, and the task is to find these latent variables and the way they interact, given only the observed data. The latent variables are further assumed not to depend on each other but to act independently. A popular method for solving this problem is independent component analysis (ICA), a statistical method for expressing a set of multidimensional observations as a combination of unknown latent variables that are statistically independent of each other. Starting from ICA, several methods for estimating the latent structure in different problem settings are derived and presented in this thesis: an ICA algorithm for analyzing complex-valued signals is given; a way of using ICA in the context of regression is discussed; and an ICA-type algorithm is used for analyzing the topics in dynamically changing text data. In addition to ICA-type methods, two algorithms are given for estimating the latent structure in binary-valued data. Experimental results are given for all of the presented methods. Another, partially overlapping problem considered in this thesis is dimensionality reduction. A computationally simple method called random projection is validated empirically: it is shown not to introduce severe distortions in the data. It is also proposed that random projection could be used as a preprocessing method prior to ICA, and experimental results supporting this claim are presented. The thesis also contains several literature surveys on various aspects of finding latent structure in high-dimensional data. | en
dc.description.version | reviewed | en
dc.format.extent | 60, [66] |
dc.format.mimetype | application/pdf |
dc.identifier.isbn | 951-22-6820-5 |
dc.identifier.issn | 1459-7020 |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/2141 |
dc.identifier.urn | urn:nbn:fi:tkk-001098 |
dc.language.iso | en | en
dc.publisher | Helsinki University of Technology | en
dc.publisher | Teknillinen korkeakoulu | fi
dc.relation.haspart | Bingham E. and Hyvärinen A., 2000. A fast fixed-point algorithm for independent component analysis of complex valued signals. International Journal of Neural Systems 10, No. 1, pages 1-8. [article1.pdf] © 2000 World Scientific Publishing Company. By permission. |
dc.relation.haspart | Bingham E. and Mannila H., 2001. Random projection in dimensionality reduction: applications to image and text data. In: Provost F. and Srikant R. (editors), Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001). San Francisco, CA, USA, 26-29 August 2001, pages 245-250. © 2001 Association for Computing Machinery (ACM). By permission. |
dc.relation.haspart | Hyvärinen A. and Bingham E., 2003. Connection between multilayer perceptrons and regression using independent component analysis. Neurocomputing 50, pages 211-222. [article3.pdf] © 2003 Elsevier Science. By permission. |
dc.relation.haspart | Bingham E., Kabán A. and Girolami M., 2003. Topic identification in dynamical text by complexity pursuit. Neural Processing Letters 17, No. 1, pages 69-83. [article4.pdf] © 2003 Kluwer Academic Publishers. By permission. |
dc.relation.haspart | Bingham E., Mannila H. and Seppänen J. K., 2002. Topics in 0-1 data. In: Hand D., Keim D. and Ng R. (editors), Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002). Edmonton, Alberta, Canada, 23-26 July 2002, pages 450-455. © 2002 Association for Computing Machinery (ACM). By permission. |
dc.relation.haspart | Seppänen J. K., Bingham E. and Mannila H., 2003. A simple algorithm for topic identification in 0-1 data. In: Lavrač N., Gamberger D., Todorovski L. and Blockeel H. (editors), Knowledge Discovery in Databases: Proceedings of the 7th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2003). Cavtat - Dubrovnik, Croatia, 22-26 September 2003. Springer-Verlag, Berlin. Lecture Notes in Artificial Intelligence 2838, pages 423-434. [article6.pdf] © 2003 Springer-Verlag. By permission. |
dc.relation.ispartofseries | Dissertations in computer and information science. Report D | en
dc.relation.ispartofseries | 4 | en
dc.subject.keyword | independent component analysis | en
dc.subject.keyword | latent variable models | en
dc.subject.keyword | dimensionality reduction | en
dc.subject.keyword | data mining | en
dc.subject.keyword | complex valued signals | en
dc.subject.keyword | random projection | en
dc.subject.keyword | regression | en
dc.subject.keyword | topic identification | en
dc.subject.keyword | 0-1 data | en
dc.subject.other | Computer science | en
dc.title | Advances in independent component analysis with applications to data mining | en
dc.type | G5 Artikkeliväitöskirja (article-based doctoral dissertation) | fi
dc.type.dcmitype | text | en
dc.type.ontasot | Väitöskirja (artikkeli) | fi
dc.type.ontasot | Doctoral dissertation (article-based) | en
local.aalto.digiauth | ask |
local.aalto.digifolder | Aalto_65098 |
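The abstract characterizes ICA as expressing multidimensional observations as a combination of statistically independent latent variables. As a minimal sketch of that idea, the Python/NumPy code below whitens the data and then runs a real-valued fixed-point iteration in the style of FastICA (tanh nonlinearity, symmetric decorrelation). It is an illustration only, not the complex-valued algorithm of the first included article; the function names (`whiten`, `sym_decorrelate`, `fastica_sketch`), the toy sources and the mixing matrix are hypothetical choices.

```python
import numpy as np


def whiten(x):
    """Center x (features x samples) and transform it to identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    # Whitening matrix V = E D^{-1/2} E^T, so that cov(V x) = I
    v = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return v @ x


def sym_decorrelate(w):
    """Symmetric decorrelation: W <- (W W^T)^{-1/2} W, making the rows orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(w @ w.T)
    return eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T @ w


def fastica_sketch(x, n_iter=200, tol=1e-8, seed=0):
    """Hypothetical helper: estimate independent components of the rows of x.

    Returns (y, w), where y = w @ whiten(x) holds the estimated components.
    """
    z = whiten(x)
    n, m = z.shape
    rng = np.random.default_rng(seed)
    w = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        g = np.tanh(w @ z)        # g(w_i^T z) for every row i and every sample
        g_prime = 1.0 - g ** 2    # derivative of tanh
        # Fixed-point update per row: w_i <- E[z g(w_i^T z)] - E[g'(w_i^T z)] w_i
        w_new = (g @ z.T) / m - g_prime.mean(axis=1)[:, None] * w
        w_new = sym_decorrelate(w_new)
        # Converged when every row changed by at most a sign flip
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z, w


# Toy example: two independent, non-Gaussian sources mixed linearly.
rng = np.random.default_rng(42)
t = np.linspace(0.0, 8.0, 2000)
s = np.vstack([np.sign(np.sin(3.0 * t)), rng.uniform(-1.0, 1.0, t.size)])
a = np.array([[1.0, 0.5], [0.7, 1.0]])  # hypothetical mixing matrix
x = a @ s
y, w = fastica_sketch(x)

# |correlation| between each true source (rows) and each estimate (columns)
corr = np.abs(np.corrcoef(np.vstack([s, y]))[:2, 2:])
print(np.round(corr, 3))
```

Because ICA recovers the latent variables only up to permutation, sign and scale, the check compares absolute correlations between sources and estimates rather than the raw signals.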
Files
Original bundle
Name | Size | Format
isbn9512268205.pdf | 2.83 MB | Adobe Portable Document Format
article1.pdf | 220.45 KB | Adobe Portable Document Format
article3.pdf | 707.04 KB | Adobe Portable Document Format
article4.pdf | 152.29 KB | Adobe Portable Document Format
article6.pdf | 171.5 KB | Adobe Portable Document Format