Practical approaches to principal component analysis in the presence of missing values
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Ilin, Alexander | |
dc.contributor.author | Raiko, Tapani | |
dc.contributor.department | Department of Information and Computer Science | en |
dc.contributor.department | Tietojenkäsittelytieteen laitos | fi |
dc.contributor.school | Faculty of Information and Natural Sciences | en |
dc.contributor.school | Informaatio- ja luonnontieteiden tiedekunta | fi |
dc.date.accessioned | 2011-11-28T13:18:19Z | |
dc.date.available | 2011-11-28T13:18:19Z | |
dc.date.issued | 2008 | |
dc.description.abstract | Principal component analysis (PCA) is a classical data analysis technique that finds linear transformations of data that retain the maximal amount of variance. We study a case where some of the data values are missing, and show that this problem has many features that are usually associated with nonlinear models, such as overfitting and bad locally optimal solutions. The probabilistic formulation of PCA provides a good foundation for handling missing values, and we introduce formulas for doing that. In the case of high-dimensional and very sparse data, overfitting becomes a severe problem and traditional algorithms for PCA are very slow. We introduce a novel fast algorithm and extend it to variational Bayesian learning. Different versions of PCA are compared in artificial experiments, demonstrating the effects of regularization and modeling of posterior variance. The scalability of the proposed algorithm is demonstrated by applying it to the Netflix problem. | en |
dc.format.extent | v, 37 | |
dc.format.mimetype | application/pdf | |
dc.identifier.isbn | 978-951-22-9482-4 | |
dc.identifier.issn | 1797-5042 | |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/876 | |
dc.identifier.urn | urn:nbn:fi:tkk-011482 | |
dc.language.iso | en | en |
dc.publisher | Helsinki University of Technology | en |
dc.publisher | Teknillinen korkeakoulu | fi |
dc.relation.ispartofseries | TKK reports in information and computer science | en |
dc.relation.ispartofseries | 6 | en |
dc.subject.keyword | principal component analysis (PCA) | en |
dc.subject.keyword | missing values | en |
dc.subject.keyword | overfitting | en |
dc.subject.keyword | regularization | en |
dc.subject.keyword | variational Bayes | en |
dc.subject.other | Mathematics | en |
dc.title | Practical approaches to principal component analysis in the presence of missing values | en |
dc.type | D4 Published development or research report or study | en |
dc.type.dcmitype | text | en |
Files
Original bundle
- Name: isbn9789512294824.pdf
- Size: 598.21 KB
- Format: Adobe Portable Document Format