Practical approaches to principal component analysis in the presence of missing values

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Ilin, Alexander
dc.contributor.author: Raiko, Tapani
dc.contributor.department: Department of Information and Computer Science [en]
dc.contributor.department: Tietojenkäsittelytieteen laitos [fi]
dc.contributor.school: Faculty of Information and Natural Sciences [en]
dc.contributor.school: Informaatio- ja luonnontieteiden tiedekunta [fi]
dc.date.accessioned: 2011-11-28T13:18:19Z
dc.date.available: 2011-11-28T13:18:19Z
dc.date.issued: 2008
dc.description.abstract: Principal component analysis (PCA) is a classical data analysis technique that finds linear transformations of data that retain the maximal amount of variance. We study a case where some of the data values are missing, and show that this problem has many features that are usually associated with nonlinear models, such as overfitting and bad locally optimal solutions. The probabilistic formulation of PCA provides a good foundation for handling missing values, and we introduce formulas for doing that. In the case of high-dimensional and very sparse data, overfitting becomes a severe problem and traditional algorithms for PCA are very slow. We introduce a novel fast algorithm and extend it to variational Bayesian learning. Different versions of PCA are compared in artificial experiments, demonstrating the effects of regularization and modeling of posterior variance. The scalability of the proposed algorithm is demonstrated by applying it to the Netflix problem. [en]
dc.format.extent: v, 37
dc.format.mimetype: application/pdf
dc.identifier.isbn: 978-951-22-9482-4
dc.identifier.issn: 1797-5042
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/876
dc.identifier.urn: urn:nbn:fi:tkk-011482
dc.language.iso: en [en]
dc.publisher: Helsinki University of Technology [en]
dc.publisher: Teknillinen korkeakoulu [fi]
dc.relation.ispartofseries: TKK reports in information and computer science [en]
dc.relation.ispartofseries: 6 [en]
dc.subject.keyword: principal component analysis (PCA) [en]
dc.subject.keyword: missing values [en]
dc.subject.keyword: overfitting [en]
dc.subject.keyword: regularization [en]
dc.subject.keyword: variational Bayes [en]
dc.subject.other: Mathematics [en]
dc.title: Practical approaches to principal component analysis in the presence of missing values [en]
dc.type: D4 Julkaistu kehittämis- tai tutkimusraportti taikka -selvitys (D4 Published development or research report or study) [fi]
dc.type.dcmitype: text [en]
Files
Name: isbn9789512294824.pdf
Size: 598.21 KB
Format: Adobe Portable Document Format