Supervised Probability Preserving Projection (SPPP)

School of Science | Master's thesis
Machine Learning and Data Mining
Degree programme
Master’s Programme in Machine Learning and Data Mining (Macadamia)
Dimensionality reduction (DR) is the process of finding a reduced representation of a data set according to some defined criterion. DR may be performed in both unsupervised and supervised settings. Several techniques have been proposed in the literature for unsupervised DR, where the aim is usually to preserve some intrinsic characteristics of the data without using the output information. These techniques are mostly used as a preprocessing step, although some are used directly for clustering or visualization. While much of the focus has been on unsupervised methods, supervised techniques are preferred when every sample comes with output information. Even though obtaining this information may be expensive for some tasks, it helps supervised methods mitigate the curse of dimensionality, where the input space may be sparse. Unlike unsupervised methods, supervised methods can use the output information to focus on each point's real neighbors. In this thesis we develop a supervised DR technique called Supervised Probability Preserving Projection (SPPP) that operates on probabilistic relations between points. More specifically, we learn a linear transformation matrix that maps the input samples onto a projection space where the differences between the probabilistic similarities of the input covariates and those of their responses are minimized, given a neighborhood function. The thesis begins by suggesting three probabilistic neighborhood functions for a recently proposed method called Supervised Distance Preserving Projections (SDPP). The experimental results on synthetic examples then motivate the development of the novel SPPP technique. The formulation of SPPP and the optimization of three versions, namely Gaussian, Heavy-tail and Linear, are presented.
The experiments indicate that SPPP performs competitively with recent state-of-the-art methods, suggesting its use for both regression and classification tasks.
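The idea in the abstract can be illustrated with a rough sketch. The abstract does not give the SPPP objective, so everything specific here is an assumption made for illustration only: the Gaussian similarity exp(-d^2), the squared-difference cost, the k-nearest-neighbor neighborhood function, and all function names (`pairwise_sq_dists`, `gaussian_similarity`, `knn_graph`, `sppp_like_objective`) are hypothetical and are not taken from the thesis.

```python
import numpy as np

def pairwise_sq_dists(Z):
    # Squared Euclidean distances between all rows of Z.
    sq = np.sum(Z ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T)
    return np.maximum(D, 0.0)  # clip tiny negatives from round-off

def gaussian_similarity(Z):
    # Assumed "Gaussian" probabilistic similarity: exp(-squared distance).
    return np.exp(-pairwise_sq_dists(Z))

def knn_graph(X, k):
    # Assumed neighborhood function: G[i, j] = 1 if j is one of the
    # k nearest neighbors of i in the input space, else 0.
    D = pairwise_sq_dists(X)
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :k]
    G = np.zeros(D.shape)
    G[np.arange(X.shape[0])[:, None], idx] = 1.0
    return G

def sppp_like_objective(W, X, Y, G):
    # Assumed cost: squared difference between similarities of the
    # projected inputs (X @ W) and similarities of the responses Y,
    # summed over neighboring pairs and averaged over samples.
    P = gaussian_similarity(X @ W)
    Q = gaussian_similarity(Y)
    return np.sum(G * (P - Q) ** 2) / X.shape[0]

# Toy data: the response equals the first input coordinate,
# the second input coordinate is pure noise.
rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 1))
noise = rng.normal(size=(30, 1))
X = np.hstack([Y, noise])
G = knn_graph(X, k=5)

W_good = np.array([[1.0], [0.0]])  # projects onto the informative coordinate
W_bad = np.array([[0.0], [1.0]])   # projects onto the noise coordinate

cost_good = sppp_like_objective(W_good, X, Y, G)
cost_bad = sppp_like_objective(W_bad, X, Y, G)
print(cost_good, cost_bad)
```

In this toy setting the projection onto the informative coordinate reproduces the response similarities, so its cost is lower than that of the projection onto noise. A full method would minimize such a cost over W with a gradient-based optimizer, which this sketch does not attempt.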
Karhunen, Juha
Thesis advisor
Corona, Francesco
dimensionality reduction, SPPP, SDPP, probabilities, pairwise distances