Application of alpha-Divergence for Stochastic Neighbor Embedding in Data Visualization

School of Science (Perustieteiden korkeakoulu) | Master's thesis
Machine Learning and Data Mining
Degree programme
Master’s Programme in Machine Learning and Data Mining (Macadamia)
66 + 2
Dimensionality reduction and information visualization are fundamental steps in data processing, information extraction and reasoning. In real-world applications, the number of measurements or variables per observation is often so large that handling the raw data in a specific problem such as regression or classification becomes impractical or even infeasible. Moreover, in many applications, a faithful representation of the data is crucial for first-step analysis and hypothesis development. Recently, the Stochastic Neighbor Embedding (SNE) method has become tremendously popular for data visualization and feature extraction. More recent algorithms such as t-SNE and HSSNE extend the basic SNE algorithm by considering general heavy-tailed distributions in the low-dimensional space, while others, such as NeRV, use parameterized cost functions and achieve the desired embedding by tuning the parameter. In this thesis, we provide another extension to the SNE method by investigating the properties of alpha-divergence for neighbor embedding, focusing our attention on a particular range of alpha values. We show that alpha-divergence, with a proper selection of the alpha parameter, effectively eliminates the crowding problem associated with the early methods. We also extend our method to distributions with heavier tails than the Gaussian. In contrast to some earlier methods such as HSSNE and NeRV, no hand-tuning is needed; instead, the optimal value of alpha can be rigorously estimated for given input data. For this, we provide a statistical framework using a novel distribution called Exponential Divergence with Augmentation. This is an approximate generalization of the Tweedie distribution and enables alpha-optimization after a nonlinear transformation.
We evaluate the performance of our proposed method with two sets of experiments. First, we provide a number of visualizations using our method and its extensions and compare the results with those of earlier methods. Second, we conduct a set of experiments to confirm the effectiveness of our alpha-optimization method for finding the optimal alpha for the data distribution, and its consistency with standard quality measures of dimensionality reduction.
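To make the alpha-divergence named in the abstract concrete, the following is a minimal sketch of one common parameterization (the Amari form for normalized discrete distributions). The abstract does not state which parameterization the thesis uses, so the formula, the function name `alpha_divergence`, and the smoothing constant `eps` are assumptions made purely for illustration. In this form, alpha near 1 recovers the Kullback-Leibler divergence KL(P||Q), which is the cost function of basic SNE, and alpha near 0 recovers the reverse divergence KL(Q||P).

```python
import numpy as np


def alpha_divergence(p, q, alpha, eps=1e-12):
    """Alpha-divergence between two discrete distributions (assumed Amari form).

    D_alpha(P||Q) = (sum_i p_i**alpha * q_i**(1 - alpha) - 1) / (alpha * (alpha - 1))

    The limits alpha -> 1 and alpha -> 0 are handled explicitly and give
    KL(P||Q) and KL(Q||P), respectively.
    """
    # Smooth with a small constant to avoid log(0), then renormalize.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()

    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(p / q)))  # KL(P||Q)
    if np.isclose(alpha, 0.0):
        return float(np.sum(q * np.log(q / p)))  # KL(Q||P)

    mixed = np.sum(p**alpha * q ** (1.0 - alpha))
    return float((mixed - 1.0) / (alpha * (alpha - 1.0)))


# Hypothetical neighbor probabilities in the input (p) and embedding (q) space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
for a in (0.0, 0.5, 1.0, 2.0):
    print(f"alpha={a}: D={alpha_divergence(p, q, a):.4f}")
```

In a neighbor-embedding setting, `p` and `q` would be the pairwise neighbor probabilities in the input space and in the low-dimensional embedding, and the embedding coordinates would be adjusted to minimize the divergence. That optimization step is not shown here.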
Oja, Erkki
Thesis advisor
Dikmen, Onur
dimensionality reduction, information visualization, stochastic neighbor embedding, alpha-divergence, exponential divergence with augmentation.