Robust large-scale statistical inference and ICA using bootstrapping
School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2018-06-15
Authors
Date
2018
Language
en
Pages
72 + app. 50
Series
Aalto University publication series DOCTORAL DISSERTATIONS, 107/2018
Abstract
The reliability of the information extracted from large-scale data, as well as the validity of data-driven decisions, depends on the veracity of the data and of the data processing methods used. Quantifying the veracity of parameter estimates or data-driven decisions is required in order to choose appropriate estimators and to identify redundant or irrelevant variables in multivariate data settings. Moreover, quantifying the veracity allows efficient use of available resources by processing only as much data as is needed to achieve a desired level of accuracy or confidence. Statistical inference, such as finding the accuracy of certain parameter estimates and testing hypotheses on model parameters, can be used to quantify the veracity of large-scale data analytics results. In this thesis, versatile bootstrap procedures are developed for performing statistical inference on large-scale data. First, a computationally efficient and statistically robust bootstrap procedure is proposed, which can be applied to smaller, distinct subsets of the data. Hence, the proposed method is compatible with distributed storage systems and parallel computing architectures. The statistical convergence and robustness properties of the method are analytically established. Then, two specific low-complexity bootstrap procedures are proposed for performing statistical inference on the mixing coefficients of the Independent Component Analysis (ICA) model. Such statistical inference is required to identify the contribution of a specific source signal of interest to the observed mixture variables. This thesis establishes significant analytical results on the structure of the FastICA estimator, which enable the computation of bootstrap replicas in closed form. This not only saves computational resources, but also avoids the convergence problems and the permutation and sign ambiguities of the FastICA algorithm.
The developed methods enable statistical inference in a variety of applications in which ICA is commonly applied, e.g., fMRI and EEG signal processing. In the thesis, an alternative derivation of the fixed-point FastICA algorithm is established. The derivation provides a better understanding of how the FastICA algorithm is derived from the exact Newton-Raphson (NR) algorithm. In the original derivation, FastICA was derived as an approximate NR algorithm using unjustified assumptions, which are not required in the alternative derivation presented in this thesis. It is well known that the fixed-point FastICA algorithm has severe convergence problems when the dimensionality of the data and the sample size are of the same order. To mitigate this problem, a power iteration algorithm for FastICA is proposed, which is remarkably more stable than the fixed-point FastICA algorithm. The proposed PowerICA algorithm can be run in parallel on two computing nodes, making it considerably faster to compute.
Description
Supervising professor
Ollila, Esa, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
Thesis advisor
Ollila, Esa, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
Koivunen, Visa, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
Keywords
Big data analytics, Bootstrap, Fast and Robust Bootstrap, Distributed and parallel computation, Robust estimation, Independent Component Analysis, FastICA
Parts
- [Publication 1]: S. Basiri, E. Ollila, V. Koivunen. Fast and robust bootstrap in analyzing large multivariate datasets. In 48th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, USA, pp. 8-13, November 2014.
  DOI: 10.1109/ACSSC.2014.7094385
- [Publication 2]: S. Basiri, E. Ollila, V. Koivunen. Robust, Scalable, and Fast Bootstrap Method for Analyzing Large Scale Data. IEEE Transactions on Signal Processing, vol. 64(4), pp. 1007-1017, February 2015.
  DOI: 10.1109/TSP.2015.2498121
- [Publication 3]: S. Basiri, E. Ollila, V. Koivunen. Alternative derivation of FastICA with novel power iteration algorithm. IEEE Signal Processing Letters, vol. 24(9), pp. 1378-1382, July 2017.
  DOI: 10.1109/LSP.2017.2732342
- [Publication 4]: S. Basiri, E. Ollila, V. Koivunen. Fast and robust bootstrap method for testing hypotheses in the ICA model. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Florence, Italy, pp. 6-10, May 2014.
  DOI: 10.1109/ICASSP.2014.6853547
- [Publication 5]: S. Basiri, E. Ollila, V. Koivunen. Enhanced bootstrap method for statistical inference in the ICA model. Signal Processing, vol. 138, pp. 53-62, March 2017.
  DOI: 10.1016/j.sigpro.2017.03.005