Browsing by Author "Piironen, Juho"
Now showing 1 - 13 of 13
- Bayesian Predictive Inference and Feature Selection for High-Dimensional Data
School of Science | Doctoral dissertation (article-based) (2019) Piironen, Juho
This thesis discusses Bayesian statistical inference in supervised learning problems where the data are scarce but the number of features is large. The focus is on two important tasks. The first is the prediction of some target variable of interest. The other is feature selection, where the goal is to identify a small subset of features that are relevant for the prediction. Good predictive accuracy is often intrinsically valuable and a means to understanding the data. Feature selection can further help to make the model easier to interpret and reduce future costs if there is a price associated with predicting with many features. Most traditional approaches try to solve both problems at once by formulating an estimation procedure that performs automatic or semiautomatic feature selection as a by-product of the predictive model fitting. This thesis argues that in many cases one can benefit from a decision-theoretically justified two-stage approach. In this approach, one first constructs a model that predicts well but possibly uses many features. In the second stage, one then finds a minimal subset of features that can characterize the predictions of this model. The basic idea of this so-called projective framework has been around for a long time, but it has largely been overlooked in the statistics and machine learning community. This approach offers plenty of freedom for building an accurate prediction model, as one does not need to care about feature selection at this point, and it turns out that solving the feature selection problem often becomes substantially easier given an accurate prediction model that can be used as a reference. The thesis focuses mostly on generalized linear models. To solve the problem of predictive model construction, the thesis introduces novel methods for encoding prior information about sparsity and regularization into the model. These methods can in some cases help to improve the prediction accuracy and robustify the posterior inference, but they also advance the current theoretical understanding of the fundamental characteristics of some commonly used prior distributions. The thesis also explores computationally efficient dimension reduction techniques that can be used as shortcuts for predictive model construction when the number of features is very large. Furthermore, the thesis develops the existing projective feature selection method further so as to make the computation fast and accurate for a large number of features. Finally, the thesis takes the initial steps towards extending this framework to nonlinear and nonparametric Gaussian process models. The contributions of this thesis are solely methodological, but the benefits of the proposed methods are illustrated using example datasets from various fields, in particular from computational genetics.
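The two-stage idea can be sketched in a few lines: below, a ridge fit stands in for the Bayesian reference model, and a candidate feature subset is fitted to the reference model's predictions rather than to the noisy observations. This is a minimal illustration only; the toy data, the ridge stand-in, and the fixed subset are assumptions, not the methods developed in the thesis.

```python
# Minimal sketch of the two-stage projective idea on a linear-Gaussian toy problem.
import numpy as np

rng = np.random.default_rng(0)
n, d, d_rel = 60, 200, 5                       # few observations, many features
X = rng.normal(size=(n, d))
beta = np.zeros(d)
beta[:d_rel] = 1.0
y = X @ beta + rng.normal(size=n)

# Stage 1: reference model using all features (ridge posterior-mean stand-in).
lam = 10.0
w_ref = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
f_ref = X @ w_ref                              # reference model's predictions

# Stage 2: project the reference predictions onto a small feature subset.
# For Gaussian models, minimizing the KL divergence from the reference amounts
# to a least-squares fit of the submodel to f_ref rather than to the noisy y.
subset = [0, 1, 2, 3, 4]                       # candidate subset, assumed given here
w_proj = np.linalg.lstsq(X[:, subset], f_ref, rcond=None)[0]

print("discrepancy to the reference model:  ", np.mean((X[:, subset] @ w_proj - f_ref) ** 2))
print("fit of the projected submodel to data:", np.mean((X[:, subset] @ w_proj - y) ** 2))
```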
- Comparison of Bayesian predictive methods for model selection
A1 Original article in a scientific journal (2017) Piironen, Juho; Vehtari, Aki
The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on variable subset selection for regression and classification and perform several numerical experiments using both simulated and real world data. The results show that the optimization of a utility estimate such as the cross-validation (CV) score is prone to finding overfitted models due to the relatively high variance in the utility estimates when the data is scarce. This can also lead to substantial selection-induced bias and optimism in the performance evaluation of the selected model. From a predictive viewpoint, best results are obtained by accounting for model uncertainty by forming the full encompassing model, such as the Bayesian model averaging solution over the candidate models. If the encompassing model is too complex, it can be robustly simplified by the projection method, in which the information of the full model is projected onto the submodels. This approach is substantially less prone to overfitting than selection based on the CV score. Overall, the projection method also appears to outperform the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that model selection can greatly benefit from using cross-validation outside the search process, both for guiding the model size selection and for assessing the predictive performance of the finally selected model.
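The selection-induced bias discussed above can be illustrated with a small simulation: when the single best of many pure-noise predictors is chosen by its leave-one-out score, that score is clearly optimistic compared with the same model's error on independent data. A hedged sketch with an illustrative univariate search, not the paper's experimental setup.

```python
# Sketch of selection-induced bias: with many pure-noise features, the best
# cross-validation score found during the search is optimistic compared with
# the selected model's performance on fresh data.
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 500
X, y = rng.normal(size=(n, d)), rng.normal(size=n)           # y is independent of X
X_new, y_new = rng.normal(size=(n, d)), rng.normal(size=n)

def loo_mse(x, y):
    """Leave-one-out MSE of a one-variable least-squares model."""
    err = np.empty(len(y))
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        b = np.dot(x[mask], y[mask]) / np.dot(x[mask], x[mask])
        err[i] = (y[i] - b * x[i]) ** 2
    return err.mean()

scores = np.array([loo_mse(X[:, j], y) for j in range(d)])
best = int(scores.argmin())
b_best = np.dot(X[:, best], y) / np.dot(X[:, best], X[:, best])
print("best LOO score found by the search:", scores[best])
print("same model's error on fresh data:  ",
      np.mean((y_new - b_best * X_new[:, best]) ** 2))        # close to Var(y) = 1
```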
- Comparison of Bayesian predictive methods for variable selection
School of Science | Master's thesis (2014-08-19) Piironen, Juho
To date, several methods for Bayesian model selection have been proposed. Although there are many studies discussing the theoretical properties of these methods for model assessment, an extensive quantitative comparison between the methods for model selection with finite data seems to be lacking. This thesis reviews the most commonly used methods in the literature and compares their performance in practical variable selection problems, especially in situations where the data is scarce. The study also discusses the selection-induced bias in detail and underlines its relevance for variable selection. Although the focus of the study is on variable selection, the presented ideas generalize to other model selection problems as well. The numerical results consist of simulated experiments and one real world problem. The results suggest that even though there are nearly unbiased methods for assessing the performance of a given model, the high variance in the performance estimation may lead to considerable selection-induced bias and selection of an overfitted model. The results also suggest that the reference predictive and projection methods are least sensitive to the selection-induced bias and are therefore more robust for searching promising models than the alternative methods, such as cross-validation and information criteria. However, due to the selection bias, also for these methods the estimated divergence between the reference and candidate models may be an unreliable indicator of the performance of the selected models. For this reason, the performance estimation of the found models should be done, for example, using cross-validation outside the selection process.
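Keeping cross-validation outside the selection process can be sketched as an outer cross-validation loop in which the variable search is repeated within each training fold; only the held-out folds are used to estimate the performance of the finally selected model. The screening rule and model below are illustrative assumptions, not those studied in the thesis.

```python
# Sketch of performance estimation with cross-validation outside the selection:
# the variable search runs inside each outer training fold, and the outer test
# folds give a nearly unbiased estimate for the selected model.
import numpy as np

rng = np.random.default_rng(2)
n, d = 60, 100
X, y = rng.normal(size=(n, d)), rng.normal(size=n)

def select_and_fit(X_tr, y_tr, k=3):
    """Pick the k features most correlated with y and fit least squares on them."""
    score = np.abs(X_tr.T @ y_tr)
    sel = np.argsort(score)[-k:]
    w = np.linalg.lstsq(X_tr[:, sel], y_tr, rcond=None)[0]
    return sel, w

outer_err = []
for test_idx in np.array_split(rng.permutation(n), 5):
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    sel, w = select_and_fit(X[train_idx], y[train_idx])      # selection redone per fold
    pred = X[np.ix_(test_idx, sel)] @ w
    outer_err.append(np.mean((y[test_idx] - pred) ** 2))

print("outer cross-validation estimate for the selected model:", np.mean(outer_err))
```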
- A decision-theoretic approach for model interpretability in Bayesian framework
A1 Original article in a scientific journal (2020-09-01) Afrabandpey, Homayun; Peltola, Tomi; Piironen, Juho; Vehtari, Aki; Kaski, Samuel
A salient approach to interpretable machine learning is to restrict modeling to simple models. In the Bayesian framework, this can be pursued by restricting the model structure and prior to favor interpretable models. Fundamentally, however, interpretability is about users' preferences, not the data generation mechanism; it is more natural to formulate interpretability as a utility function. In this work, we propose an interpretability utility, which explicates the trade-off between explanation fidelity and interpretability in the Bayesian framework. The method consists of two steps. First, a reference model, possibly a black-box Bayesian predictive model which does not compromise accuracy, is fitted to the training data. Second, a proxy model from an interpretable model family that best mimics the predictive behaviour of the reference model is found by optimizing the interpretability utility function. The approach is model agnostic: neither the interpretable model nor the reference model is restricted to a certain class of models, and the optimization problem can be solved using standard tools. Through experiments on real-world data sets, using decision trees as interpretable models and Bayesian additive regression models as reference models, we show that for the same level of interpretability, our approach generates more accurate models than the alternative of restricting the prior. We also propose a systematic way to measure the stability of interpretable models constructed by different interpretability approaches and show that our proposed approach generates more stable models.
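A rough stand-in for the two steps, using scikit-learn models rather than the Bayesian models of the paper: a boosted ensemble plays the role of the black-box reference model and a shallow decision tree is fitted to its predictions instead of the raw targets. This captures only the fidelity-to-reference part of the proposed interpretability utility; all model choices here are illustrative assumptions.

```python
# Sketch: fit a reference model, then fit a small interpretable proxy to the
# reference predictions and compare with a proxy fitted directly to the data.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=400, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

ref = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)   # step 1: reference model

tree_on_ref = DecisionTreeRegressor(max_depth=3, random_state=0)
tree_on_ref.fit(X_tr, ref.predict(X_tr))                          # step 2: mimic the reference

tree_on_data = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("tree fitted to reference predictions:", mean_squared_error(y_te, tree_on_ref.predict(X_te)))
print("tree fitted directly to the data:    ", mean_squared_error(y_te, tree_on_data.predict(X_te)))
```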
- Implicitly Adaptive Importance Sampling
A1 Original article in a scientific journal (2021-02-09) Paananen, Topi; Piironen, Juho; Bürkner, Paul-Christian; Vehtari, Aki
Adaptive importance sampling is a class of techniques for finding good proposal distributions for importance sampling. Often the proposal distributions are standard probability distributions whose parameters are adapted based on the mismatch between the current proposal and a target distribution. In this work, we present an implicit adaptive importance sampling method that applies to complicated distributions which are not available in closed form. The method iteratively matches the moments of a set of Monte Carlo draws to weighted moments based on importance weights. We apply the method to Bayesian leave-one-out cross-validation and show that it performs better than many existing parametric adaptive importance sampling methods while being computationally inexpensive.
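The moment-matching idea can be sketched in one dimension: draws from an initial Gaussian proposal are repeatedly shifted and rescaled so that their plain moments match the importance-weighted moments, while the proposal density values are updated through the affine Jacobian. The target, proposal, and number of iterations below are toy choices, not the leave-one-out application studied in the paper.

```python
# Sketch of adaptive importance sampling by iterative moment matching (1-D toy case).
import numpy as np

rng = np.random.default_rng(3)

def log_target(x):
    return -0.5 * ((x - 3.0) / 0.5) ** 2        # unnormalized log density of N(3, 0.5^2)

x = rng.normal(size=2000)                        # draws from the initial proposal N(0, 1)
log_q = -0.5 * x ** 2                            # unnormalized log proposal density at the draws

for it in range(5):
    log_w = log_target(x) - log_q
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    print("iteration", it, "effective sample size:", round(1.0 / np.sum(w ** 2), 1))
    mu = np.sum(w * x)                           # importance-weighted mean
    sd = np.sqrt(np.sum(w * (x - mu) ** 2))      # importance-weighted std
    scale = sd / x.std()
    log_q = log_q - np.log(scale)                # density of the transformed draws (Jacobian)
    x = mu + scale * (x - x.mean())              # match plain moments to the weighted ones
```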
- Influence of atmospheric circulation on Antarctic sea ice in climate model HadGEM1: statistical analysis
School of Science | Bachelor's thesis (2012-05-07) Piironen, Juho
- Iterative Supervised Principal Components
A4 Article in conference proceedings (2018) Piironen, Juho; Vehtari, Aki
In high-dimensional prediction problems, where the number of features may greatly exceed the number of training instances, a fully Bayesian approach with a sparsifying prior is known to produce good results but is computationally challenging. To alleviate this computational burden, we propose to use a preprocessing step where we first apply a dimension reduction to the original data to reduce the number of features to something that is conveniently handled by Bayesian methods. To do this, we propose a new dimension reduction technique, called iterative supervised principal components (ISPCs), which combines variable screening and dimension reduction and can be considered an extension to the existing technique of supervised principal components (SPCs). Our empirical evaluations confirm that, although not foolproof, the proposed approach provides very good results on several microarray benchmark datasets with very affordable computation time, and can also be very useful for visualizing high-dimensional data.
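A rough sketch of one possible screening-plus-PCA iteration in the spirit of the method: features are screened by a univariate score, the leading principal component of the screened block becomes a supervised component, and its contribution is removed from the data before the next round. The fixed screening size is an assumption; the paper chooses the screening threshold in a more principled way.

```python
# Sketch of iterative screening + principal components for supervised dimension reduction.
import numpy as np

def ispc(X, y, n_components=3, n_screen=50):
    """Screen features by |covariance with y|, take the leading principal component
    of the screened block, deflate X by that component, and repeat."""
    X = X - X.mean(axis=0)
    yc = y - y.mean()
    components = []
    for _ in range(n_components):
        score = np.abs(X.T @ yc)                       # univariate screening scores
        keep = np.argsort(score)[-n_screen:]
        U, S, _ = np.linalg.svd(X[:, keep], full_matrices=False)
        z = U[:, 0] * S[0]                             # first principal component scores
        components.append(z)
        X = X - np.outer(z, X.T @ z / (z @ z))         # remove the explained variation
    return np.column_stack(components)

rng = np.random.default_rng(4)
n, d = 80, 1000
X = rng.normal(size=(n, d))
y = X[:, :10].sum(axis=1) + rng.normal(size=n)
Z = ispc(X, y)
print(Z.shape)    # (80, 3): low-dimensional supervised features for a downstream model
```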
- On the hyperprior choice for the global shrinkage parameter in the horseshoe prior
A4 Article in conference proceedings (2017) Piironen, Juho; Vehtari, Aki
The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but as shown in this paper, the results can be sensitive to the prior choice for the global shrinkage hyperparameter. We argue that the previous default choices are dubious due to their tendency to favor solutions with more unshrunk coefficients than we typically expect a priori. This can lead to bad results if this parameter is not strongly identified by the data. We derive the relationship between the global parameter and the effective number of nonzeros in the coefficient vector, and show an easy and intuitive way of setting up the prior for the global parameter based on our prior beliefs about the number of nonzero coefficients in the model. The results on real world data show that one can benefit greatly – in terms of improved parameter estimates, prediction accuracy, and reduced computation time – from transforming even a crude guess for the number of nonzero coefficients into the prior for the global parameter using our framework.
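As a concrete illustration, the framework turns a prior guess p0 for the number of nonzero coefficients among D candidates, together with the noise level sigma and the number of observations n, into a global scale of the form tau0 = p0 / (D - p0) * sigma / sqrt(n), which can then be used, for example, as the scale of a half-Cauchy prior on the global parameter.

```python
# Sketch of turning a prior guess for the number of nonzero coefficients into a
# scale for the global shrinkage parameter of the horseshoe prior.
import math

def horseshoe_tau0(p0, D, n, sigma=1.0):
    """Global scale implied by expecting roughly p0 of D coefficients to be nonzero;
    sigma is the noise level (or a plug-in pseudo-variance for non-Gaussian models)."""
    assert 0 < p0 < D
    return p0 / (D - p0) * sigma / math.sqrt(n)

print(horseshoe_tau0(p0=5, D=1000, n=100))   # ~0.0005: strong global shrinkage a priori
```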
- Projective inference in high-dimensional problems: Prediction and feature selection
A1 Original article in a scientific journal (2020-01-01) Piironen, Juho; Paasiniemi, Markus; Vehtari, Aki
This paper reviews predictive inference and feature selection for generalized linear models with scarce but high-dimensional data. We demonstrate that in many cases one can benefit from a decision-theoretically justified two-stage approach: first, construct a possibly non-sparse model that predicts well, and then find a minimal subset of features that characterize the predictions. The model built in the first step is referred to as the reference model and the operation during the latter step as predictive projection. The key characteristic of this approach is that it finds an excellent tradeoff between sparsity and predictive accuracy, and the gain comes from utilizing all available information, including prior information and information coming from the left-out features. We review several methods that follow this principle and provide novel methodological contributions. We present a new projection technique that unifies two existing techniques and is both accurate and fast to compute. We also propose a way of evaluating the feature selection process using fast leave-one-out cross-validation that allows for easy and intuitive model size selection. Furthermore, we prove a theorem that helps to understand the conditions under which the projective approach could be beneficial. The key ideas are illustrated via several experiments using simulated and real world data.
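A schematic forward search guided by the projection: at each step, the feature whose addition best matches the reference model's predictions is added to the subset (squared error against the reference fit, which corresponds to the Gaussian case). Model size selection with the proposed leave-one-out evaluation is omitted, and the ridge reference model and toy data are assumptions for illustration only.

```python
# Sketch of a projection-guided forward search over feature subsets.
import numpy as np

def projection_path(X, f_ref, max_size=10):
    """Greedily add the feature whose projected submodel best matches f_ref."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(max_size):
        errs = []
        for j in remaining:
            cols = selected + [j]
            w = np.linalg.lstsq(X[:, cols], f_ref, rcond=None)[0]
            errs.append(np.mean((X[:, cols] @ w - f_ref) ** 2))
        best = remaining[int(np.argmin(errs))]
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(5)
n, d = 60, 30
X = rng.normal(size=(n, d))
y = X[:, :3] @ np.array([2.0, -1.0, 1.0]) + rng.normal(size=n)
w_ref = np.linalg.solve(X.T @ X + np.eye(d), X.T @ y)    # ridge stand-in for the reference
print(projection_path(X, X @ w_ref)[:5])                 # relevant features tend to enter first
```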
- Sparsity information and regularization in the horseshoe and other shrinkage priors
A1 Original article in a scientific journal (2017) Piironen, Juho; Vehtari, Aki
The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but has previously suffered from two problems. First, there has been no systematic way of specifying a prior for the global shrinkage hyperparameter based on the prior information about the degree of sparsity in the parameter vector. Second, the horseshoe prior has the undesired property that there is no possibility of specifying separately information about sparsity and the amount of regularization for the largest coefficients, which can be problematic with weakly identified parameters, such as the logistic regression coefficients in the case of data separation. This paper proposes solutions to both of these problems. We introduce the concept of the effective number of nonzero parameters, show an intuitive way of formulating the prior for the global hyperparameter based on the sparsity assumptions, and argue that the previous default choices are dubious based on their tendency to favor solutions with more unshrunk parameters than we typically expect a priori. Moreover, we introduce a generalization of the horseshoe prior, called the regularized horseshoe, that allows us to specify a minimum level of regularization for the largest values. We show that the new prior can be considered the continuous counterpart of the spike-and-slab prior with a finite slab width, whereas the original horseshoe resembles the spike-and-slab with an infinitely wide slab. Numerical experiments on synthetic and real world data illustrate the benefit of both of these theoretical advances.
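In the regularized horseshoe each coefficient beta_j has prior N(0, tau^2 * lambda_tilde_j^2) with lambda_tilde_j^2 = c^2 lambda_j^2 / (c^2 + tau^2 lambda_j^2), so a finite slab width c bounds how far the largest coefficients can escape the shrinkage. A small prior-predictive sketch, with illustrative values for tau0 and c, contrasts it with the plain horseshoe:

```python
# Prior-predictive sketch contrasting the horseshoe and the regularized horseshoe.
import numpy as np

rng = np.random.default_rng(6)
D, S = 1000, 1000                 # number of coefficients, number of prior draws
tau0, c = 0.01, 2.0               # illustrative global scale and slab width

lam = np.abs(rng.standard_cauchy(size=(S, D)))           # local scales ~ half-Cauchy(0, 1)
tau = tau0 * np.abs(rng.standard_cauchy(size=(S, 1)))    # global scale ~ half-Cauchy(0, tau0)

beta_hs = rng.normal(size=(S, D)) * tau * lam             # plain horseshoe
lam_tilde = np.sqrt(c**2 * lam**2 / (c**2 + tau**2 * lam**2))
beta_rhs = rng.normal(size=(S, D)) * tau * lam_tilde      # regularized horseshoe

print("99% prior quantile of max |beta|, horseshoe:            ",
      np.quantile(np.abs(beta_hs).max(axis=1), 0.99))
print("99% prior quantile of max |beta|, regularized horseshoe:",
      np.quantile(np.abs(beta_rhs).max(axis=1), 0.99))     # bounded by the N(0, c^2) slab
```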
- Uncertainty Quantification for the Horseshoe (with Discussion): comment
A1 Original article in a scientific journal (2017-12) Piironen, Juho; Betancourt, Michael; Simpson, Daniel; Vehtari, Aki
- Using reference models in variable selection
A1 Original article in a scientific journal (2023-03) Pavone, Federico; Piironen, Juho; Bürkner, Paul-Christian; Vehtari, Aki
Variable selection, or more generally, model reduction is an important aspect of the statistical workflow aiming to provide insights from data. In this paper, we discuss and demonstrate the benefits of using a reference model in variable selection. A reference model acts as a noise filter on the target variable by modeling its data generating mechanism. As a result, using the reference model predictions in the model selection procedure reduces the variability and improves stability, leading to improved model selection performance. Assuming that a Bayesian reference model describes the true distribution of future data well, the theoretically preferred usage of the reference model is to project its predictive distribution to a reduced model, leading to the projection predictive variable selection approach. We analyse how much of the good performance of projection predictive variable selection is due to the use of the reference model, and show that other variable selection methods can also be greatly improved by using the reference model as the target instead of the original data. In several numerical experiments, we investigate the performance of the projection predictive approach as well as alternative variable selection methods with and without reference models. Our results indicate that the use of reference models generally translates into better and more stable variable selection.
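The noise-filtering effect can be sketched with a simple selection rule: features are ranked by absolute correlation either with the raw response or with the fit of a ridge reference model, and the stability of the selected set is compared over repeated noise realizations. The ridge reference model, the correlation-based rule, and the toy data are illustrative assumptions, not the methods compared in the paper.

```python
# Sketch of using reference model predictions as the selection target and
# measuring the stability of the selected feature set across replications.
import numpy as np

rng = np.random.default_rng(7)
n, d, k = 80, 200, 10
X = rng.normal(size=(n, d))
signal = X[:, :k] @ rng.normal(size=k)                     # only the first k features matter

def top_k(target):
    """Indices of the k features with the largest absolute covariance with target."""
    score = np.abs(X.T @ (target - target.mean()))
    return set(np.argsort(score)[-k:])

overlap_y, overlap_ref = [], []
prev_y = prev_ref = None
for _ in range(20):
    y = signal + rng.normal(scale=2.0, size=n)             # new noise realization
    f_ref = X @ np.linalg.solve(X.T @ X + 50.0 * np.eye(d), X.T @ y)   # ridge reference fit
    sel_y, sel_ref = top_k(y), top_k(f_ref)
    if prev_y is not None:
        overlap_y.append(len(sel_y & prev_y) / k)
        overlap_ref.append(len(sel_ref & prev_ref) / k)
    prev_y, prev_ref = sel_y, sel_ref

print("top-k selection overlap across replications, raw data target: ", np.mean(overlap_y))
print("top-k selection overlap across replications, reference target:", np.mean(overlap_ref))
```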
- Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution
A4 Article in conference proceedings (2019-04-16) Paananen, Topi; Piironen, Juho; Andersen, Michael; Vehtari, Aki
Variable selection for Gaussian process models is often done using automatic relevance determination, which uses the inverse lengthscale parameter of each input variable as a proxy for variable relevance. This implicitly determined relevance has several drawbacks that prevent the selection of optimal input variables in terms of predictive performance. To improve on this, we propose two novel variable selection methods for Gaussian process models that utilize the predictions of a full model in the vicinity of the training points and thereby rank the variables based on their predictive relevance. Our empirical results on synthetic and real world data sets demonstrate improved variable selection compared to automatic relevance determination in terms of variability and predictive performance.
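A simplified stand-in for the idea of ranking inputs by predictive relevance near the training points: a finite-difference sensitivity of the Gaussian process predictive mean at the training inputs, compared against the inverse-lengthscale ranking of automatic relevance determination. The paper's actual methods operate on the full predictive distribution; the scikit-learn model and the perturbation size below are assumptions.

```python
# Sketch of ranking GP inputs by the sensitivity of the predictive mean near training points.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(8)
n, d = 100, 6
X = rng.uniform(-1, 1, size=(n, d))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)  # only x0, x1 relevant

kernel = RBF(length_scale=np.ones(d)) + WhiteKernel(noise_level=0.1)      # ARD lengthscales
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

eps = 0.05
sensitivity = np.zeros(d)
for j in range(d):
    X_plus, X_minus = X.copy(), X.copy()
    X_plus[:, j] += eps
    X_minus[:, j] -= eps
    # average absolute change of the predictive mean at the training inputs
    sensitivity[j] = np.mean(np.abs(gp.predict(X_plus) - gp.predict(X_minus))) / (2 * eps)

print("predictive-mean sensitivity per input:", np.round(sensitivity, 2))
print("ARD inverse lengthscales:             ",
      np.round(1.0 / gp.kernel_.k1.length_scale, 2))
```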