Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors
Access rights
openAccess
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
Authors
Yao, Yuling; Vehtari, Aki; Gelman, Andrew
Date
2022
Language
en
Pages
1-45
Series
Journal of Machine Learning Research, Volume 23
Abstract
When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. Moreover, even when the most important modes can be found, it is difficult to evaluate their relative weights in the posterior. Here we propose an approach that uses parallel runs of MCMC, variational, or mode-based inference to hit as many modes or separated regions as possible, and then combines these runs using Bayesian stacking, a scalable method for constructing a weighted average of distributions. The result from stacking efficiently samples from the multimodal posterior distribution, minimizes cross-validation prediction error, and represents the posterior uncertainty better than variational inference, but it is not necessarily equivalent, even asymptotically, to fully Bayesian inference. We present a theoretical consistency result together with an example in which the stacked inference approximates the true data-generating process from a misspecified model and a non-mixing sampler, yielding better predictive performance than full Bayesian inference; hence the multimodality can be considered a blessing rather than a curse under model misspecification. We demonstrate practical implementation in several model families: latent Dirichlet allocation, Gaussian process regression, hierarchical regression, horseshoe variable selection, and neural networks.
Description
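To illustrate the combination step described in the abstract, the following is a minimal sketch (not the paper's implementation, which uses Pareto-smoothed importance-sampling leave-one-out estimates): given hypothetical pointwise log predictive densities `lpd` for each non-mixing chain on held-out data, stacking finds simplex weights maximizing the log score of the weighted mixture.

```python
import numpy as np
from scipy.optimize import minimize

def stacking_weights(lpd):
    """Compute stacking weights over K chains/modes.

    lpd: (K, n) array of pointwise log predictive densities for each
         chain, e.g. leave-one-out estimates (hypothetical inputs here).
    Returns a length-K weight vector on the simplex that maximizes the
    summed log score of the weighted mixture of predictive densities.
    """
    K, n = lpd.shape
    # Shift each column for numerical stability; this changes the
    # objective by a constant per data point, not the argmax.
    lpd = lpd - lpd.max(axis=0, keepdims=True)
    p = np.exp(lpd)  # (K, n) predictive densities (up to a constant)

    def neg_log_score(z):
        # Softmax parametrization keeps the weights on the simplex.
        w = np.exp(z - z.max())
        w /= w.sum()
        return -np.sum(np.log(w @ p + 1e-300))

    res = minimize(neg_log_score, np.zeros(K), method="Nelder-Mead")
    w = np.exp(res.x - res.x.max())
    return w / w.sum()

# Toy example: chain 0 predicts held-out data much better than chain 1,
# so stacking should place most of the weight on chain 0.
rng = np.random.default_rng(0)
lpd = np.vstack([rng.normal(-1.0, 0.1, 200),   # well-located mode
                 rng.normal(-5.0, 0.1, 200)])  # poorly-located mode
w = stacking_weights(lpd)
```

In this setup the inferior chain is dominated at every data point, so the optimal mixture sits near the boundary of the simplex with almost all weight on the first chain; with modes that explain different subsets of the data, stacking instead returns genuinely mixed weights.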
Funding Information: We thank the U.S. National Science Foundation, Institute of Education Sciences, Office of Naval Research, Sloan Foundation, and the Academy of Finland Flagship programme: Finnish Center for Artificial Intelligence, FCAI, for partial support of this work. Publisher Copyright: © 2022 Yuling Yao, Aki Vehtari and Andrew Gelman.
Keywords
Bayesian stacking, Markov chain Monte Carlo, model misspecification, multimodal posterior, parallel computation, postprocessing
Other note
Citation
Yao, Y., Vehtari, A. & Gelman, A. (2022). 'Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors', Journal of Machine Learning Research, vol. 23, pp. 1-45. <https://www.jmlr.org/papers/v23/20-1426.html>