Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors
Access rights
openAccess
publishedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Language
en
Pages
45
Series
Journal of Machine Learning Research, Volume 23, pp. 1-45
Abstract
When working with multimodal Bayesian posterior distributions, Markov chain Monte Carlo (MCMC) algorithms have difficulty moving between modes, and default variational or mode-based approximate inferences will understate posterior uncertainty. Even if the most important modes can be found, it is difficult to evaluate their relative weights in the posterior. Here we propose an approach using parallel runs of MCMC, variational, or mode-based inference to hit as many modes or separated regions as possible and then combine these using Bayesian stacking, a scalable method for constructing a weighted average of distributions. The result from stacking efficiently samples from a multimodal posterior distribution, minimizes cross-validation prediction error, and represents the posterior uncertainty better than variational inference, but it is not necessarily equivalent, even asymptotically, to fully Bayesian inference. We present a theoretical consistency result, with an example in which the stacked inference approximates the true data-generating process from a misspecified model and a non-mixing sampler and predicts better than full Bayesian inference; hence multimodality can be considered a blessing rather than a curse under model misspecification. We demonstrate practical implementation in several model families: latent Dirichlet allocation, Gaussian process regression, hierarchical regression, horseshoe variable selection, and neural networks.
Description
Funding Information: We thank the U.S. National Science Foundation, Institute of Education Sciences, Office of Naval Research, Sloan Foundation, and the Academy of Finland Flagship programme: Finnish Center for Artificial Intelligence, FCAI, for partial support of this work.
Publisher Copyright: © 2022 Yuling Yao, Aki Vehtari and Andrew Gelman.
Citation
Yao, Y, Vehtari, A & Gelman, A 2022, 'Stacking for Non-mixing Bayesian Computations: The Curse and Blessing of Multimodal Posteriors', Journal of Machine Learning Research, vol. 23, pp. 1-45. < https://www.jmlr.org/papers/v23/20-1426.html >
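Illustrative sketch (not part of the publication record): the abstract describes combining parallel inference runs with Bayesian stacking, a weighted average of distributions chosen for cross-validated predictive performance. The Python below is a minimal sketch of that weighting step under stated assumptions, not the authors' implementation. It assumes the leave-one-out log predictive densities for each run have already been computed and are passed in as a table `lpd`, and it assumes the weights are chosen to maximize the summed log score of the mixture. The function name `stacking_weights`, its parameters, and the simple fixed-point (EM-style) update are all choices made for this example; the paper may use a different optimizer.

```python
import math


def stacking_weights(lpd, n_iter=2000, tol=1e-12):
    """Estimate simplex weights for K parallel runs (hypothetical helper).

    lpd[i][k] is assumed to hold the leave-one-out log predictive density
    of data point i under run k. The weights w maximize
        sum_i log( sum_k w[k] * exp(lpd[i][k]) )
    subject to w[k] >= 0 and sum(w) == 1.
    """
    n, K = len(lpd), len(lpd[0])

    # Exponentiate after subtracting each row's maximum. Rescaling a row
    # only adds a constant to the objective, so the optimum is unchanged.
    dens = []
    for row in lpd:
        m = max(row)
        dens.append([math.exp(v - m) for v in row])

    w = [1.0 / K] * K  # start from equal weights
    for _ in range(n_iter):
        new_w = [0.0] * K
        for row in dens:
            mix = sum(wk * p for wk, p in zip(w, row))
            for k in range(K):
                # Share of point i's mixture density contributed by run k.
                new_w[k] += w[k] * row[k] / mix / n
        shift = max(abs(a - b) for a, b in zip(new_w, w))
        w = new_w
        if shift < tol:
            break
    return w


if __name__ == "__main__":
    # Made-up numbers for illustration: 4 held-out points, 3 runs.
    example_lpd = [
        [-1.2, -2.5, -3.0],
        [-0.9, -2.2, -2.8],
        [-3.1, -1.0, -2.9],
        [-2.7, -1.1, -3.3],
    ]
    print(stacking_weights(example_lpd))
```

A run that predicts every held-out point worse than the others is driven toward zero weight, while runs that each predict a different subset of points well share the weight. This is how separated modes can receive relative weights without the sampler ever moving between them.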