Strengthening nonparametric Bayesian methods with structured kernels

School of Science | Doctoral thesis (article-based) | Defence date: 2022-11-22
Pages
70 + app. 54
Aalto University publication series DOCTORAL THESES, 158/2022
This thesis covers an assortment of topics at the intersection of Bayesian nonparametrics and kernel machines, with the aim of proposing more efficient, kernel-based solutions to nonparametric Bayesian machine learning tasks. In chronological order, we summarize four publications on three interconnected topics: (i) expressive and nonstationary covariance kernels for Gaussian processes (GPs); (ii) scalable approximate inference of GP models via pseudo-inputs; (iii) Bayesian sampling of un-normalized target distributions via the simulation of interacting particle systems (IPSs).
GPs are flexible priors on functions, which inform the hypothesis spaces of infinitely wide neural networks. However, fully exploiting their tractable uncertainty measures requires careful selection of flexible covariance kernels for pattern discovery. Highly parametrized stationary kernels have been proposed for handling extrapolation in GPs, but the translation invariance implied by stationarity caps their expressiveness. We propose nonstationary generalizations of such expressive kernels, in both parametric and nonparametric forms, and explore the implications of those kernels with respect to their spectral properties.
Another restrictive aspect of GP models lies in the cumbersome cubic scaling of their inference in the number of training points. We can draw upon a smaller set of pseudo-inputs, or inducing points, to obtain sparse and more scalable approximate posteriors. Myriad studies of sparse GPs have established a separation between model parameters, which can either be optimized or inferred, and variational parameters, which only require optimization and no priors. The inducing point locations exist somewhat outside this dichotomy, and the common practice is simply to find point estimates via optimization. We demonstrate that a fully Bayesian treatment of inducing inputs is equally valid in sparse GPs and leads to a more flexible inferential framework with measurable practical benefits.
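The scaling argument for inducing points can be illustrated with a minimal NumPy sketch. It is not the thesis's method: the toy data, the squared-exponential kernel, the hand-picked grid of inducing inputs `z`, and the DTC-style predictive mean are all assumptions made for illustration, and the inducing inputs are simply fixed here rather than optimized or given a prior.

```python
import numpy as np

def rbf(a, b, lengthscale=0.5):
    """Squared-exponential (stationary) kernel between 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return np.exp(-0.5 * sq_dist / lengthscale**2)

rng = np.random.default_rng(0)
n, m, noise_std = 200, 20, 0.1          # n training points, m << n inducing points
x = np.sort(rng.uniform(-3.0, 3.0, n))  # toy inputs (assumption)
y = np.sin(2.0 * x) + noise_std * rng.standard_normal(n)
x_test = np.linspace(-3.0, 3.0, 50)

# Exact GP predictive mean: solves an n x n linear system, O(n^3) cost.
K_nn = rbf(x, x) + noise_std**2 * np.eye(n)
mean_exact = rbf(x_test, x) @ np.linalg.solve(K_nn, y)

# Inducing-point (DTC-style) predictive mean: solves an m x m system, O(n m^2) cost.
z = np.linspace(-3.0, 3.0, m)           # inducing inputs, fixed by hand (assumption)
K_mm = rbf(z, z) + 1e-8 * np.eye(m)     # small jitter for numerical stability
K_mn = rbf(z, x)
A = noise_std**2 * K_mm + K_mn @ K_mn.T
mean_sparse = rbf(x_test, z) @ np.linalg.solve(A, K_mn @ y)

max_gap = np.max(np.abs(mean_exact - mean_sparse))
print(f"largest gap between exact and sparse predictive means: {max_gap:.2e}")
```

In this sketch the only large matrix product is `K_mn @ K_mn.T`, so the cost grows linearly in `n` for fixed `m`. How well the sparse mean tracks the exact one depends on where `z` is placed, which is the quantity whose treatment the abstract discusses.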
Lastly, we turn to the sampling of un-normalized densities, a ubiquitous task in Bayesian inference. Apart from Markov chain Monte Carlo (MCMC) sampling, we can also draw samples by deterministically transporting a set of interacting particles, i.e., by simulating IPSs. Despite their ostensible differences in mechanism, a duality exists between two subtypes of these sampling regimes, namely Langevin diffusion (LD) and Stein variational gradient descent (SVGD), where SVGD can be seen as a "kernelized" counterpart of LD. We demonstrate that kernelized, deterministic approximations exist for all diffusion-based MCMC methods, which we denote as MCMC dynamics. Drawing upon this extended duality, we obtain deterministic samplers that emulate the behavior of other MCMC diffusion processes.
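The LD/SVGD pairing mentioned above can be sketched in a few lines of NumPy. This is only the standard textbook form of the two updates, not the generalized samplers developed in the thesis; the one-dimensional Gaussian target, the RBF kernel with fixed `bandwidth`, the step sizes, and the particle count are all assumptions chosen for illustration.

```python
import numpy as np

def grad_log_p(x):
    """Score of the un-normalized target exp(-(x - 2)^2 / 2), i.e. N(2, 1)."""
    return -(x - 2.0)

def langevin_step(x, step, rng):
    """Unadjusted Langevin step: gradient drift plus injected Gaussian noise."""
    noise = rng.standard_normal(x.shape)
    return x + step * grad_log_p(x) + np.sqrt(2.0 * step) * noise

def svgd_step(x, step, bandwidth=1.0):
    """SVGD step: kernel-smoothed drift plus a deterministic repulsion term."""
    diff = x[:, None] - x[None, :]                  # diff[i, j] = x_i - x_j
    k = np.exp(-0.5 * diff**2 / bandwidth**2)       # RBF kernel k(x_i, x_j)
    drive = k @ grad_log_p(x)                       # sum_j k(x_j, x_i) * score(x_j)
    repulse = np.sum(diff * k, axis=1) / bandwidth**2  # sum_j d/dx_j k(x_j, x_i)
    return x + step * (drive + repulse) / x.size

rng = np.random.default_rng(0)
init = rng.normal(-3.0, 0.5, size=50)   # particles start far from the target

ld_particles = init.copy()
svgd_particles = init.copy()
for _ in range(1000):
    ld_particles = langevin_step(ld_particles, 0.05, rng)   # stochastic
    svgd_particles = svgd_step(svgd_particles, 0.2)          # deterministic

print("LD   mean/std:", ld_particles.mean(), ld_particles.std())
print("SVGD mean/std:", svgd_particles.mean(), svgd_particles.std())
```

Both updates follow the same score function `grad_log_p`. In `langevin_step` the spread of the samples comes from injected noise, while in `svgd_step` it comes from the kernel-gradient repulsion between particles, so repeated runs of the SVGD loop from the same initial particles give identical results.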
Supervising professor
Kaski, Samuel, Prof., Aalto University, Department of Computer Science, Finland
Thesis advisor
Heinonen, Markus, Dr., Aalto University, Finland
Keywords
Bayesian nonparametrics, kernel methods, Gaussian processes
  • [Publication 1]: Zheyang Shen, Markus Heinonen, Samuel Kaski. Harmonizable mixture kernels with variational Fourier features. In The 22nd International Conference on Artificial Intelligence and Statistics, Naha, Okinawa, Japan, PMLR, p. 1812-1821 (Proceedings of Machine Learning Research; vol. 89), April 2019.
  • [Publication 2]: Zheyang Shen, Markus Heinonen, Samuel Kaski. Learning spectrograms with convolutional spectral kernels. In The 23rd International Conference on Artificial Intelligence and Statistics, Palermo, Italy, PMLR, p. 3826-3836 (Proceedings of Machine Learning Research; vol. 108), August 2020.
  • [Publication 3]: Simone Rossi, Markus Heinonen, Edwin V. Bonilla, Zheyang Shen, Maurizio Filippone. Sparse Gaussian processes revisited: Bayesian approaches to inducing-variable approximations. In The 24th International Conference on Artificial Intelligence and Statistics, San Diego, California, USA, PMLR, p. 1837-1845 (Proceedings of Machine Learning Research; vol. 130), April 2021.
  • [Publication 4]: Zheyang Shen, Markus Heinonen, Samuel Kaski. De-randomizing MCMC dynamics with the diffusion Stein operator. In The 35th Conference on Neural Information Processing Systems, Online, December 2021.