Browsing by Author "Heinonen, Markus, Dr., Aalto University, Finland"
Now showing 1 - 2 of 2
- Differential Equations for Machine Learning
School of Science | Doctoral dissertation (article-based) (2022) Yıldız, Çağatay

Mechanistic models express novel hypotheses for an observed phenomenon by constructing mathematical formulations of causal mechanisms. In contrast to this modeling paradigm, machine learning approaches learn input-output mappings with complicated and often non-interpretable models. While they require large amounts of data for successful training and downstream performance, the resulting models come with universal approximation guarantees. Historically, differential equations (DEs) developed in physics, economics, engineering, and numerous other fields have relied on the principles of mechanistic modeling. Despite providing the causality and interpretability that machine learning approaches usually lack, mechanistic differential equation models tend to carry oversimplified assumptions. In this dissertation, we aim to bring these two worlds together by demonstrating how machine learning problems can be tackled by means of differential equations, and how differential equation models can benefit from modern machine learning tools.

First, we examine problems in which mechanistic modeling becomes too difficult, including cases with only partial knowledge of the observed system or an excessive number of interactions. Such limitations complicate the construction of mathematical descriptions of the phenomenon of interest. To bypass this, we propose to place Gaussian process priors on the time-differential and diffusion functions of unknown ordinary (ODEs) and stochastic differential equations (SDEs), and to approximate the resulting intractable posterior distribution. We demonstrate that the model can estimate unknown dynamics from sparse and noisy observations. Motivated by the fact that this approach is unable to learn sequences obtained by transforming the ODE states, we develop a new technique that simultaneously embeds the observations into a latent space and learns an ODE system in the embedding space. Our new model infers the dynamics using Bayesian neural networks for uncertainty handling and greater expressive power. We furthermore explicitly decompose the latent space into momentum and position components, which leads to increased predictive performance on a variety of physical tasks.

Our next task concerns another problem involving DEs, namely non-convex optimization. By carefully crafting the drift and diffusion functions of an SDE, we first obtain a stochastic gradient MCMC algorithm. Tuning a temperature variable in the proposed algorithm allows the chain to converge to the global minimum of a non-convex loss surface. We significantly speed up convergence by using second-order Hessian information in an asynchronous parallel framework.

Lastly, we explore how reinforcement learning problems can benefit from neural-network-based ODE models. In particular, we propose to learn dynamical systems controlled by external actions with a novel, uncertainty-aware neural ODE model. The inferred model, in turn, is utilized for learning optimal policy functions. We illustrate that our method is robust to both noisy and irregularly sampled data sequences, which pose major challenges to traditional methods.
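As a rough illustration of the first idea above (a Gaussian process prior on the unknown time-differential function of an ODE), the sketch below fits independent GP regressors to finite-difference derivative estimates of a noisy 2-D trajectory and then integrates the resulting posterior-mean vector field. The Van der Pol-style dynamics, the RBF kernel settings, and the fixed-step Euler integrator are illustrative assumptions; the dissertation's actual models use sparse GP posteriors and proper approximate inference rather than this plug-in mean.

```python
import numpy as np

# Toy sketch: learn an unknown 2-D vector field with GP regression on
# finite-difference derivative targets, then roll the learned field forward.
rng = np.random.default_rng(0)

def f_true(x):
    # Ground-truth dynamics (unknown in practice); Van der Pol-like oscillator.
    return np.array([x[1], (1.0 - x[0] ** 2) * x[1] - x[0]])

# Simulate one noisy trajectory with a simple Euler scheme.
dt, T = 0.1, 80
X = np.zeros((T, 2))
X[0] = [2.0, 0.0]
for t in range(T - 1):
    X[t + 1] = X[t] + dt * f_true(X[t])
Y = X + 0.05 * rng.standard_normal(X.shape)

# Crude derivative targets via forward differences.
dY = (Y[1:] - Y[:-1]) / dt
Z = Y[:-1]                                   # GP inputs: observed states

def rbf(A, B, ell=1.0, sf=1.0):
    # Squared-exponential kernel between two sets of states.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sf ** 2 * np.exp(-0.5 * d2 / ell ** 2)

K = rbf(Z, Z) + 0.1 * np.eye(len(Z))          # kernel matrix plus noise term
alpha = np.linalg.solve(K, dY)                # one GP per output dimension

def f_gp(x):
    # Posterior-mean vector field evaluated at state x.
    return (rbf(x[None, :], Z) @ alpha).ravel()

# Integrate the learned dynamics from the first observation.
xhat = np.zeros_like(X)
xhat[0] = Y[0]
for t in range(T - 1):
    xhat[t + 1] = xhat[t] + dt * f_gp(xhat[t])

print("mean reconstruction error:", np.abs(xhat - X).mean())
```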
- Strengthening nonparametric Bayesian methods with structured kernels

School of Science | Doctoral dissertation (article-based) (2022) Shen, Zheyang

This thesis covers an assortment of topics at the intersection of Bayesian nonparametrics and kernel machines, proposing more efficient, kernel-based solutions to nonparametric Bayesian machine learning tasks. In chronological order, we provide summaries of the four publications on three interconnected topics: (i) expressive and nonstationary covariance kernels for Gaussian processes (GPs); (ii) scalable approximate inference of GP models via pseudo-inputs; (iii) Bayesian sampling of un-normalized target distributions via the simulation of interacting particle systems (IPSs).

GPs are flexible priors on functions, which inform the hypothesis spaces of infinitely wide neural networks. However, to fully exploit their tractable uncertainty measures, careful selection of flexible covariance kernels is required for pattern discovery. Highly parametrized, stationary kernels have been proposed for handling extrapolation in GPs, but the translation invariance implied by stationarity caps their expressiveness. We propose nonstationary generalizations of such expressive kernels, in both parametric and nonparametric forms, and explore the implications of those kernels with respect to their spectral properties.

Another restrictive aspect of GP models lies in the cumbersome cubic scaling of their inference. We can draw upon a smaller set of pseudo-inputs, or inducing points, to obtain a sparse and more scalable approximate posterior. Myriad studies of sparse GPs have established a separation between model parameters, which can be either optimized or inferred, and variational parameters, which require only optimization and no priors. The inducing-point locations, however, sit somewhat outside this dichotomy, and the common practice is simply to find point estimates via optimization. We demonstrate that a fully Bayesian treatment of inducing inputs is equally valid in sparse GPs, and leads to a more flexible inferential framework with measurable practical benefits.

Lastly, we turn to the sampling of un-normalized densities, a ubiquitous task in Bayesian inference. Apart from Markov chain Monte Carlo (MCMC) sampling, we can also draw samples by deterministically transporting a set of interacting particles, i.e., by simulating IPSs. Despite their ostensible differences in mechanism, a duality exists between subtypes of the two sampling regimes, namely Langevin diffusion (LD) and Stein variational gradient descent (SVGD), where SVGD can be seen as a "kernelized" counterpart of LD. We demonstrate that kernelized, deterministic approximations exist for all diffusion-based MCMC methods, which we denote as MCMC dynamics. Drawing upon this extended duality, we obtain deterministic samplers that emulate the behavior of other MCMC diffusion processes.
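To give a flavor of the particle-based sampling discussed in the last paragraph, the sketch below runs a textbook SVGD update on a 1-D two-component Gaussian mixture, transporting particles deterministically with an RBF kernel and the common median bandwidth heuristic. The target density, step size, and particle count are illustrative assumptions; the dissertation's contribution concerns extending this kernelized construction to general MCMC dynamics, which this toy does not implement.

```python
import numpy as np

# Minimal SVGD sketch: deterministic particle transport toward an
# unnormalized two-component Gaussian mixture 0.5*N(-2,1) + 0.5*N(2,1).
rng = np.random.default_rng(1)

def grad_log_p(x):
    # Score function of the mixture, via component responsibilities.
    w = np.stack([np.exp(-0.5 * (x + 2) ** 2), np.exp(-0.5 * (x - 2) ** 2)])
    w = w / w.sum(0)
    return -(w[0] * (x + 2) + w[1] * (x - 2))

def svgd_step(x, step=0.05):
    diff = x[:, None] - x[None, :]                     # pairwise differences
    h = np.median(np.abs(diff)) ** 2 / np.log(len(x) + 1) + 1e-8
    k = np.exp(-diff ** 2 / h)                         # RBF kernel matrix
    grad_k = -2.0 * diff / h * k                       # kernel gradients
    # Attraction toward high density plus kernel repulsion between particles.
    phi = (k @ grad_log_p(x) + grad_k.sum(0)) / len(x)
    return x + step * phi

x = 0.3 * rng.standard_normal(100)                     # particles start near 0
for _ in range(500):
    x = svgd_step(x)

print("particle mean (roughly 0 for a symmetric target):", x.mean())
print("fraction of particles in the right-hand mode:", (x > 0).mean())
```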