Differential Equations for Machine Learning

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.advisorLähdesmäki, Harri, Prof., Aalto University, Finland
dc.contributor.advisorHeinonen, Markus, Dr., Aalto University, Finland
dc.contributor.authorYıldız, Çağatay
dc.contributor.departmentTietotekniikan laitosfi
dc.contributor.departmentDepartment of Computer Scienceen
dc.contributor.labComputational Systems Biologyen
dc.contributor.schoolPerustieteiden korkeakoulufi
dc.contributor.schoolSchool of Scienceen
dc.contributor.supervisorLähdesmäki, Harri, Prof., Aalto University, Department of Computer Science, Finland
dc.date.accessioned2022-01-14T10:00:09Z
dc.date.available2022-01-14T10:00:09Z
dc.date.defence2022-02-18
dc.date.issued2022
dc.descriptionDefence is held on 18.2.2022 12:15 – 16:15 (Zoom), https://aalto.zoom.us/j/61873808631
dc.description.abstractMechanistic models express novel hypotheses for an observed phenomenon by constructing mathematical formulations of causal mechanisms. As opposed to this modeling paradigm, machine learning approaches learn input-output mappings via complicated and often non-interpretable models. While requiring large amounts of data for successful training and downstream performance, the resulting models can come with universal approximation guarantees. Historically, differential equations (DEs) developed in physics, economics, engineering, and numerous other fields have relied on the principles of mechanistic modeling. Despite providing causality and interpretability that machine learning approaches usually lack, mechanistic differential equation models tend to carry oversimplified assumptions. In this dissertation, we aim to bring these two worlds together by demonstrating how machine learning problems can be tackled by means of differential equations, and how differential equation models can benefit from modern machine learning tools. First, we examine problems in which mechanistic modeling becomes too difficult, including cases with only partial knowledge of the observed system and cases with an excessive number of interactions. Such limitations complicate the process of constructing mathematical descriptions of the phenomenon of interest. To bypass this, we propose to place Gaussian process priors on the time differential and diffusion functions of unknown ordinary (ODEs) and stochastic differential equations (SDEs), and approximate the resulting intractable posterior distribution. We demonstrate that the model can estimate unknown dynamics from sparse and noisy observations. Motivated by the fact that our proposed approach is unable to learn sequences obtained by transforming the ODE states, we develop a new technique that simultaneously embeds the observations into a latent space and learns an ODE system in the embedding space.
Our new model infers the dynamics using Bayesian neural networks for uncertainty handling and greater expressive power. We furthermore explicitly decompose the latent space into momentum and position components, which leads to increased predictive performance on a variety of physical tasks. Our next task concerns another problem involving DEs, namely, non-convex optimization. By carefully crafting the drift and diffusion functions of an SDE, we first obtain a stochastic gradient MCMC algorithm. Tuning a temperature variable in the proposed algorithm allows the chain to converge to the global minimum of a non-convex loss surface. We significantly speed up convergence by using second-order Hessian information in an asynchronous parallel framework. Lastly, we explore how reinforcement learning problems can benefit from neural network based ODE models. In particular, we propose to learn dynamical systems controlled by external actions with a novel, uncertainty-aware neural ODE model. The inferred model, in turn, is utilized for learning optimal policy functions. We illustrate that our method is robust to both noisy and irregularly sampled data sequences, which pose major challenges to traditional methods.en
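The position–momentum decomposition mentioned in the abstract can be illustrated with a minimal numerical sketch. In the dissertation the acceleration field is a learned Bayesian neural network acting on a latent space; here a hand-written harmonic oscillator (acceleration = -position) stands in for it, and the names `euler_odeint` and `second_order_field` are illustrative, not from the thesis.

```python
import numpy as np

def euler_odeint(f, state0, t_grid):
    """Fixed-step Euler integration of dz/dt = f(z) over the given time grid."""
    states = [np.asarray(state0, dtype=float)]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        z = states[-1]
        states.append(z + (t1 - t0) * f(z))
    return np.stack(states)

def second_order_field(z):
    """Second-order decomposition: latent state z = (position q, momentum p).

    dq/dt = p, dp/dt = a(q), with a hand-coded oscillator a(q) = -q
    standing in for the learned acceleration network.
    """
    q, p = z
    return np.array([p, -q])

t = np.linspace(0.0, 1.0, 1001)
traj = euler_odeint(second_order_field, [1.0, 0.0], t)
# The analytic position is q(t) = cos(t); a fine Euler grid stays close to it.
print(traj[-1, 0])  # approximately cos(1) ≈ 0.5403
```

Splitting the state this way constrains position to change only through momentum, which is what gives the model its physically structured predictions on dynamical tasks.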
dc.format.extent106 + app. 82
dc.format.mimetypeapplication/pdfen
dc.identifier.isbn978-952-64-0666-4 (electronic)
dc.identifier.isbn978-952-64-0665-7 (printed)
dc.identifier.issn1799-4942 (electronic)
dc.identifier.issn1799-4934 (printed)
dc.identifier.issn1799-4934 (ISSN-L)
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/112291
dc.identifier.urnURN:ISBN:978-952-64-0666-4
dc.language.isoenen
dc.opnEk, Carl Henrik, Prof., University of Cambridge, UK
dc.publisherAalto Universityen
dc.publisherAalto-yliopistofi
dc.relation.haspart[Publication 1]: Markus Heinonen, Çağatay Yıldız, Henrik Mannerström, Jukka Intosalmi, Harri Lähdesmäki. Learning Unknown ODE Models with Gaussian Processes. In International Conference on Machine Learning, Stockholm, pages 1959–1968, vol. 80, July 2018. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201907304456.
dc.relation.haspart[Publication 2]: Çağatay Yıldız, Markus Heinonen, Jukka Intosalmi, Henrik Mannerström, Harri Lähdesmäki. Learning Stochastic Differential Equations with Gaussian Processes without Gradient Matching. In 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), Aalborg, September 2018. DOI: 10.1109/MLSP.2018.8516991.
dc.relation.haspart[Publication 3]: Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki. ODE2VAE: Deep Generative Second Order ODEs with Bayesian Neural Networks. In Neural Information Processing Systems, Vancouver, pages 13412–13421, vol. 32, December 2019.
dc.relation.haspart[Publication 4]: Umut Simsekli, Çağatay Yıldız, Thanh Huy Nguyen, Taylan Cemgil and Gael Richard. Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization. In International Conference on Machine Learning, Stockholm, pages 4674–4683, vol. 80, July 2018. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201907304448.
dc.relation.haspart[Publication 5]: Çağatay Yıldız, Markus Heinonen, Harri Lähdesmäki. Continuous-Time Model-Based Reinforcement Learning. In International Conference on Machine Learning, Virtual, pages 12009–12018, vol. 139, July 2021. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202108258419.
dc.relation.ispartofseriesAalto University publication series DOCTORAL THESESen
dc.relation.ispartofseries7/2022
dc.revKlami, Arto, Prof., University of Helsinki, Finland
dc.revWahlström, Niklas, Prof., Uppsala University, Sweden
dc.subject.keywordmachine learningen
dc.subject.keyworddifferential equationsen
dc.subject.keywordneural networksen
dc.subject.keywordGaussian processesen
dc.subject.otherComputer scienceen
dc.titleDifferential Equations for Machine Learningen
dc.typeG5 Artikkeliväitöskirjafi
dc.type.dcmitypetexten
dc.type.ontasotDoctoral dissertation (article-based)en
dc.type.ontasotVäitöskirja (artikkeli)fi
local.aalto.acrisexportstatuschecked 2022-02-21_1216
local.aalto.archiveyes
local.aalto.formfolder2022_01_13_klo_15_33
local.aalto.infraScience-IT
Files
Original bundle
Name: isbn9789526406664.pdf
Size: 14.8 MB
Format: Adobe Portable Document Format