### Browsing by Author "Gutmann, Michael U."

Now showing 1 - 7 of 7

Item: Bayesian inference of atomistic structure in functional materials (Nature Publishing Group, 2019-03-18)

Todorović, Milica; Gutmann, Michael U.; Corander, Jukka; Rinke, Patrick; Department of Applied Physics; Computational Electronic Structure Theory

Tailoring the functional properties of advanced organic/inorganic heterogeneous devices to their intended technological applications requires knowledge and control of the microscopic structure inside the device. Atomistic quantum mechanical simulation methods deliver accurate energies and properties for individual configurations; however, finding the most favourable configurations remains computationally prohibitive. We propose a ‘building block’-based Bayesian Optimisation Structure Search (BOSS) approach for addressing extended organic/inorganic interface problems and demonstrate its feasibility in a molecular surface adsorption study. In BOSS, a Bayesian model identifies material energy landscapes in an accelerated fashion from atomistic configurations sampled during active learning. This allowed us to identify several most favourable molecular adsorption configurations for C₆₀ on the (101) surface of TiO₂ anatase and clarify the key molecule-surface interactions governing structural assembly. Inferred structures were in good agreement with detailed experimental images of this surface adsorbate, demonstrating good predictive power of BOSS and opening the route towards large-scale surface adsorption studies of molecular aggregates and films.

Item: Bayesian optimization for likelihood-free inference of simulator-based statistical models (MICROTOME PUBL, 2016-08-01)

Gutmann, Michael U.; Corander, Jukka; Department of Computer Science; Centre of Excellence in Computational Inference, COIN; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Myllymäki Petri group (HIIT)

Our paper deals with inferring simulator-based statistical models given some observed data. A simulator-based model is a parametrized mechanism which specifies how data are generated. It is thus also referred to as a generative model. We assume that only a finite number of parameters are of interest and allow the generative process to be very general; it may be a noisy nonlinear dynamical system with an unrestricted number of hidden variables. This weak assumption is useful for devising realistic models but it renders statistical inference very difficult. The main challenge is the intractability of the likelihood function. Several likelihood-free inference methods have been proposed which share the basic idea of identifying the parameters by finding values for which the discrepancy between simulated and observed data is small. A major obstacle to using these methods is their computational cost. The cost is largely due to the need to repeatedly simulate data sets and the lack of knowledge about how the parameters affect the discrepancy. We propose a strategy which combines probabilistic modeling of the discrepancy with optimization to facilitate likelihood-free inference. The strategy is implemented using Bayesian optimization and is shown to accelerate the inference through a reduction in the number of required simulations by several orders of magnitude.

Item: Efficient acquisition rules for model-based approximate Bayesian computation (INT SOC BAYESIAN ANALYSIS, 2019-06)

Järvenpää, Marko; Gutmann, Michael U.; Pleska, Arijus; Vehtari, Aki; Marttinen, Pekka; Department of Computer Science; Probabilistic Machine Learning; Helsinki Institute for Information Technology (HIIT); Professorship Kaski Samuel; Centre of Excellence in Computational Inference, COIN; Professorship Vehtari Aki; Professorship Marttinen P.; Department of Computer Science; University of Edinburgh

Approximate Bayesian computation (ABC) is a method for Bayesian inference when the likelihood is unavailable but simulating from the model is possible. However, many ABC algorithms require a large number of simulations, which can be costly. To reduce the computational cost, Bayesian optimisation (BO) and surrogate models such as Gaussian processes have been proposed. Bayesian optimisation enables one to intelligently decide where to evaluate the model next, but common BO strategies are not designed for the goal of estimating the posterior distribution. Our paper addresses this gap in the literature. We propose to compute the uncertainty in the ABC posterior density, which is due to a lack of simulations to estimate this quantity accurately, and define a loss function that measures this uncertainty. We then propose to select the next evaluation location to minimise the expected loss. Experiments show that the proposed method often produces the most accurate approximations as compared to common BO strategies.

Item: ELFI: Engine for likelihood-free inference (2018-08-01)

Lintusaari, Jarno; Vuollekoski, Henri; Kangasrääsiö, Antti; Skytén, Kusti; Järvenpää, Marko; Marttinen, Pekka; Gutmann, Michael U.; Vehtari, Aki; Corander, Jukka; Kaski, Samuel; Department of Computer Science; Centre of Excellence in Computational Inference, COIN; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Probabilistic Machine Learning; Professorship Marttinen P.; Professorship Vehtari Aki

Engine for Likelihood-Free Inference (ELFI) is a Python software library for performing likelihood-free inference (LFI). ELFI provides a convenient syntax for arranging components in LFI, such as priors, simulators, summaries or distances, to a network called ELFI graph. The components can be implemented in a wide variety of languages. The stand-alone ELFI graph can be used with any of the available inference methods without modifications. A central method implemented in ELFI is Bayesian Optimization for Likelihood-Free Inference (BOLFI), which has recently been shown to accelerate likelihood-free inference up to several orders of magnitude by surrogate-modelling the distance. ELFI also has inbuilt support for storing output data for reuse and analysis, and supports parallelization of computation from multiple cores up to a cluster environment. ELFI is designed to be extensible and provides interfaces for widening its functionality. This makes adding new inference methods to ELFI straightforward and automatically compatible with the inbuilt features.

Item: Gaussian process modelling in approximate bayesian computation to estimate horizontal gene transfer in Bacteria (2018-12-01)

Järvenpää, Marko; Gutmann, Michael U.; Vehtari, A. K.I.; Marttinen, Pekka; Probabilistic Machine Learning; University of Edinburgh; Professorship Marttinen P.; Department of Computer Science

Approximate Bayesian computation (ABC) can be used for model fitting when the likelihood function is intractable but simulating from the model is feasible. However, even a single evaluation of a complex model may take several hours, limiting the number of model evaluations available. Modelling the discrepancy between the simulated and observed data using a Gaussian process (GP) can be used to reduce the number of model evaluations required by ABC, but the sensitivity of this approach to a specific GP formulation has not yet been thoroughly investigated. We begin with a comprehensive empirical evaluation of using GPs in ABC, including various transformations of the discrepancies and two novel GP formulations. Our results indicate the choice of GP may significantly affect the accuracy of the estimated posterior distribution. Selection of an appropriate GP model is thus important. We formulate expected utility to measure the accuracy of classifying discrepancies below or above the ABC threshold, and show that it can be used to automate the GP model selection step. Finally, based on the understanding gained with toy examples, we fit a population genetic model for bacteria, providing insight into horizontal gene transfer events within the population and from external origins.

Item: Likelihood-free inference via classification (2018)

Gutmann, Michael U.; Dutta, Ritabrata; Kaski, Samuel; Corander, Jukka; Department of Computer Science; Centre of Excellence in Computational Inference, COIN; Professorship Kaski Samuel; Helsinki Institute for Information Technology (HIIT); Probabilistic Machine Learning

Increasingly complex generative models are being used across disciplines as they allow for realistic characterization of data, but a common difficulty with them is the prohibitively large computational cost to evaluate the likelihood function and thus to perform likelihood-based statistical inference. A likelihood-free inference framework has emerged where the parameters are identified by finding values that yield simulated data resembling the observed data. While widely applicable, a major difficulty in this framework is how to measure the discrepancy between the simulated and observed data. Transforming the original problem into a problem of classifying the data into simulated versus observed, we find that classification accuracy can be used to assess the discrepancy. The complete arsenal of classification methods thereby becomes available for inference of intractable generative models. We validate our approach using theory and simulations for both point estimation and Bayesian inference, and demonstrate its use on real data by inferring an individual-based epidemiological model for bacterial infections in child care centers.

Item: Resolving outbreak dynamics using approximate bayesian computation for stochastic birth–death models (Wellcome Trust, 2019)

Lintusaari, Jarno; Blomstedt, Paul; Rose, Brittany; Sivula, Tuomas; Gutmann, Michael U.; Kaski, Samuel; Corander, Jukka; Department of Computer Science; Professorship Kaski Samuel; Probabilistic Machine Learning; Helsinki Institute for Information Technology (HIIT); Professorship Vehtari Aki; Finnish Center for Artificial Intelligence, FCAI; University of Helsinki; University of Edinburgh; University of Oslo

Earlier research has suggested that approximate Bayesian computation (ABC) makes it possible to fit simulator-based intractable birth–death models to investigate communicable disease outbreak dynamics with accuracy comparable to that of exact Bayesian methods. However, recent findings have indicated that key parameters, such as the reproductive number R, may remain poorly identifiable with these models. Here we show that this identifiability issue can be resolved by taking into account disease-specific characteristics of the transmission process in closer detail. Using tuberculosis (TB) in the San Francisco Bay area as a case study, we consider a model that generates genotype data from a mixture of three stochastic processes, each with its own distinct dynamics and clear epidemiological interpretation. We show that our model allows for accurate posterior inferences about outbreak dynamics from aggregated annual case data with genotype information. As a byproduct of the inference, the model provides an estimate of the infectious population size at the time the data were collected. The acquired estimate is approximately two orders of magnitude smaller than assumed in earlier related studies, and it is much better aligned with epidemiological knowledge about active TB prevalence. Similarly, the reproductive number R related to the primary underlying transmission process is estimated to be nearly three times larger than previous estimates, which has a substantial impact on the interpretation of the fitted outbreak model.
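
Several abstracts in this listing build on approximate Bayesian computation (ABC), where parameters are kept when data simulated from them closely resemble the observed data. As a rough illustration only, the following is a minimal sketch of plain rejection ABC on a toy problem. It is not the method of any listed paper and does not use ELFI; the Gaussian simulator, the uniform prior on [-5, 5], the sample-mean summary, the threshold `epsilon`, and all function names are assumptions made for this example.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy simulator (assumed): n draws from a Gaussian with mean mu and unit variance."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def discrepancy(simulated, observed):
    """Absolute difference between sample means (assumed summary statistic)."""
    return abs(statistics.fmean(simulated) - statistics.fmean(observed))


def rejection_abc(observed, n_proposals, epsilon, rng):
    """Keep prior draws whose simulated data lie within epsilon of the observed data."""
    accepted = []
    for _ in range(n_proposals):
        mu = rng.uniform(-5.0, 5.0)  # draw a candidate from the assumed uniform prior
        sim = simulate(mu, len(observed), rng)
        if discrepancy(sim, observed) < epsilon:
            accepted.append(mu)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # synthetic "observed" data with true mean 2.0
posterior_sample = rejection_abc(observed, n_proposals=5000, epsilon=0.2, rng=rng)
posterior_mean = statistics.fmean(posterior_sample)
```

Every proposal costs one full simulation and most proposals are rejected, which is the computational burden that the surrogate-model and Bayesian optimisation approaches described in the abstracts aim to reduce.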