Efficient structure search with multi-task Bayesian optimization

School of Science (Perustieteiden korkeakoulu) | Master's thesis
Date
2023-05-15
Major/Subject
Machine Learning, Data Science and Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
64
Abstract
Computational materials science aims to discover new functional materials and optimize their properties, which often requires resource-intensive calculations. To address structure search tasks with as few expensive calculations as possible, the Bayesian Optimization Structure Search (BOSS) algorithm has been implemented. BOSS combines active learning with Gaussian process regression to optimize a target function sample-efficiently; here the target function is the total energy of the material. Materials can be simulated with approximate methods, which are fast but less accurate, or with costly but accurate electronic structure methods. This work investigates how BOSS can become even more resource-efficient by incorporating calculations at different levels of accuracy. Multi-fidelity BOSS uses the Intrinsic Model of Coregionalization (ICM) to integrate data from different atomistic simulators that all target the same objective, the total energy of the material. This work focuses on multi-fidelity acquisition functions, which are one of the key components of the multi-fidelity algorithm. In particular, I developed and implemented several multi-fidelity acquisition functions. To test them, I applied multi-fidelity BOSS to the alanine structure search task, using simulations of the alanine system based on force fields (AMBER18), density-functional theory (FHI-aims with the PBE exchange-correlation functional) and quantum chemistry (Gaussian16 with CCSD(T)). I found that multi-fidelity BOSS reduced the CPU cost by up to 90% when used with the ELCB or MES acquisition functions. Both acquisition functions enable large savings when combined with different separable or inseparable sampling strategies. I also found that the possible savings depend significantly on the sampling costs of the atomistic simulators, the correlation between the different fidelities and the dimension of the search space.
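As an illustration of the ICM idea mentioned in the abstract, the following is a minimal sketch of how a coregionalized covariance can couple two fidelities. It is not the BOSS implementation. The function names, the squared-exponential base kernel, the rank-1 matrix `W` and all numerical values are assumptions chosen only for illustration. The covariance between a point `x` at fidelity `t` and a point `x2` at fidelity `t2` is taken as `B[t][t2] * k(x, x2)`, with `B = W W^T + diag(kappa)`.

```python
import math

def rbf(x, y, lengthscale=1.0):
    """Squared-exponential base kernel k(x, x') shared by all fidelities."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-0.5 * sq_dist / lengthscale ** 2)

def coregionalization_matrix(W, kappa):
    """Build B = W W^T + diag(kappa); B[t][t2] couples fidelities t and t2."""
    n = len(W)
    B = [[sum(wi * wj for wi, wj in zip(W[i], W[j])) for j in range(n)]
         for i in range(n)]
    for i in range(n):
        B[i][i] += kappa[i]
    return B

def icm_kernel(x, t, x2, t2, B, lengthscale=1.0):
    """ICM covariance: task coupling times the shared input kernel."""
    return B[t][t2] * rbf(x, x2, lengthscale)

def fidelity_correlation(B, t, t2):
    """Correlation between two fidelities implied by B."""
    return B[t][t2] / math.sqrt(B[t][t] * B[t2][t2])

# Hypothetical parameters for two fidelities (0 = high, 1 = low).
W = [[1.0], [0.9]]
kappa = [0.05, 0.1]
B = coregionalization_matrix(W, kappa)

# Covariance between a high-fidelity and a low-fidelity point.
cov = icm_kernel([0.0, 0.0], 0, [0.5, 0.0], 1, B)
rho = fidelity_correlation(B, 0, 1)
```

In this picture, a larger off-diagonal entry of `B` relative to its diagonal means that cheap low-fidelity samples carry more information about the high-fidelity energy surface, which is consistent with the abstract's observation that the savings depend on the correlation between fidelities.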
Supervisor
Rinke, Patrick
Thesis advisor
Todorović, Milica
Keywords
multi-task, learning, Bayesian, optimization