Machine learning for spectroscopic properties of organic molecules
School of Science | Doctoral thesis (article-based) | Defence date: 2020-08-14
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Date
2020
Language
en
Pages
106 + app. 58
Series
Aalto University publication series DOCTORAL DISSERTATIONS, 108/2020
Abstract
The efficient design of new and advanced materials is hindered by a shortfall of suitable methods to rapidly and accurately identify potential materials that meet the requirements of a desired application. Conventional approaches involve either expensive and time-consuming experiments or computations that often require significant human input. The materials design process could be greatly expedited by utilizing artificial intelligence (AI) tools that are capable of learning effectively from known historic or intentionally generated data that is already available for millions of chemical compounds. In this dissertation, we develop and apply machine learning (ML), a subcategory of AI, to infer spectral properties from molecular datasets. Once trained, our ML models predict molecular spectra and spectral properties instantly and at negligible computational cost. We find that the ML algorithms need to be trained on large and diverse datasets to ensure robustness and predictive accuracy. However, publicly available molecular datasets with realistic structures and spectral properties are rare. Therefore, we generated our own structurally diverse benchmark spectroscopy dataset of 62k large organic molecules. We computed electronic geometries at different levels of density functional theory (DFT) for all 62k molecules as well as quasiparticle orbital eigenvalues at high numerical accuracy with the G0W0 approach for a subset of 5k molecules. A particular difficulty that is often overlooked in current ML applications is the choice of model parameters that cannot be learned directly during training, so-called hyperparameters. We address this challenge by applying Bayesian optimization to automatically tune the hyperparameters of our kernel ridge regression (KRR) model with two different descriptors for the molecular structure, one of which introduces its own set of hyperparameters to the method.
Furthermore, we study how the performance of our KRR model varies for molecular datasets of different chemical diversity. We find that success in learning molecular orbital energies inherently depends on the structural complexity of individual molecules as well as on the diversity within a dataset. Our findings benchmark the accuracy of orbital energy predictions with KRR for publicly available molecular datasets, two of which are less well known than the widely used QM9 chemical dataset of small molecules. Finally, we employ deep neural network models to predict molecular excitation spectra with up to 97 % accuracy. The results of this dissertation facilitate instant spectra prediction with machine learning for large molecular databases and pave the way for high-throughput screening to find new materials with advanced functionality.
Description
A doctoral dissertation completed for the degree of Doctor of Science (Technology), to be defended with the permission of Aalto University School of Science via remote connection (Zoom link: https://aalto.zoom.us/j/68194827869) on 14 August 2020 at 16:00.
Supervising professor
Rinke, Patrick, Prof., Aalto University, Department of Applied Physics, Finland
Thesis advisor
Todorović, Milica, Dr., Aalto University, Department of Applied Physics, Finland
Keywords
machine learning, spectroscopy, molecular datasets, density functional theory, chemical space, materials design
Parts
- [Publication 1]: Annika Stuke, Christian Kunkel, Dorothea Golze, Milica Todorović, Johannes T. Margraf, Karsten Reuter, Patrick Rinke, Harald Oberhofer. Atomic structures and orbital energies of 61,489 crystal-forming organic molecules. Scientific Data, 7, 58, February 2020.
DOI: 10.6084/m9.figshare.11689347
- [Publication 2]: Annika Stuke, Patrick Rinke, Milica Todorović. Efficient hyperparameter tuning for kernel ridge regression with Bayesian optimization. Submitted to Journal of Chemical Physics, March 2020.
- [Publication 3]: Annika Stuke, Milica Todorović, Matthias Rupp, Christian Kunkel, Kunal Ghosh, Lauri Himanen, Patrick Rinke. Chemical diversity in molecular orbital energy predictions with kernel ridge regression. Journal of Chemical Physics, 150, 204121, April 2019.
Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201907304553
DOI: 10.1063/1.5086105
- [Publication 4]: Kunal Ghosh, Annika Stuke, Milica Todorović, Peter B. Jørgensen, Mikkel N. Schmidt, Aki Vehtari, Patrick Rinke. Deep Learning Spectroscopy: Neural Networks for Molecular Excitation Spectra. Advanced Science, 6, 1801367, January 2019.
Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201902251998
DOI: 10.1002/advs.201801367