Browsing by Author "Stuke, Annika"
Now showing 1 - 5 of 5
- Atomic structures and orbital energies of 61,489 crystal-forming organic molecules
Data Article (2020-12-01) Stuke, Annika; Kunkel, Christian; Golze, Dorothea; Todorović, Milica; Margraf, Johannes T.; Reuter, Karsten; Rinke, Patrick; Oberhofer, Harald
Data science and machine learning in materials science require large datasets of technologically relevant molecules or materials. Currently, publicly available molecular datasets with realistic molecular geometries and spectral properties are rare. We here supply a diverse benchmark spectroscopy dataset of 61,489 molecules extracted from organic crystals in the Cambridge Structural Database (CSD), denoted OE62. Molecular equilibrium geometries are reported at the Perdew-Burke-Ernzerhof (PBE) level of density functional theory (DFT) including van der Waals corrections for all 62k molecules. For these geometries, OE62 supplies total energies and orbital eigenvalues at the PBE and the PBE hybrid (PBE0) functional level of DFT for all 62k molecules in vacuum as well as at the PBE0 level for a subset of 30,876 molecules in (implicit) water. For 5,239 molecules in vacuum, the dataset provides quasiparticle energies computed with many-body perturbation theory in the G0W0 approximation with a PBE0 starting point (denoted GW5000 in analogy to the GW100 benchmark set; M. van Setten et al., J. Chem. Theory Comput. 12, 5076 (2016)).
- Chemical diversity in molecular orbital energy predictions with kernel ridge regression
A1 Original article in a scientific journal (2019-05-28) Stuke, Annika; Todorović, Milica; Rupp, Matthias; Kunkel, Christian; Ghosh, Kunal; Himanen, Lauri; Rinke, Patrick
Instant machine learning predictions of molecular properties are desirable for materials design, but the predictive power of the methodology is mainly tested on well-known benchmark datasets. Here, we investigate the performance of machine learning with kernel ridge regression (KRR) for the prediction of molecular orbital energies on three large datasets: the standard QM9 small organic molecules set, amino acid and dipeptide conformers, and organic crystal-forming molecules extracted from the Cambridge Structural Database. We focus on the prediction of highest occupied molecular orbital (HOMO) energies, computed at the density-functional level of theory. Two different representations that encode the molecular structure are compared: the Coulomb matrix (CM) and the many-body tensor representation (MBTR). We find that KRR performance depends significantly on the chemistry of the underlying dataset and that the MBTR is superior to the CM, predicting HOMO energies with a mean absolute error as low as 0.09 eV. To demonstrate the power of our machine learning method, we apply our model to structures of 10k previously unseen molecules. We obtain instant energy predictions that allow us to identify interesting molecules for future applications.
- Deep Learning Spectroscopy: Neural Networks for Molecular Excitation Spectra
A1 Original article in a scientific journal (2019-05-03) Ghosh, Kunal; Stuke, Annika; Todorović, Milica; Jørgensen, Peter Bjørn; Schmidt, Mikkel N.; Vehtari, Aki; Rinke, Patrick
Deep learning methods for the prediction of molecular excitation spectra are presented. For the example of the electronic density of states of 132k organic molecules, three different neural network architectures are trained and assessed: multilayer perceptron (MLP), convolutional neural network (CNN), and deep tensor neural network (DTNN). The inputs for the neural networks are the coordinates and charges of the constituent atoms of each molecule. Even the MLP is able to learn spectra, but its root mean square error (RMSE) is still as high as 0.3 eV. The learning quality improves significantly for the CNN (RMSE = 0.23 eV) and reaches its best performance for the DTNN (RMSE = 0.19 eV). Both CNN and DTNN capture even small nuances in the spectral shape. In a showcase application of this method, the structures of 10k previously unseen organic molecules are scanned and instant spectra predictions are obtained to identify molecules for potential applications.
- Efficient hyperparameter tuning for kernel ridge regression with Bayesian optimization
A1 Original article in a scientific journal (2021-09) Stuke, Annika; Rinke, Patrick; Todorovic, Milica
Machine learning methods usually depend on internal parameters, so-called hyperparameters, that need to be optimized for best performance. Such optimization poses a burden on machine learning practitioners, requiring expert knowledge, intuition or computationally demanding brute-force parameter searches. We here assess three different hyperparameter selection methods: grid search, random search and an efficient automated optimization technique based on Bayesian optimization (BO). We apply these methods to a machine learning problem based on kernel ridge regression in computational chemistry. Two different descriptors are employed to represent the atomic structure of organic molecules, one of which introduces its own set of hyperparameters to the method. We identify optimal hyperparameter configurations and infer entire prediction error landscapes in hyperparameter space that serve as visual guides for the hyperparameter performance. We further demonstrate that for an increasing number of hyperparameters, BO and random search become significantly more efficient in computational time than an exhaustive grid search, while delivering an equivalent or even better accuracy.
- Machine learning for spectroscopic properties of organic molecules
School of Science | Doctoral dissertation (article-based) (2020) Stuke, Annika
The efficient design of new and advanced materials is hindered by a shortfall of suitable methods to rapidly and accurately identify potential materials that meet a desired application. Conventional approaches involve either expensive and time-consuming experiments or computations that often require significant human input. The materials design process could be greatly expedited by utilizing artificial intelligence (AI) tools that are capable of learning effectively from known historic or intentionally generated data that is already available for millions of chemical compounds. In this dissertation, we develop and apply machine learning (ML), a subcategory of AI, to infer spectral properties from molecular datasets. Once trained, our ML models predict molecular spectra and spectral properties instantly and at negligible computational cost. We find that the ML algorithms need to be trained on large and diverse datasets to ensure robustness and predictive accuracy. However, publicly available molecular datasets with realistic structures and spectral properties are rare. Therefore, we generated our own structurally diverse benchmark spectroscopy dataset of 62k large organic molecules. We computed electronic geometries at different levels of density functional theory (DFT) for all 62k molecules as well as quasiparticle orbital eigenvalues at high numerical accuracy with the G0W0 approach for a subset of 5k molecules. A particular difficulty that is often overlooked in current ML applications is the treatment of model parameters that cannot be learned directly during training, so-called hyperparameters. We address this challenge by applying Bayesian optimization to automatically tune the hyperparameters of our kernel ridge regression (KRR) model with two different descriptors for the molecular structure, one of which introduces its own set of hyperparameters to the method.
Furthermore, we study how the performance of our KRR model varies for molecular datasets of different chemical diversity. We find that the learning success of molecular orbital energies inherently depends on the structural complexity of individual molecules as well as on the diversity within a dataset. Our findings benchmark the accuracy of orbital energy predictions with KRR for publicly available molecular datasets, two of which are less well known than the widely used QM9 chemical dataset of small molecules. Finally, we employ deep neural network models to predict molecular excitation spectra with up to 97 % accuracy. The results of this dissertation facilitate instant spectra prediction with machine learning for large molecular databases and pave the way for high-throughput screening to find new materials with advanced functionality.
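
As a rough illustration of what a molecular descriptor and a kernel hyperparameter look like in this setting, the following is a minimal sketch, not the code used in the works listed here. It builds the Coulomb matrix in its commonly published form (diagonal 0.5 * Z_i^2.4, off-diagonal Z_i * Z_j / |R_i - R_j|) and compares two descriptors with a Gaussian kernel. The kernel choice, the function names, the use of bohr as the distance unit and the H2 toy geometry are all assumptions made for illustration; the abstracts above do not specify them.

```python
import math


def coulomb_matrix(charges, coords):
    """Return the Coulomb matrix of one molecule as a list of rows.

    charges: nuclear charges Z_i
    coords:  3D nuclear positions R_i (assumed to be in bohr)
    """
    n = len(charges)
    cm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z_i ** 2.4
                cm[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: nuclear Coulomb repulsion
                cm[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return cm


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two flattened descriptors.

    sigma is the kernel width, i.e. a hyperparameter that cannot be
    learned during training and has to be tuned separately.
    """
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


# Hypothetical toy input: two H2 geometries with different bond lengths
h2_short = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
h2_long = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)])

# Flatten each matrix row by row before passing it to the kernel
flat_short = [value for row in h2_short for value in row]
flat_long = [value for row in h2_long for value in row]
similarity = gaussian_kernel(flat_short, flat_long, sigma=1.0)
```

In a KRR model, kernel values like `similarity` would populate the kernel matrix, and `sigma` together with the regularization strength would be the kind of hyperparameters that grid search, random search or Bayesian optimization tune.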