An interpretable molecular descriptor for machine learning predictions in atmospheric science
Access rights
openAccess
CC BY
Creative Commons license
Except where otherwise noted, this item's license is described as openAccess
publishedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display, and print this publication for your own personal use. Commercial use is prohibited.
Language
en
Pages
19
Series
Journal of Chemical Physics, Volume 164, issue 8, pp. 1-19
Abstract
The study of aerosol formation and chemistry using machine learning is limited by the lack of molecular descriptors suited to atmospheric compounds. Interpretable models are particularly affected because they often rely on dictionary-based descriptors tied to specific molecular substructures, which currently fail to capture the full range of organic atmospheric compounds, including large, highly oxidized molecules common in the atmosphere. We introduce ATMOMACCS, an interpretable descriptor combining the 166 binary keys of the MACCS fingerprint with motifs inspired by the SIMPOL method for estimating saturation vapor pressures. We show that ATMOMACCS outperforms the RDKit topological fingerprint in kernel ridge regression models, improving predictions of saturation vapor pressures (7%, 8%, 29%, and 43% error reduction), equilibrium partition coefficients (5% and 9% error reduction), glass transition temperatures (22% error reduction), and enthalpies of vaporization (61% error reduction) on six datasets with atmospheric compounds. Feature analysis shows that saturation vapor pressure and partition coefficients are governed by carbon number and oxygen-related features, whereas other phase-transition properties (e.g., enthalpy of vaporization and glass transition temperature) depend on carbon–hydrogen bond types and the presence of heteroatoms other than oxygen. This highlights the generalizability of ATMOMACCS across different datasets and properties as an interpretable molecular descriptor.
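The workflow the abstract describes (a descriptor built from binary substructure keys plus counted motifs, fed to kernel ridge regression) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the four-key and two-count toy vectors, the Gaussian kernel, and the hyperparameters `lam` and `gamma` are all hypothetical, and the target values are arbitrary placeholders rather than measured properties.

```python
import math

def atmo_style_descriptor(binary_keys, motif_counts):
    """Concatenate 0/1 substructure keys with motif counts (hypothetical layout)."""
    return [float(b) for b in binary_keys] + [float(c) for c in motif_counts]

def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; the kernel choice here is an assumption."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * d2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_krr(X, y, lam=1e-6, gamma=0.1):
    """Kernel ridge regression: solve (K + lam * I) alpha = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)

def predict_krr(X_train, alpha, x, gamma=0.1):
    """Predict as a kernel-weighted sum over the training descriptors."""
    return sum(a * gaussian_kernel(xi, x, gamma) for a, xi in zip(alpha, X_train))

# Toy inputs: 4 placeholder keys + 2 placeholder counts (e.g. carbon, oxygen).
X_train = [
    atmo_style_descriptor([1, 0, 0, 1], [3, 1]),
    atmo_style_descriptor([0, 1, 0, 1], [5, 2]),
    atmo_style_descriptor([1, 1, 0, 0], [7, 4]),
]
y_train = [-1.0, -2.5, -4.0]  # arbitrary placeholder targets, not real data

alpha = fit_krr(X_train, y_train)
prediction = predict_krr(X_train, alpha, atmo_style_descriptor([1, 0, 0, 1], [4, 1]))
```

In practice the binary part would come from a cheminformatics toolkit's MACCS key generator and the solver from a numerical library. The pure-Python version only shows how the two descriptor parts are joined and how the regression weights are obtained.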
Description
openaire: EC/HE/101203938/EU//CLOUDMAP
Citation
Lind, L, Sandström, H & Rinke, P 2026, 'An interpretable molecular descriptor for machine learning predictions in atmospheric science', Journal of Chemical Physics, vol. 164, no. 8, 084115, pp. 1-19. https://doi.org/10.1063/5.0308548
