An interpretable molecular descriptor for machine learning predictions in atmospheric science

DC Field | Value | Language
dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Lind, L. |
dc.contributor.author | Sandström, H. |
dc.contributor.author | Rinke, P. |
dc.contributor.department | Department of Applied Physics | en
dc.contributor.groupauthor | Computational Electronic Structure Theory | en
dc.contributor.organization | Computational Electronic Structure Theory |
dc.date.accessioned | 2026-03-18T09:02:21Z |
dc.date.available | 2026-03-18T09:02:21Z |
dc.date.issued | 2026-02-28 |
dc.description | openaire: EC/HE/101203938/EU//CLOUDMAP |
dc.description.abstract | The study of aerosol formation and chemistry using machine learning is limited by the lack of molecular descriptors suited to atmospheric compounds. Interpretable models are particularly affected because they often rely on dictionary-based descriptors tied to specific molecular substructures, which currently fail to capture the full range of organic atmospheric compounds, including large, highly oxidized molecules common in the atmosphere. We introduce ATMOMACCS, an interpretable descriptor combining the 166 binary keys of the MACCS fingerprint with motifs inspired by the SIMPOL method for estimating saturation vapor pressures. We show that ATMOMACCS outperforms the RDKit topological fingerprint in kernel ridge regression models, improving predictions of saturation vapor pressures (7%, 8%, 29%, and 43% error reduction), equilibrium partition coefficients (5% and 9% error reduction), glass transition temperatures (22% error reduction), and enthalpies of vaporization (61% error reduction) on six datasets with atmospheric compounds. Feature analysis shows that saturation vapor pressure and partition coefficients are governed by carbon number and oxygen-related features, whereas other phase-transition properties (e.g., enthalpy of vaporization and glass transition temperature) depend on carbon–hydrogen bond types and the presence of heteroatoms other than oxygen. These results highlight the generalizability of ATMOMACCS across different datasets and properties as an interpretable molecular descriptor. | en
dc.description.version | Peer reviewed | en
dc.format.extent | 19 |
dc.format.mimetype | application/pdf |
dc.identifier.citation | Lind, L, Sandström, H & Rinke, P 2026, 'An interpretable molecular descriptor for machine learning predictions in atmospheric science', Journal of Chemical Physics, vol. 164, no. 8, 084115, pp. 1-19. https://doi.org/10.1063/5.0308548 | en
dc.identifier.doi | 10.1063/5.0308548 |
dc.identifier.issn | 0021-9606 |
dc.identifier.issn | 1089-7690 |
dc.identifier.other | PURE UUID: 09b91820-7c9e-4219-9489-8e54630c77a6 |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/09b91820-7c9e-4219-9489-8e54630c77a6 |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/213705654/An_interpretable_molecular_descriptor_for_machine_learning_predictions_in_atmospheric_science.pdf |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/143521 |
dc.identifier.urn | URN:NBN:fi:aalto-202603182863 |
dc.language.iso | en | en
dc.publisher | American Institute of Physics |
dc.relation | info:eu-repo/grantAgreement/EC/HE/101203938/EU//CLOUDMAP |
dc.relation.fundinginfo | This study was supported by the Research Council of Finland through Project No. 346377, the EU COST Actions Grant Nos. CA18234 and CA22154, and the European Commission through the Marie Skłodowska-Curie Actions (MSCA) under Grant Agreement No. 101203938. We further acknowledge CSC-IT Center for Science, Finland, and the Aalto Science-IT project. The authors acknowledge Theo Kurtén for insightful discussions. |
dc.relation.ispartofseries | Journal of Chemical Physics | en
dc.relation.ispartofseries | Volume 164, issue 8, pp. 1-19 | en
dc.rights | openAccess | en
dc.rights | CC BY |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.title | An interpretable molecular descriptor for machine learning predictions in atmospheric science | en
dc.type | A1 Original article in a scientific journal (Finnish: A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä) | fi
dc.type.version | publishedVersion |
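The abstract describes ATMOMACCS as the 166 binary MACCS keys combined with SIMPOL-inspired motifs, used as input to kernel ridge regression (KRR). The Python sketch below illustrates that kind of pipeline under explicit assumptions; it is not the authors' implementation. The descriptor inputs are taken as precomputed lists (a real pipeline would generate MACCS keys with a cheminformatics toolkit such as RDKit). The motif counts are hypothetical placeholders, not the published ATMOMACCS motif list. The Laplacian kernel and the hyperparameters `gamma` and `lam` are assumed choices. The helper `error_reduction` assumes the reported percentages are relative reductions of an error metric such as the mean absolute error.

```python
import math
from typing import List, Sequence

N_MACCS_KEYS = 166  # number of binary MACCS keys stated in the abstract


def build_descriptor(maccs_bits: Sequence[int], motif_counts: Sequence[int]) -> List[float]:
    """Concatenate 166 binary MACCS-style keys with SIMPOL-style motif counts.

    Hypothetical layout: the motif list and its order are placeholders,
    not the published ATMOMACCS definition.
    """
    if len(maccs_bits) != N_MACCS_KEYS:
        raise ValueError(f"expected {N_MACCS_KEYS} MACCS bits, got {len(maccs_bits)}")
    if any(b not in (0, 1) for b in maccs_bits):
        raise ValueError("MACCS keys must be binary")
    return [float(b) for b in maccs_bits] + [float(c) for c in motif_counts]


def laplacian_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Assumed kernel: exp(-gamma * L1 distance between two descriptors)."""
    return math.exp(-gamma * sum(abs(a - b) for a, b in zip(x, y)))


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def krr_fit(X: List[List[float]], y: Sequence[float], gamma: float, lam: float) -> List[float]:
    """Return KRR dual coefficients alpha = (K + lam * I)^-1 y."""
    n = len(X)
    k = [[laplacian_kernel(X[i], X[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return _solve(k, list(y))


def krr_predict(X_train: List[List[float]], alpha: Sequence[float],
                x: Sequence[float], gamma: float) -> float:
    """Predict f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * laplacian_kernel(xt, x, gamma) for a, xt in zip(alpha, X_train))


def error_reduction(err_reference: float, err_new: float) -> float:
    """Percent reduction of an error metric relative to a reference descriptor."""
    return 100.0 * (err_reference - err_new) / err_reference


if __name__ == "__main__":
    # Toy, made-up descriptors: one MACCS key set, plus two hypothetical
    # motif counts (e.g. carbon number, hydroxyl count).
    def toy(bit: int, n_carbon: int, n_oh: int) -> List[float]:
        bits = [0] * N_MACCS_KEYS
        bits[bit] = 1
        return build_descriptor(bits, [n_carbon, n_oh])

    X = [toy(10, 5, 1), toy(20, 8, 2), toy(30, 10, 3)]
    y = [-3.0, -5.5, -7.0]  # illustrative targets, not data from the paper
    alpha = krr_fit(X, y, gamma=0.1, lam=1e-6)
    print(krr_predict(X, alpha, toy(20, 9, 2), gamma=0.1))
```

Keeping binary keys and integer counts in one flat vector lets a single distance-based kernel treat substructure presence and group counts together, which matches how a concatenated descriptor would feed a standard KRR model. The kernel choice here is an assumption, not taken from the paper.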

Files

Original bundle

Name: An_interpretable_molecular_descriptor_for_machine_learning_predictions_in_atmospheric_science.pdf
Size: 8.88 MB
Format: Adobe Portable Document Format