Human-in-the-loop active learning for goal-oriented molecule generation
Access rights
openAccess
CC BY
publishedVersion
A1 Original article in a scientific journal
This publication is imported from Aalto University research portal.
Date
2024-12-09
Language
en
Pages
24
Series
Journal of Cheminformatics, Volume 16, issue 1
Abstract
Machine learning (ML) systems have enabled the modelling of quantitative structure–property relationships (QSPR) and structure–activity relationships (QSAR) using existing experimental data to predict target properties for new molecules. These property predictors hold significant potential in accelerating drug discovery by guiding generative artificial intelligence (AI) agents to explore desired chemical spaces. However, they often struggle to generalize due to the limited scope of the training data. When optimized by generative agents, this limitation can result in the generation of molecules with artificially high predicted probabilities of satisfying target properties, which subsequently fail experimental validation. To address this challenge, we propose an adaptive approach that integrates active learning (AL) and iterative feedback to refine property predictors, thereby improving the outcomes of their optimization by generative AI agents. Our method leverages the Expected Predictive Information Gain (EPIG) criterion to select additional molecules for evaluation by an oracle. This process aims to provide the greatest reduction in predictive uncertainty, enabling more accurate model evaluations of subsequently generated molecules. Recognizing the impracticality of immediate wet-lab or physics-based experiments due to time and logistical constraints, we propose leveraging human experts for their cost-effectiveness and domain knowledge to effectively augment property predictors, bridging gaps in the limited training data. Empirical evaluations through both simulated and real human-in-the-loop experiments demonstrate that our approach refines property predictors to better align with oracle assessments. Additionally, we observe improved accuracy of predicted properties as well as improved drug-likeness among the top-ranking generated molecules.
Description
openaire: EC/H2020/956832/EU//AIDD
Keywords
Active learning, Goal-oriented molecule generation, Human-in-the-loop, Interactive algorithms, Machine learning
Citation
Nahal, Y, Menke, J, Martinelli, J, Heinonen, M, Kabeshov, M, Janet, J P, Nittinger, E, Engkvist, O & Kaski, S 2024, 'Human-in-the-loop active learning for goal-oriented molecule generation', Journal of Cheminformatics, vol. 16, no. 1, 138. https://doi.org/10.1186/s13321-024-00924-y
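Illustrative note on the acquisition step mentioned in the abstract. EPIG scores a candidate molecule by the expected mutual information between the model's prediction for that candidate and its predictions at a set of target inputs, so the selected molecules are the ones whose labels should most reduce uncertainty where predictions matter. The sketch below is a minimal estimate of that quantity for a binary property, not the authors' implementation. It assumes the predictor's posterior is represented by a handful of samples (for example, ensemble members), and the function name `epig_scores` and the input layout are hypothetical.

```python
import math


def epig_scores(pool_probs, target_probs, eps=1e-12):
    """Estimate EPIG (in nats) for each candidate under a binary property model.

    pool_probs[k][n]   : P(y=1 | x_n) under posterior sample k (assumed layout)
    target_probs[k][m] : P(y*=1 | x*_m) under the same posterior sample k
    Returns one score per candidate; higher means more informative.
    """
    n_samples = len(pool_probs)
    n_pool = len(pool_probs[0])
    n_target = len(target_probs[0])
    scores = []
    for n in range(n_pool):
        total = 0.0
        for m in range(n_target):
            # Per-sample class distributions: index 0 = inactive, 1 = active.
            p = [(1 - pool_probs[k][n], pool_probs[k][n]) for k in range(n_samples)]
            q = [(1 - target_probs[k][m], target_probs[k][m]) for k in range(n_samples)]
            for a in (0, 1):
                for b in (0, 1):
                    # Joint predictive: average over posterior samples of the product.
                    joint = sum(p[k][a] * q[k][b] for k in range(n_samples)) / n_samples
                    # Marginal predictives: average each factor separately.
                    marg_a = sum(p[k][a] for k in range(n_samples)) / n_samples
                    marg_b = sum(q[k][b] for k in range(n_samples)) / n_samples
                    # KL(joint || product of marginals) is the mutual information.
                    total += joint * (math.log(joint + eps) - math.log(marg_a * marg_b + eps))
        # Average the mutual information over the target inputs.
        scores.append(total / n_target)
    return scores


# Toy data: two posterior samples, two candidates, one target input.
pool = [[0.9, 0.5], [0.1, 0.5]]   # samples disagree on candidate 0, agree on candidate 1
target = [[0.8], [0.2]]           # samples also disagree on the target input
scores = epig_scores(pool, target)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy case candidate 0 receives a positive score because the posterior samples that disagree about it also disagree about the target input, while candidate 1 scores approximately zero because every sample predicts the same probability for it. In a human-in-the-loop setting, the top-scoring candidates would be the ones shown to the expert for feedback.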