Deep Bayesian Experimental Design for Drug Discovery
Loading...
Access rights
openAccess
publishedVersion
URL
Journal Title
Journal ISSN
Volume Title
A4 Artikkeli konferenssijulkaisussa
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Date
Department
Major/Subject
Mcode
Degree programme
Language
en
Pages
11
Series
AI in Drug Discovery - 1st International Workshop, AIDD 2024, Held in Conjunction with ICANN 2024, Proceedings, pp. 149-159, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 14894 LNCS
Abstract
In drug discovery, prioritizing compounds for testing is an important task. Active learning can assist in this endeavor by prioritizing molecules for label acquisition based on their estimated potential to enhance in-silico models. However, in specialized cases like toxicity modeling, limited dataset sizes can hinder effective training of modern neural networks for representation learning and to perform active learning. In this study, we leverage a transformer-based BERT model pretrained on millions of SMILES to perform active learning. Additionally, we explore different acquisition functions to assess their compatibility with pretrained BERT model. Our results demonstrate that pretrained models enhance active learning outcomes. Furthermore, we observe that active learning selects a higher proportion of positive compounds compared to random acquisition functions, an important advantage, especially in dealing with imbalanced toxicity datasets. Through a comparative analysis, we find that both BALD and EPIG acquisition functions outperform random acquisition, with EPIG exhibiting slightly superior performance over BALD. In summary, our study highlights the effectiveness of active learning in conjunction with pretrained models to tackle the problem of data scarcity.Description
Publisher Copyright: © The Author(s) 2025. | openaire: EC/H2020/956832/EU//AIDD
Keywords
Other note
Citation
Masood, M A, Cui, T & Kaski, S 2024, Deep Bayesian Experimental Design for Drug Discovery. in D-A Clevert, M Wand, J Schmidhuber, K Malinovská & I V Tetko (eds), AI in Drug Discovery - 1st International Workshop, AIDD 2024, Held in Conjunction with ICANN 2024, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14894 LNCS, Springer, pp. 149-159, International Workshop on AI in Drug Discovery, Lugano, Switzerland, 19/09/2024. https://doi.org/10.1007/978-3-031-72381-0_12