Balancing Imbalanced Toxicity Models: Using MolBERT with Focal Loss

Access rights

openAccess
publishedVersion

A4 Article in conference proceedings

Date

2025

Language

en

Pages

16

Series

AI in Drug Discovery - 1st International Workshop, AIDD 2024, Held in Conjunction with ICANN 2024, Proceedings, pp. 82-97, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Volume 14894 LNCS

Abstract

Drug-induced liver injury (DILI) presents a multifaceted challenge, influenced by interconnected biological mechanisms. Current DILI datasets are small and highly imbalanced, which makes it difficult to learn robust representations and build accurate models. To address these challenges, we trained a multi-modal multi-task model integrating preclinical histopathologies, biochemistry (blood markers), and clinical DILI-related adverse drug reactions (ADRs). Leveraging pretrained BERT models, we extracted representations covering a broad chemical space, facilitating robust learning in both frozen and fine-tuned settings. To address the data imbalance, we explored weighted Binary Cross-Entropy (w-BCE) and weighted Focal Loss (w-FL). Our results demonstrate that, with the frozen BERT model, the weighted loss functions consistently improve performance across all metrics and modalities compared to their non-weighted counterparts. However, the efficacy of fine-tuning BERT varies across modalities, yielding inconclusive results. In summary, combining BERT features with weighted loss functions is beneficial, while the benefit of fine-tuning remains uncertain.
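The abstract names two weighted losses for imbalanced binary labels. Below is a minimal sketch in plain Python, assuming the standard per-sample forms: binary cross-entropy with an up-weighted positive class, and focal loss, which scales that term by (1 - p_t)^gamma, where p_t is the probability assigned to the true class. The function names, the negatives-to-positives `pos_weight` heuristic, and gamma = 2 are illustrative assumptions, not the paper's implementation or hyperparameters.

```python
import math


def weighted_bce(p, y, pos_weight=1.0, eps=1e-7):
    """Weighted binary cross-entropy for a single prediction.

    p: predicted probability of the positive class; y: label in {0, 1}.
    pos_weight up-weights the (rare) positive class.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))


def weighted_focal(p, y, pos_weight=1.0, gamma=2.0, eps=1e-7):
    """Weighted focal loss: the weighted BCE term scaled by (1 - p_t) ** gamma.

    p_t is the probability assigned to the true class, so confidently
    correct ("easy") examples contribute less when gamma > 0.
    """
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    w = pos_weight if y == 1 else 1.0
    return -w * (1.0 - p_t) ** gamma * math.log(p_t)


def pos_weight_from_labels(labels):
    """Hypothetical heuristic: ratio of negative to positive labels."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return n_neg / n_pos if n_pos else 1.0


def mean_loss(loss_fn, probs, labels, **kwargs):
    """Average a per-sample loss over a batch."""
    return sum(loss_fn(p, y, **kwargs) for p, y in zip(probs, labels)) / len(labels)


# Toy imbalanced batch: four negatives, one positive.
labels = [0, 0, 0, 0, 1]
probs = [0.1, 0.2, 0.1, 0.3, 0.6]
w = pos_weight_from_labels(labels)  # 4.0
print(mean_loss(weighted_bce, probs, labels, pos_weight=w))
print(mean_loss(weighted_focal, probs, labels, pos_weight=w, gamma=2.0))
```

With gamma = 0 the focal loss reduces exactly to the weighted BCE; larger gamma shrinks the contribution of confidently classified examples, which in an imbalanced set tend to be the majority-class negatives.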

Description

Publisher Copyright: © The Author(s) 2025. | openaire: EC/H2020/956832/EU//AIDD

Keywords

BERT, DILI, Focal loss, Toxicity

Citation

Masood, M A, Kaski, S, Ceulemans, H, Herman, D & Heinonen, M 2025, Balancing Imbalanced Toxicity Models: Using MolBERT with Focal Loss. in D-A Clevert, M Wand, J Schmidhuber, K Malinovská & I V Tetko (eds), AI in Drug Discovery - 1st International Workshop, AIDD 2024, Held in Conjunction with ICANN 2024, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14894 LNCS, Springer, pp. 82-97, International Workshop on AI in Drug Discovery, Lugano, Switzerland, 19/09/2024. https://doi.org/10.1007/978-3-031-72381-0_8