Investigating Foundation Models in Medical Imaging

School of Electrical Engineering | Master's thesis

Date

2024-11-17

Major/Subject

Autonomous Systems

Degree programme

Master's Programme in ICT Innovation

Language

en

Pages

1

Abstract

This thesis studies the efficacy of foundation models for medical image segmentation, focusing on computed tomography (CT) scans from the Aortaseg24 and HaN-Seg datasets. We implemented and fine-tuned six foundation models and a baseline Vanilla U-Net, and evaluated their performance, adaptability, and computational efficiency. In our experiments, the foundation models consistently outperformed the Vanilla U-Net baseline. STU-Net was the top performer, achieving Dice scores of 0.76 on Aortaseg24 and 0.70 on HaN-Seg, well above the baseline’s 0.37 and 0.36, respectively. SwinMM followed closely, with particularly strong performance on the more complex HaN-Seg task. Because both Aortaseg24 and HaN-Seg are multi-class segmentation datasets, we also carried out a class-wise analysis, which revealed that foundation models excel at segmenting larger, more distinct anatomical structures, with STU-Net achieving Dice scores above 0.90 for numerous classes in Aortaseg24. However, all models struggled with smaller, more intricate structures, particularly in the HaN-Seg task. Computational efficiency varied widely among models. DAE processed the highest number of samples per second (2.99 ± 0.55), while SuPreM converged fastest, requiring only 6000 iterations for Aortaseg24 and 4500 for HaN-Seg. Conversely, SPAD-Nets required the most iterations (30120 for Aortaseg24), indicating significant differences in learning efficiency. The study also revealed task-specific performance variability. Models such as SwinMM and MIS-FM showed better relative performance on HaN-Seg than on Aortaseg24, suggesting that pre-training strategies and architectural designs influence a model’s suitability for specific anatomical regions. Our findings show the potential of foundation models to advance medical image segmentation, while also exposing the challenge of balancing performance with computational efficiency.
The observed task-specific variability emphasizes the need for careful model selection based on the target application. This analysis provides insights for future developments in medical image segmentation, in particular the need for robust, versatile architectures capable of handling diverse and complex anatomical regions.

Supervisor

Zhou, Quan

Thesis advisor

Troubitsyna, Elena
Herman, Pawel

Keywords

medical image segmentation, foundation models, computed tomography (CT), deep learning, transfer learning, computational efficiency, HaN-Seg
