MIRA : A Novel Framework for Fusing Modalities in Medical RAG
Access rights
openAccess
CC BY
publishedVersion
A4 Article in a conference publication
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
9
Series
MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025, pp. 6307-6315
Abstract
Multimodal Large Language Models (MLLMs) have significantly advanced AI-assisted medical diagnosis, but they often generate factually inconsistent responses that deviate from established medical knowledge. Retrieval-Augmented Generation (RAG) enhances factual accuracy by integrating external sources, but it presents two key challenges. First, insufficient retrieval can miss critical information, whereas excessive retrieval can introduce irrelevant or misleading content, disrupting model output. Second, even when the model initially provides correct answers, over-reliance on retrieved data can lead to factual errors. To address these issues, we introduce the Multimodal Intelligent Retrieval and Augmentation (MIRA) framework, designed to optimize factual accuracy in MLLMs. MIRA consists of two key components: (1) a calibrated Rethinking and Rearrangement module that dynamically adjusts the number of retrieved contexts to manage factual risk, and (2) a medical RAG framework integrating image embeddings and a medical knowledge base with a query-rewrite module for efficient multimodal reasoning. This enables the model to effectively integrate both its inherent knowledge and external references. Our evaluation on publicly available medical VQA and report generation benchmarks demonstrates that MIRA substantially enhances factual accuracy and overall performance, achieving new state-of-the-art results. Code is released at https://github.com/mbzuai-oryx/MIRA.
Description
Publisher Copyright: © 2025 Copyright held by the owner/author(s).
Citation
Wang, J, Ashraf, T, Han, Z, Laaksonen, J & Anwer, R M 2025, MIRA : A Novel Framework for Fusing Modalities in Medical RAG. in MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025. ACM, pp. 6307-6315, ACM International Conference on Multimedia, Dublin, Ireland, 27/10/2025. https://doi.org/10.1145/3746027.3755760