Multi-modal Chest X-Ray analysis: classification and report generation using self-supervised learning
Perustieteiden korkeakoulu (School of Science) |
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Authors
Date
2022-07-29
Department
Major/Subject
Machine Learning, Data Science and Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
64
Series
Abstract
Automated medical systems for classification, localization and diagnosis are increasingly being researched and developed. Accurate, automated disease detection benefits both medical personnel, who are relieved of tedious examinations, and patients, for whom an accurate prediction can be life-saving. In this work, models for classification and report generation from chest X-rays are studied. Because chest X-rays are so widely used, we were able to collect several datasets, which allowed us to employ the self-supervised learning paradigm. This paradigm lets the methods learn more representative, inherent internal representations for the domain in question. Two models are used in this project: one for classification and one for language modelling. The former is pretrained with the Barlow Twins framework, which takes two augmented copies of the same example as input; a custom loss function drives the network to learn internal weights invariant to the applied transformations. The improvements this approach brings are verified by performing a classification task on a reference dataset and comparing against the same model without the proposed pretraining. For the language model, a pretraining step was performed at the character level on a large text corpus that includes a collection of medical reports. The fine-tuning process is the culmination of this project and merges the two models: the former provides meaningful embeddings, and the latter transforms these inputs into natural language. We verified that pretraining with Barlow Twins improves classification performance, and that pretraining the language model enables it to generate grammatically and semantically correct text. However, fine-tuning did not bring satisfactory results, making this work a starting point for future studies.
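The Barlow Twins objective mentioned in the abstract can be sketched briefly. The idea is to compute the cross-correlation matrix between the standardized embeddings of two augmented views of the same batch, then push its diagonal toward 1 (invariance to the augmentations) and its off-diagonal toward 0 (redundancy reduction). The snippet below is a minimal NumPy illustration of this loss, not the thesis's implementation; the function name, the trade-off weight `lam`, and the epsilon are illustrative choices.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lam=5e-3):
    """Minimal sketch of the Barlow Twins objective.

    z_a, z_b: (N, D) embeddings of two augmented views of the same N examples.
    lam: weight trading off the redundancy-reduction term (illustrative value).
    """
    n, _ = z_a.shape
    # Standardize each embedding dimension over the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + 1e-8)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + 1e-8)
    # Cross-correlation matrix between the two views, shape (D, D).
    c = z_a.T @ z_b / n
    # Invariance term: diagonal entries should be close to 1.
    on_diag = np.sum((1.0 - np.diag(c)) ** 2)
    # Redundancy-reduction term: off-diagonal entries should be close to 0.
    off_diag = np.sum(c ** 2) - np.sum(np.diag(c) ** 2)
    return on_diag + lam * off_diag
```

Feeding two views of the same batch that the network maps to nearly identical embeddings yields a small loss, while embeddings that are uncorrelated across views are penalized by the invariance term.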
Supervisor
Marttinen, Pekka
Thesis advisor
Kumar, Yogesh
Keywords
machine learning, self-supervised learning, convolutional neural network (CNN), generative pretrained transformer (GPT), chest x-rays, medical imaging