Multi-modal Chest X-Ray analysis: classification and report generation using self-supervised learning

Perustieteiden korkeakoulu | Master's thesis

Date

2022-07-29

Major/Subject

Machine Learning, Data Science and Artificial Intelligence

Mcode

SCI3044

Degree programme

Master’s Programme in Computer, Communication and Information Sciences

Language

en

Pages

64

Abstract

Automated medical systems for classification, localization and diagnosis are increasingly being researched and developed. Accurate, automated disease detection benefits both medical personnel, who are spared tedious examinations, and patients, for whom an accurate prediction can be life-saving. In this work, models for classification and report generation from chest X-rays are studied. Because chest X-rays are so widely used, we were able to collect several datasets, which allowed us to employ the self-supervised learning paradigm. This paradigm lets the methods learn more representative internal representations that are inherent to the domain in question. Two different models are used in this project, one for classification and the other for language modelling. The former is pretrained with the Barlow Twins framework, which is fed two augmented copies of the same example and uses a custom loss function to learn internal weights that are invariant to the applied transformations. The possible improvements brought by this approach are verified by performing a classification task on a reference dataset and comparing against the same model without the proposed pretraining. For the language model, a pretraining step was performed at the character level on a large text corpus that includes a collection of medical reports. The fine-tuning process is the culmination of this project and involves merging the two models, with the former providing meaningful embeddings and the latter transforming these inputs into natural language. We verified that pretraining with Barlow Twins improves classification performance, and that pretraining the language model makes it possible to generate text with appropriate grammatical and semantic correctness. However, fine-tuning did not yield satisfactory results, making this work a starting point for future studies.
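
The "custom loss function" mentioned above refers to the Barlow Twins objective, which drives the cross-correlation between the embeddings of the two augmented views toward the identity matrix. The following is a minimal illustrative sketch of that objective, assuming PyTorch; the function name, the off-diagonal weight and the normalization details are assumptions for illustration and are not taken from the thesis.

# Minimal sketch of the Barlow Twins objective, assuming PyTorch.
# Names and the off-diagonal weight are illustrative, not from the thesis.
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor,
                      lambda_offdiag: float = 5e-3) -> torch.Tensor:
    """z_a, z_b: embeddings of two augmented views, shape (batch, dim)."""
    n = z_a.shape[0]
    # Standardize each feature dimension across the batch.
    z_a = (z_a - z_a.mean(dim=0)) / z_a.std(dim=0)
    z_b = (z_b - z_b.mean(dim=0)) / z_b.std(dim=0)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = (z_a.T @ z_b) / n
    diag = torch.diagonal(c)
    # Invariance term: pull the diagonal toward 1.
    on_diag = (diag - 1).pow(2).sum()
    # Redundancy-reduction term: push the off-diagonal toward 0.
    off_diag = c.pow(2).sum() - diag.pow(2).sum()
    return on_diag + lambda_offdiag * off_diag

In a pretraining setup like the one described, two randomly transformed copies of the same chest X-ray would be passed through a shared encoder to obtain z_a and z_b, and this loss would be minimized in place of a supervised objective.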

Supervisor

Marttinen, Pekka

Thesis advisor

Kumar, Yogesh

Keywords

machine learning, self-supervised learning, convolutional neural network (CNN), generative pretrained transformer (GPT), chest X-rays, medical imaging
