Extracting Medical Entities from Radiology Reports with Ontology-based Distant Supervision

School of Science | Master's thesis

Mcode

SCI3044

Language

en

Pages

53+7

Abstract

Doctors must review a substantial number of medical documents, such as radiology reports, to make medical decisions. Named Entity Recognition (NER) structures raw medical text by detecting and classifying medical entities. Documents structured around medical concepts improve doctors' work efficiency and make important information easier to extract. Nevertheless, deploying NER on Finnish medical text remains challenging because of data annotation, in-domain adaptation, incomplete labels, and label noise. To address these problems, we develop an NER system called Auto-labeling and Noise-suppressed Network (ANT). An automated annotation mechanism provides supervision signals for the training samples of the NER dataset. Domain continual pretraining transfers in-domain knowledge to the NER model to improve its performance. We use a weak label completion scheme to complete the weak labels generated by the automated annotation mechanism, and we apply noise suppression approaches to further reduce label noise. Experimental results show that our model achieves relatively strong performance on a silver-standard dataset. We also conduct ablation experiments to assess the effectiveness of the framework's components.
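
To make the idea of ontology-based automated annotation concrete, the following is a minimal sketch of dictionary-style distant labeling with BIO tags. It is an illustration under stated assumptions, not the thesis's ANT implementation: the function name `distant_label`, the toy ontology, the entity types, and the English example sentence are all hypothetical and chosen only for readability.

```python
from typing import Dict, List, Tuple


def distant_label(tokens: List[str], ontology: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against an ontology dictionary.

    `ontology` maps a lowercased term (as a tuple of tokens) to an entity type.
    This is a hypothetical helper for illustration only.
    """
    labels = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(term) for term in ontology), default=0)

    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms take priority.
        for span in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(lowered[i:i + span])
            if key in ontology:
                entity_type = ontology[key]
                labels[i] = f"B-{entity_type}"
                for j in range(i + 1, i + span):
                    labels[j] = f"I-{entity_type}"
                i += span
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Hypothetical toy ontology; a real one would come from a medical terminology.
TOY_ONTOLOGY = {
    ("pleural", "effusion"): "FINDING",
    ("lung",): "ANATOMY",
}

tokens = "Small pleural effusion in the left lung".split()
labels = distant_label(tokens, TOY_ONTOLOGY)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

Any entity missing from the dictionary is tagged `O` in this sketch, which illustrates why labels produced this way can be incomplete and noisy, the two problems the abstract says the weak label completion and noise suppression steps are meant to address.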

Description

Supervisor

Marttinen, Pekka

Thesis advisor

Koskinen, Miika
Ji, Shaoxiong
