Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition

Access rights

openAccess
publishedVersion

A4 Article in conference proceedings

Language

en

Pages

16

Series

Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track - European Conference, ECML PKDD 2023, Proceedings, pp. 444-459, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 14174 LNAI

Abstract

One of the central tasks of medical text analysis is to extract and structure meaningful information from plain-text clinical documents. Named Entity Recognition (NER) is a sub-task of information extraction that involves identifying predefined entities from unstructured free text. Notably, NER models require large amounts of human-labeled data to train, but human annotation is costly and laborious and often requires medical training. Here, we aim to overcome the shortage of manually annotated data by introducing a training scheme for NER models that uses an existing medical ontology to assign weak labels to entities and provides enhanced domain-specific model adaptation with in-domain continual pretraining. Due to limited human annotation resources, we develop a specific module to collect a more representative test dataset from the data lake than a random selection. To validate our framework, we invite clinicians to annotate the test set. In this way, we construct two Finnish medical NER datasets based on clinical records retrieved from a hospital’s data lake and evaluate the effectiveness of the proposed methods. The code is available at https://github.com/VRCMF/HAM-net.git.
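
The ontology-based weak labeling mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it assumes a simple greedy longest-match dictionary lookup that emits BIO tags. The names `TOY_ONTOLOGY` and `weak_label`, the entity types, and the example terms are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy ontology: lowercased token sequence -> entity type.
# A real medical ontology would be far larger and loaded from a resource file.
TOY_ONTOLOGY: Dict[Tuple[str, ...], str] = {
    ("diabetes",): "DISEASE",
    ("eturauhasen", "syöpä"): "DISEASE",
    ("metformiini",): "DRUG",
}


def weak_label(tokens: List[str], ontology: Dict[Tuple[str, ...], str]) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup against the ontology."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(key) for key in ontology), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-token terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(lowered[i:i + n])
            if span in ontology:
                label = ontology[span]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


sentence = ["Potilaalla", "eturauhasen", "syöpä", ",", "aloitetaan", "metformiini"]
weak_tags = weak_label(sentence, TOY_ONTOLOGY)
```

Labels produced this way are noisy (exact matching misses inflected or misspelled forms, which are common in Finnish clinical text), which is why a clinician-annotated test set is still needed for evaluation.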

Description

Funding Information: This work was supported by the Academy of Finland (Flagship programme: Finnish Center for Artificial Intelligence FCAI, and grants 336033, 352986) and EU (H2020 grant 101016775 and NextGenerationEU). We wish to acknowledge HUS Acamedic for providing secure computing resources. We also acknowledge the computational resources provided by the Aalto Science-IT project and CSC-IT Center for Science, Finland for prototyping our methods on synthetic data. Publisher Copyright: © 2023, The Author(s). | openaire: EC/H2020/101016775/EU//INTERVENE

Citation

Sun, W, Ji, S, Denti, T, Moen, H, Kerro, O, Rannikko, A, Marttinen, P & Koskinen, M 2023, Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition. in G De Francisci Morales, F Bonchi, C Perlich, N Ruchansky, N Kourtellis & E Baralis (eds), Machine Learning and Knowledge Discovery in Databases : Applied Data Science and Demo Track - European Conference, ECML PKDD 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14174 LNAI, Springer, pp. 444-459, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Turin, Italy, 18/09/2023. https://doi.org/10.1007/978-3-031-43427-3_27
