Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition
Access rights
openAccess
publishedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
16
Series
Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track - European Conference, ECML PKDD 2023, Proceedings, pp. 444-459, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 14174 LNAI
Abstract
One of the central tasks of medical text analysis is to extract and structure meaningful information from plain-text clinical documents. Named Entity Recognition (NER) is a sub-task of information extraction that involves identifying predefined entities from unstructured free text. Notably, NER models require large amounts of human-labeled data to train, but human annotation is costly and laborious and often requires medical training. Here, we aim to overcome the shortage of manually annotated data by introducing a training scheme for NER models that uses an existing medical ontology to assign weak labels to entities and provides enhanced domain-specific model adaptation with in-domain continual pretraining. Due to limited human annotation resources, we develop a specific module to collect a more representative test dataset from the data lake than a random selection. To validate our framework, we invite clinicians to annotate the test set. In this way, we construct two Finnish medical NER datasets based on clinical records retrieved from a hospital’s data lake and evaluate the effectiveness of the proposed methods. The code is available at https://github.com/VRCMF/HAM-net.git.
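The weak-labeling idea in the abstract, using an existing ontology to tag entities without human annotators, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes a hypothetical toy ontology of English terms (`ONTOLOGY`), whitespace tokenization, and greedy longest-match lookup that emits BIO tags. The function name `weak_label` and the entity types are illustrative only.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy ontology: surface form -> entity type (not from the paper).
ONTOLOGY: Dict[str, str] = {
    "diabetes": "DISEASE",
    "metformin": "DRUG",
    "heart failure": "DISEASE",
}


def weak_label(text: str, ontology: Dict[str, str]) -> List[Tuple[str, str]]:
    """Assign BIO tags to whitespace tokens via greedy longest-match lookup.

    Assumes a non-empty ontology whose keys are lowercase terms.
    """
    tokens = text.split()
    # Normalize tokens for matching: strip surrounding punctuation, lowercase.
    norm = [re.sub(r"^\W+|\W+$", "", t).lower() for t in tokens]
    max_len = max(len(term.split()) for term in ontology)
    tags = ["O"] * len(tokens)

    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(norm[i:i + n])
            if candidate in ontology:
                label = ontology[candidate]
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


if __name__ == "__main__":
    for token, tag in weak_label("Patient has heart failure, started metformin.", ONTOLOGY):
        print(f"{token}\t{tag}")
```

Labels produced this way are noisy (for example, they miss misspellings and inflected forms, which matter in Finnish), which is why the abstract pairs weak labels with in-domain continual pretraining and a clinician-annotated test set.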
Description
Funding Information: This work was supported by the Academy of Finland (Flagship programme: Finnish Center for Artificial Intelligence FCAI, and grants 336033, 352986) and EU (H2020 grant 101016775 and NextGenerationEU). We wish to acknowledge HUS Acamedic for providing secure computing resources. We also acknowledge the computational resources provided by the Aalto Science-IT project and CSC-IT Center for Science, Finland for prototyping our methods on synthetic data.
Publisher Copyright: © 2023, The Author(s).
openaire: EC/H2020/101016775/EU//INTERVENE
Citation
Sun, W, Ji, S, Denti, T, Moen, H, Kerro, O, Rannikko, A, Marttinen, P & Koskinen, M 2023, Weak Supervision and Clustering-Based Sample Selection for Clinical Named Entity Recognition. in G De Francisci Morales, F Bonchi, C Perlich, N Ruchansky, N Kourtellis & E Baralis (eds), Machine Learning and Knowledge Discovery in Databases: Applied Data Science and Demo Track - European Conference, ECML PKDD 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14174 LNAI, Springer, pp. 444-459, European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Turin, Italy, 18/09/2023. https://doi.org/10.1007/978-3-031-43427-3_27