Using Natural Language Processing to Identify Stigmatizing Language in Labor and Birth Clinical Notes

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Barcelona, Veronica
dc.contributor.author: Scharp, Danielle
dc.contributor.author: Moen, Hans
dc.contributor.author: Davoudi, Anahita
dc.contributor.author: Idnay, Betina R.
dc.contributor.author: Cato, Kenrick
dc.contributor.author: Topaz, Maxim
dc.contributor.department: Department of Computer Science [en]
dc.contributor.groupauthor: Professorship Marttinen P. [en]
dc.contributor.groupauthor: Professorship Kaski Samuel [en]
dc.contributor.organization: Columbia University
dc.contributor.organization: VNS Health
dc.date.accessioned: 2025-02-28T14:53:32Z
dc.date.available: 2025-02-28T14:53:32Z
dc.date.embargo: info:eu-repo/date/embargoEnd/2024-12-26
dc.date.issued: 2024-03
dc.description: Funding Information: This project was supported by funding from the Columbia University Data Science Institute Seeds Funds Program and a grant (GBMF9048) from the Gordon and Betty Moore Foundation. Publisher Copyright: © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
dc.description.abstract: Introduction: Stigma and bias related to race and other minoritized statuses may underlie disparities in pregnancy and birth outcomes. One emerging method of identifying bias is the study of stigmatizing language in the electronic health record. The objective of our study was to develop automated natural language processing (NLP) methods to identify, accurately and automatically, two types of stigmatizing language in labor and birth notes: marginalizing language and its complement, power/privilege language. Methods: We analyzed notes for all birthing people > 20 weeks’ gestation admitted for labor and birth at two hospitals during 2017. We then applied text preprocessing techniques, using TF-IDF values as inputs, and tested machine learning classification algorithms (Decision Trees, Random Forest, and Support Vector Machines) to identify stigmatizing and power/privilege language in clinical notes. Additionally, we applied a feature importance evaluation method (InfoGain) to discern words highly correlated with these language categories. Results: For marginalizing language, Decision Trees yielded the best classification, with an F-score of 0.73. For power/privilege language, Support Vector Machines performed best, achieving an F-score of 0.91. These results demonstrate the effectiveness of the selected machine learning methods in classifying language categories in clinical notes. Conclusion: We identified well-performing machine learning methods to automatically detect stigmatizing language in clinical notes. To our knowledge, this is the first study to use NLP performance metrics to evaluate machine learning methods for discerning stigmatizing language. Future studies should further refine and evaluate NLP methods, incorporating recent algorithms rooted in deep learning. [en]
dc.description.version: Peer reviewed [en]
dc.format.mimetype: application/pdf
dc.identifier.citation: Barcelona, V, Scharp, D, Moen, H, Davoudi, A, Idnay, B R, Cato, K & Topaz, M 2024, 'Using Natural Language Processing to Identify Stigmatizing Language in Labor and Birth Clinical Notes', Maternal and Child Health Journal, vol. 28, no. 3, pp. 578–586. https://doi.org/10.1007/s10995-023-03857-4 [en]
dc.identifier.doi: 10.1007/s10995-023-03857-4
dc.identifier.issn: 1092-7875
dc.identifier.issn: 1573-6628
dc.identifier.other: PURE UUID: 463ac720-9676-479d-b7fd-55bab79c6f71
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/463ac720-9676-479d-b7fd-55bab79c6f71
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/133715642/Using_Natural_Language_Processing_to_Identify_Stigmatizing_Language_in_Labor_and_Birth_Clinical_Notes.pdf
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/134355
dc.identifier.urn: URN:NBN:fi:aalto-202502282615
dc.language.iso: en
dc.publisher: Springer
dc.relation.fundinginfo: This project was supported by funding from the Columbia University Data Science Institute Seeds Funds Program and a grant (GBMF9048) from the Gordon and Betty Moore Foundation.
dc.relation.ispartofseries: Maternal and Child Health Journal [en]
dc.relation.ispartofseries: Volume 28, issue 3, pp. 578–586 [en]
dc.rights: openAccess [en]
dc.subject.keyword: Bias
dc.subject.keyword: Electronic health records
dc.subject.keyword: Natural language processing
dc.title: Using Natural Language Processing to Identify Stigmatizing Language in Labor and Birth Clinical Notes [en]
dc.type: A1 Original article in a scientific journal (A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä) [fi]
dc.type.version: acceptedVersion

Files

Original bundle

Name: Using_Natural_Language_Processing_to_Identify_Stigmatizing_Language_in_Labor_and_Birth_Clinical_Notes.pdf
Size: 865.72 KB
Format: Adobe Portable Document Format