Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech

dc.contributor: Aalto-yliopisto (fi)
dc.contributor: Aalto University (en)
dc.contributor.author: Javanmardi, Farhad (en_US)
dc.contributor.author: Kadiri, Sudarsana (en_US)
dc.contributor.author: Alku, Paavo (en_US)
dc.contributor.department: Department of Information and Communications Engineering (en)
dc.contributor.groupauthor: Speech Communication Technology (en)
dc.date.accessioned: 2024-08-28T08:48:43Z
dc.date.available: 2024-08-28T08:48:43Z
dc.date.issued: 2024 (en_US)
dc.description.abstract: Many acoustic features and machine learning models have been studied to build automatic detection systems to distinguish dysarthric speech from healthy speech. These systems can help to improve the reliability of diagnosis. However, speech recorded for diagnosis in real-life clinical conditions can differ from the training data of the detection system in terms of, for example, recording conditions, speaker identity, and language. These mismatches may lead to a reduction in detection performance in practical applications. In this study, we investigate the use of the wav2vec2 model as a feature extractor together with a support vector machine (SVM) classifier to build automatic detection systems for dysarthric speech. The performance of the wav2vec2 features is evaluated in two cross-database scenarios, language-dependent and language-independent, to study their generalizability to unseen speakers, recording conditions, and languages before and after fine-tuning the wav2vec2 model. The results revealed that the fine-tuned wav2vec2 features showed better generalization in both scenarios and gave an absolute accuracy improvement of 1.46% – 8.65% compared to the non-fine-tuned wav2vec2 features. (en)
dc.description.version: Peer reviewed (en)
dc.format.extent: 12
dc.format.mimetype: application/pdf (en_US)
dc.identifier.citation: Javanmardi, F, Kadiri, S & Alku, P 2024, 'Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech', IEEE Journal of Biomedical and Health Informatics, vol. 28, no. 8, pp. 4951-4962. https://doi.org/10.1109/JBHI.2024.3392829 (en)
dc.identifier.doi: 10.1109/JBHI.2024.3392829 (en_US)
dc.identifier.issn: 2168-2194
dc.identifier.issn: 2168-2208
dc.identifier.other: PURE UUID: aa63ef3e-cd31-434f-8de4-f6346fa67e8b (en_US)
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/aa63ef3e-cd31-434f-8de4-f6346fa67e8b (en_US)
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85192167956&partnerID=8YFLogxK (en_US)
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/154649314/Exploring_the_Impact_of_Fine-Tuning_the_Wav2vec2_Model_in_Database-Independent_Detection_of_Dysarthric_Speech.pdf (en_US)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/130415
dc.identifier.urn: URN:NBN:fi:aalto-202408285976
dc.language.iso: en (en)
dc.publisher: IEEE
dc.relation.ispartofseries: IEEE Journal of Biomedical and Health Informatics
dc.relation.ispartofseries: Volume 28, issue 8, pp. 4951-4962
dc.rights: openAccess (en)
dc.subject.keyword: Dysarthria (en_US)
dc.subject.keyword: Fine-tuning (en_US)
dc.subject.keyword: Self-supervised learning (en_US)
dc.subject.keyword: Wav2vec 2.0 (en_US)
dc.title: Exploring the Impact of Fine-Tuning the Wav2vec2 Model in Database-Independent Detection of Dysarthric Speech (en)
dc.type: A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä [A1 Original article in a scientific journal] (fi)
dc.type.version: publishedVersion
