Effect of Speech Modification on Wav2Vec2 Models for Children Speech Recognition
Access rights
openAccess
acceptedVersion
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
Language
en
Series
2024 International Conference on Signal Processing and Communications, SPCOM 2024, International Conference on Signal Processing and Communications
Abstract
Speech modification methods normalize children's speech towards adults' speech, enabling off-the-shelf generic automatic speech recognition (ASR) for this low-resource scenario. On the other hand, ASR models like Wav2Vec2 have shown remarkable robustness towards various speakers, thus streamlining their deployment. This paper examines the benefit of speech modification methods when using Wav2Vec2 models on children's speech. We experimented with prototypical speech modification methods and found that while models trained on large datasets exhibit similar performance across unmodified and modified children's speech, models trained on smaller datasets exhibit notably enhanced performance with modified speech. However, analyzing age effects on PF-Star and CMU Kids evaluation sets, we observe that all Wav2Vec2 variants still underperform for children under 10 years. In this scenario, speech modification methods and their combinations help improve performance for small and large Wav2Vec2 models but have plenty of room for improvement.
Description
Publisher Copyright: © 2024 IEEE.
Citation
Sinha, A, Singh, M, Kadiri, S R, Kurimo, M & Kathania, H K 2024, Effect of Speech Modification on Wav2Vec2 Models for Children Speech Recognition. in 2024 International Conference on Signal Processing and Communications, SPCOM 2024. International Conference on Signal Processing and Communications, IEEE, International Conference on Signal Processing and Communications, Bangalore, India, 01/07/2024. https://doi.org/10.1109/SPCOM60851.2024.10631626