Effect of Speech Modification on Wav2Vec2 Models for Children Speech Recognition

Access rights

openAccess
acceptedVersion

A4 Article in conference proceedings

Language

en

Series

2024 International Conference on Signal Processing and Communications, SPCOM 2024

Abstract

Speech modification methods normalize children's speech towards adults' speech, enabling the use of off-the-shelf generic automatic speech recognition (ASR) systems in this low-resource scenario. On the other hand, ASR models like Wav2Vec2 have shown remarkable robustness across speakers, which streamlines their deployment. This paper examines the benefit of speech modification methods when using Wav2Vec2 models on children's speech. We experimented with prototypical speech modification methods and found that while models trained on large datasets perform similarly on unmodified and modified children's speech, models trained on smaller datasets perform notably better on modified speech. However, analyzing age effects on the PF-Star and CMU Kids evaluation sets, we observe that all Wav2Vec2 variants still underperform for children under 10 years of age. In this scenario, speech modification methods and their combinations improve performance for both small and large Wav2Vec2 models, but considerable room for improvement remains.

Description

Publisher Copyright: © 2024 IEEE.

Citation

Sinha, A, Singh, M, Kadiri, S R, Kurimo, M & Kathania, H K 2024, Effect of Speech Modification on Wav2Vec2 Models for Children Speech Recognition. in 2024 International Conference on Signal Processing and Communications, SPCOM 2024. International Conference on Signal Processing and Communications, IEEE, International Conference on Signal Processing and Communications, Bangalore, India, 01/07/2024. https://doi.org/10.1109/SPCOM60851.2024.10631626