Spectral warping based data augmentation for low resource children’s speaker verification

Access rights

openAccess
publishedVersion

A1 Original article in a scientific journal

Language

en

Pages

12

Series

Multimedia Tools and Applications, Volume 83, issue 16, pp. 48895-48906

Abstract

In this paper, we present our effort to develop an automatic speaker verification (ASV) system for low-resource children’s data. For child speakers, very little speech data is available in most languages for training an ASV system, which makes developing such a system under low-resource conditions a very challenging problem. To build a robust baseline, we merged out-of-domain adult speech with children’s speech to train the ASV system and tested it on children’s speech. This setup leads to an acoustic mismatch between the training and test data. To overcome this issue, we propose spectral-warping-based data augmentation. We modified the adult speech using a spectral warping method (to make it resemble children’s speech) and added it to the training data to address both data scarcity and the mismatch between adults’ and children’s speech. The proposed data augmentation gives 20.46% and 52.52% relative improvement in equal error rate for the Indian Punjabi and British English speech databases, respectively. We compared the proposed method with well-known data augmentation methods, namely SpecAugment, speed perturbation (SP), and vocal tract length perturbation (VTLP), and found that the proposed method performed best. The proposed spectral warping method is publicly available at https://github.com/kathania/Speaker-Verification-spectral-warping .
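To make the idea of spectral warping concrete, the following is a minimal sketch of one common way to warp a spectrum: a first-order all-pass (bilinear) frequency map applied to a one-sided magnitude spectrum. It is an illustration under stated assumptions, not the authors' implementation (which is in the repository linked above); the function names `bilinear_warp` and `warp_spectrum`, the warping factor `alpha`, and the toy spectrum are all hypothetical.

```python
import math


def bilinear_warp(omega, alpha):
    """First-order all-pass (bilinear) frequency map on [0, pi].

    Assumption: this is a generic warping function chosen for illustration.
    """
    return omega + 2.0 * math.atan(
        alpha * math.sin(omega) / (1.0 - alpha * math.cos(omega))
    )


def warp_spectrum(magnitude, alpha):
    """Resample a one-sided magnitude spectrum along a warped frequency axis.

    With alpha > 0, spectral peaks move toward higher frequencies, which is
    the direction needed to make adult speech resemble children's speech.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (-1, 1)")
    n = len(magnitude)
    if n < 2:
        return list(magnitude)
    warped = []
    for k in range(n):
        omega = math.pi * k / (n - 1)
        # Read the input at the inverse-warped frequency so that a peak at
        # omega0 in the input lands at bilinear_warp(omega0, alpha).
        source = bilinear_warp(omega, -alpha)
        pos = min(max(source / math.pi * (n - 1), 0.0), n - 1.0)
        lo = min(int(pos), n - 2)
        frac = pos - lo
        # Linear interpolation between the two neighbouring bins.
        warped.append((1.0 - frac) * magnitude[lo] + frac * magnitude[lo + 1])
    return warped


# Toy example: a single spectral peak centred at pi/4 (hypothetical data).
n_bins = 257
toy = [
    math.exp(-((math.pi * k / (n_bins - 1) - math.pi / 4) ** 2) / (2 * 0.05 ** 2))
    for k in range(n_bins)
]
shifted = warp_spectrum(toy, alpha=0.1)
print(toy.index(max(toy)), shifted.index(max(shifted)))
```

In an augmentation pipeline, a warp of this kind would be applied to the spectral envelope of each adult utterance before resynthesis or feature extraction; the choice of `alpha` here is arbitrary and would need tuning on real data.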

Description

Funding Information: This work was supported by the Academy of Finland (grants 329267, 330139).
Publisher Copyright: © 2023, The Author(s).

Citation

Kathania, H K, Kadyan, V, Kadiri, S R & Kurimo, M 2024, 'Spectral warping based data augmentation for low resource children’s speaker verification', Multimedia Tools and Applications, vol. 83, no. 16, pp. 48895-48906. https://doi.org/10.1007/s11042-023-17263-z