Spectral warping based data augmentation for low resource children’s speaker verification

A1 Original article in a scientific journal
Multimedia Tools and Applications
In this paper, we present our effort to develop an automatic speaker verification (ASV) system for low-resource children’s data. For child speakers, only a very limited amount of speech data is available in most languages for training an ASV system, which makes developing such a system under low-resource conditions very challenging. To build a robust baseline, we merged out-of-domain adult data with children’s data to train the ASV system and tested it on children’s speech. This setup leads to an acoustic mismatch between the training and test data. To overcome this issue, we propose spectral warping based data augmentation. We modified adult speech data using a spectral warping method (to simulate children’s speech) and added it to the training data to address both data scarcity and the mismatch between adults’ and children’s speech. The proposed data augmentation gives 20.46% and 52.52% relative improvement (in equal error rate) on the Indian Punjabi and British English speech databases, respectively. We compared the proposed method with well-known data augmentation methods, namely SpecAugment, speed perturbation (SP) and vocal tract length perturbation (VTLP), and found that the proposed method performed best. The proposed spectral warping method is publicly available at https://github.com/kathania/Speaker-Verification-spectral-warping.
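The idea of warping an adult spectrum toward a child-like one can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that); it assumes a simple linear rescaling of the frequency axis of a single magnitude spectrum, and the function name `warp_magnitude_spectrum` and the parameter `alpha` are hypothetical names chosen for illustration.

```python
import numpy as np


def warp_magnitude_spectrum(mag, alpha):
    """Linearly warp the frequency axis of one magnitude spectrum.

    mag:   1-D array of magnitudes on a uniform grid from 0 to Nyquist.
    alpha: hypothetical warping factor (assumption, not from the paper);
           alpha > 1 moves spectral peaks to higher frequencies,
           alpha < 1 moves them lower, alpha == 1 leaves them unchanged.
    """
    mag = np.asarray(mag, dtype=float)
    bins = np.arange(mag.shape[0], dtype=float)
    # Output bin k takes its value from input position k / alpha,
    # so a peak at input bin b ends up near output bin alpha * b.
    source = bins / alpha
    # np.interp clamps positions outside the grid to the edge values.
    return np.interp(source, bins, mag)


# Toy frame with a single spectral peak at bin 10.
frame = np.zeros(64)
frame[10] = 1.0
warped = warp_magnitude_spectrum(frame, alpha=1.2)
print(int(np.argmax(frame)), int(np.argmax(warped)))
```

In an augmentation pipeline, a warp of this kind would be applied frame by frame to adult speech before the warped copies are added to the training set; how the warping function and factor are actually chosen is specific to the paper and is not reproduced here.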
Funding Information: This work was supported by the Academy of Finland (grants 329267, 330139). Publisher Copyright: © 2023, The Author(s).
Children’s speech, Low resource languages, Speaker verification, Spectral warping, Speed perturbation, Vocal tract length perturbation
Kathania, H K, Kadyan, V, Kadiri, S R & Kurimo, M 2023, 'Spectral warping based data augmentation for low resource children’s speaker verification', Multimedia Tools and Applications. https://doi.org/10.1007/s11042-023-17263-z