Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition

Access rights
embargoedAccess
A4 Article in a conference publication
This publication is imported from Aalto University research portal.
Embargo ends: 2024-11-22

Date
2023
Language
en
Pages
483-493 (11 pages)
Series
Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Volume 14338 LNAI
Abstract
Children’s speech recognition performs poorly compared to adult speech recognition. Neural network models need large amounts of data to perform well, but very little children’s speech data is publicly available. A baseline system was developed using adult speech for training and children’s speech for testing. Such a system suffers from mismatches between the training and testing speech data. To address one of these mismatches, the difference in formant frequency locations between adults and children, this paper explores the effect of linear prediction order on modifying the formant frequency locations. The method was studied for narrowband and wideband speech and gave reductions in word error rate (WER) for GMM-HMM, DNN-HMM, and TDNN acoustic models. The TDNN acoustic model performs best of the three. For the TDNN acoustic model, the best formant modification factor α is 0.1 with linear prediction order 6 for narrowband speech (WER 13.82%), and 0.1 with linear prediction order 20 for wideband speech (WER 12.19%). The method was also compared with vocal tract length normalization (VTLN) and speaking rate adaptation (SRA), and it gave larger WER reductions than both.
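To illustrate why the linear prediction order matters for formant locations, the following is a minimal sketch of standard autocorrelation-method linear prediction, assuming NumPy is available. It is not the paper's implementation and does not apply the modification factor α; the function names (`lpc_coefficients`, `pole_frequencies`, `synth_resonator`) and the toy resonance values are hypothetical choices made for this example. An order-p predictor has p poles, so it can model at most p/2 resonances: up to 3 for order 6 and up to 10 for order 20.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Autocorrelation-method linear prediction.

    Returns [1, a_1, ..., a_p] for A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Solve the Toeplitz normal equations R a = -r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, -r[1:])
    return np.concatenate(([1.0], a))


def pole_frequencies(a, sample_rate):
    """Frequencies in Hz of the complex poles of 1/A(z), sorted ascending."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-8]  # keep one pole of each conjugate pair
    return np.sort(np.angle(roots) * sample_rate / (2 * np.pi))


def synth_resonator(freqs_hz, bandwidths_hz, sample_rate, n_samples):
    """Impulse response of an all-pole filter (a toy stand-in for a vowel)."""
    a = np.array([1.0])
    for f, bw in zip(freqs_hz, bandwidths_hz):
        radius = np.exp(-np.pi * bw / sample_rate)
        theta = 2 * np.pi * f / sample_rate
        a = np.convolve(a, [1.0, -2 * radius * np.cos(theta), radius**2])
    y = np.zeros(n_samples)
    for n in range(n_samples):
        acc = 1.0 if n == 0 else 0.0  # unit impulse input
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


if __name__ == "__main__":
    sample_rate = 8000  # assumed narrowband sampling rate for this toy example
    signal = synth_resonator([700.0, 1800.0], [100.0, 150.0], sample_rate, 2000)
    coeffs = lpc_coefficients(signal, order=4)
    print(pole_frequencies(coeffs, sample_rate))
```

On this noise-free toy signal, an order-4 predictor should recover resonances close to the 700 Hz and 1800 Hz used for synthesis. On real speech frames the analysis would normally be preceded by windowing and pre-emphasis, which this sketch omits.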
Description
Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
Keywords
Children’s speech recognition, Formant modification, Linear prediction, TDNN
Citation
Kumar, U L, Kurimo, M & Kathania, H K 2023, Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition. in A Karpov, K Samudravijaya, K T Deepak, R M Hegde, S R M Prasanna & S S Agrawal (eds), Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14338 LNAI, Springer, pp. 483-493, International Conference on Speech and Computer, Dharwad, India, 29/11/2023. https://doi.org/10.1007/978-3-031-48309-7_39