Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition

Access rights

openAccess
acceptedVersion

A4 Article in conference proceedings

Language

en

Pages

11

Series

Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings, pp. 483-493, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 14338 LNAI

Abstract

Children’s speech recognition performs poorly compared with adult speech recognition. Neural network models need large amounts of data to perform well, but very little children’s speech data is publicly available. A baseline system was therefore developed using adult speech for training and children’s speech for testing. Such a system suffers from mismatches between the training and testing speech data. To address one of these mismatches, the difference in formant frequency locations between adults and children, this paper explores the effect of linear prediction order on modifying the formant frequency locations. The method was studied for narrowband and wideband speech and reduced the word error rate (WER) for GMM-HMM, DNN-HMM, and TDNN acoustic models. The TDNN acoustic model performs best among the acoustic models. For the TDNN acoustic model, the best formant modification factor α is 0.1 with linear prediction order 6 for narrowband speech (WER 13.82%) and 0.1 with linear prediction order 20 for wideband speech (WER 12.19%). The method was also compared with vocal tract length normalization (VTLN) and speaking rate adaptation (SRA), and it gives a larger reduction in WER than either.
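
The role of the linear prediction order can be illustrated with a minimal sketch. It is not the paper's formant modification method, which the abstract does not specify, and it does not apply the factor α. It only shows generic autocorrelation-method linear prediction and how the order bounds the number of formant candidates: an order-p model has at most p/2 complex-conjugate pole pairs. The function names, the synthetic frame, and the 8 kHz sampling rate are assumptions chosen for illustration.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """LP coefficients [1, a1, ..., ap] via the autocorrelation method
    and the Levinson-Durbin recursion (generic textbook algorithm)."""
    windowed = frame * np.hamming(len(frame))
    full = np.correlate(windowed, windowed, mode="full")
    r = full[len(windowed) - 1 : len(windowed) + order]  # lags 0..order
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def formant_candidates(a, fs):
    """Frequencies (Hz) of the LP poles in the upper half of the z-plane."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 0]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))


def synth_vowel_frame(fs, formants=(700.0, 1200.0, 2600.0), n=400, seed=0):
    """Hypothetical test frame: damped sinusoids plus a little noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n) / fs
    decay = np.exp(-60.0 * np.pi * t)
    frame = sum(decay * np.sin(2.0 * np.pi * f * t) for f in formants)
    return frame + 0.01 * rng.standard_normal(n)


if __name__ == "__main__":
    fs = 8000  # assumed narrowband sampling rate, not stated in the abstract
    frame = synth_vowel_frame(fs)
    for order in (6, 20):  # the two orders reported in the abstract
        freqs = formant_candidates(lpc_autocorr(frame, order), fs)
        print(order, np.round(freqs))
```

A low order such as 6 can place at most three resonances below the Nyquist frequency, while order 20 allows up to ten, so the order controls how finely the spectral envelope, and hence the formant structure, is resolved.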

Description

Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Citation

Kumar, U L, Kurimo, M & Kathania, H K 2023, Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition. in A Karpov, K Samudravijaya, K T Deepak, R M Hegde, S R M Prasanna & S S Agrawal (eds), Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14338 LNAI, Springer, pp. 483-493, International Conference on Speech and Computer, Dharwad, India, 29/11/2023. https://doi.org/10.1007/978-3-031-48309-7_39