Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition

Field	Value	Language
dc.contributor	Aalto-yliopisto	fi
dc.contributor	Aalto University	en
dc.contributor.author	Kumar, Udara Laxman	en_US
dc.contributor.author	Kurimo, Mikko	en_US
dc.contributor.author	Kathania, Hemant Kumar	en_US
dc.contributor.department	Department of Information and Communications Engineering	en
dc.contributor.department	Dept Signal Process and Acoust	en
dc.contributor.editor	Karpov, Alexey	en_US
dc.contributor.editor	Samudravijaya, K.	en_US
dc.contributor.editor	Deepak, K. T.	en_US
dc.contributor.editor	Hegde, Rajesh M.	en_US
dc.contributor.editor	Prasanna, S. R. Mahadeva	en_US
dc.contributor.editor	Agrawal, Shyam S.	en_US
dc.contributor.groupauthor	Speech Recognition	en
dc.contributor.organization	National Institute of Technology, Sikkim	en_US
dc.date.accessioned	2024-01-04T08:44:19Z
dc.date.available	2024-01-04T08:44:19Z
dc.date.embargo	info:eu-repo/date/embargoEnd/2024-11-22	en_US
dc.date.issued	2023	en_US
dc.description	Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
dc.description.abstract	Children’s speech recognition performs poorly compared with adult speech recognition. Neural network models require large amounts of data to perform well, but very little children’s speech data is publicly available. A baseline system was developed using adult speech for training and children’s speech for testing. Such a system suffers from mismatches between the training and testing speech data. To address one of these mismatches, the difference in formant frequency locations between adults and children, this paper explores the effect of linear prediction order on modifying the formant frequency locations. The method was studied for narrowband and wideband speech and was found to reduce word error rate (WER) for GMM-HMM, DNN-HMM, and TDNN acoustic models. The TDNN acoustic model gives the best performance among the acoustic models. For the TDNN acoustic model, the best formant modification factor α is 0.1 with linear prediction order 6 for narrowband speech (WER 13.82%) and 0.1 with linear prediction order 20 for wideband speech (WER 12.19%). The method was also compared with vocal tract length normalization (VTLN) and speaking rate adaptation (SRA), and it gives a larger reduction in WER than either.	en
dc.description.version	Peer reviewed	en
dc.format.extent	11
dc.format.extent	483-493
dc.identifier.citation	Kumar, U L, Kurimo, M & Kathania, H K 2023, Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition. in A Karpov, K Samudravijaya, K T Deepak, R M Hegde, S R M Prasanna & S S Agrawal (eds), Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14338 LNAI, Springer, pp. 483-493, International Conference on Speech and Computer, Dharwad, India, 29/11/2023. https://doi.org/10.1007/978-3-031-48309-7_39	en
dc.identifier.doi	10.1007/978-3-031-48309-7_39	en_US
dc.identifier.isbn	9783031483080
dc.identifier.issn	0302-9743
dc.identifier.issn	1611-3349
dc.identifier.other	PURE UUID: 19d6e788-690a-4b60-a748-7240a983630d	en_US
dc.identifier.other	PURE ITEMURL: https://research.aalto.fi/en/publications/19d6e788-690a-4b60-a748-7240a983630d	en_US
dc.identifier.other	PURE LINK: http://www.scopus.com/inward/record.url?scp=85178516121&partnerID=8YFLogxK	en_US
dc.identifier.uri	https://aaltodoc.aalto.fi/handle/123456789/125361
dc.identifier.urn	URN:NBN:fi:aalto-202401041050
dc.language.iso	en	en
dc.relation.ispartof	International Conference on Speech and Computer	en
dc.relation.ispartofseries	Speech and Computer - 25th International Conference, SPECOM 2023, Proceedings	en
dc.relation.ispartofseries	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)	en
dc.relation.ispartofseries	Volume 14338 LNAI	en
dc.rights	embargoedAccess	en
dc.subject.keyword	Children’s speech recognition	en_US
dc.subject.keyword	Formant modification	en_US
dc.subject.keyword	Linear prediction	en_US
dc.subject.keyword	TDNN	en_US
dc.title	Effect of Linear Prediction Order to Modify Formant Locations for Children Speech Recognition	en
dc.type	Conference article in proceedings	fi