Synthesis Speech Based Data Augmentation for Low Resource Children ASR

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Kadyan, Virender [en_US]
dc.contributor.author: Kathania, Hemant [en_US]
dc.contributor.author: Govil, Prajjval [en_US]
dc.contributor.author: Kurimo, Mikko [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.editor: Karpov, Alexey [en_US]
dc.contributor.editor: Potapova, Rodmonga [en_US]
dc.contributor.groupauthor: Speech Recognition [en]
dc.contributor.organization: University of Petroleum and Energy Studies [en_US]
dc.contributor.organization: Department of Signal Processing and Acoustics [en_US]
dc.date.accessioned: 2021-11-10T07:46:36Z
dc.date.available: 2021-11-10T07:46:36Z
dc.date.issued: 2021 [en_US]
dc.description: Publisher Copyright: © 2021, Springer Nature Switzerland AG.
dc.description.abstract: Successful speech recognition for children requires a large amount of training data with sufficient speaker variability. Collecting such a training database of children’s voices is challenging and very expensive for zero- or low-resource languages like Punjabi. In this paper, the data scarcity issue of the low-resource language Punjabi is addressed through two levels of augmentation. The original training corpus is first augmented by modifying the prosody parameters for pitch and speaking rate. Our results show that this augmentation improves the system performance over the baseline system. Then the augmented data is combined with the original data and used to train a TTS system that generates synthetic speech. The extended dataset is further augmented by generating children’s utterances using text-to-speech synthesis and by sampling the language model with methods that increase the acoustic and lexical diversity. The final speech recognition performance shows relative improvements of 50.10% with acoustic diversity based augmentation and 57.40% with language diversity based augmentation over the baseline system. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 10
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Kadyan, V, Kathania, H, Govil, P & Kurimo, M 2021, Synthesis Speech Based Data Augmentation for Low Resource Children ASR. in A Karpov & R Potapova (eds), Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12997 LNAI, Springer, pp. 317-326, International Conference on Speech and Computer, Virtual, Online, 27/09/2021. https://doi.org/10.1007/978-3-030-87802-3_29 [en]
dc.identifier.doi: 10.1007/978-3-030-87802-3_29 [en_US]
dc.identifier.isbn: 9783030878016
dc.identifier.issn: 0302-9743
dc.identifier.issn: 1611-3349
dc.identifier.other: PURE UUID: 177dd9c4-a6cd-4b16-ad05-31ac8b93033a [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/177dd9c4-a6cd-4b16-ad05-31ac8b93033a [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85116381911&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/74623788/Synthesis_speech_based_data_augmentation_for_low_resource_children_ASR_Springer_Lecture_Notes_in_Computer_Science_6.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/110877
dc.identifier.urn: URN:NBN:fi:aalto-2021111010048
dc.language.iso: en [en]
dc.relation.ispartof: International Conference on Speech and Computer [en]
dc.relation.ispartofseries: Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings [en]
dc.relation.ispartofseries: pp. 317-326 [en]
dc.relation.ispartofseries: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 12997 LNAI [en]
dc.rights: openAccess [en]
dc.subject.keyword: Children speech recognition [en_US]
dc.subject.keyword: Low resource [en_US]
dc.subject.keyword: Prosody modification [en_US]
dc.subject.keyword: Speech synthesis [en_US]
dc.subject.keyword: Tacotron [en_US]
dc.title: Synthesis Speech Based Data Augmentation for Low Resource Children ASR [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (Article in a conference publication) [fi]
dc.type.version: acceptedVersion
