Synthesis Speech Based Data Augmentation for Low Resource Children ASR
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Kadyan, Virender | en_US |
dc.contributor.author | Kathania, Hemant | en_US |
dc.contributor.author | Govil, Prajjval | en_US |
dc.contributor.author | Kurimo, Mikko | en_US |
dc.contributor.department | Department of Signal Processing and Acoustics | en |
dc.contributor.editor | Karpov, Alexey | en_US |
dc.contributor.editor | Potapova, Rodmonga | en_US |
dc.contributor.groupauthor | Speech Recognition | en |
dc.contributor.organization | University of Petroleum and Energy Studies | en_US |
dc.contributor.organization | Department of Signal Processing and Acoustics | en_US |
dc.date.accessioned | 2021-11-10T07:46:36Z | |
dc.date.available | 2021-11-10T07:46:36Z | |
dc.date.issued | 2021 | en_US |
dc.description | Publisher Copyright: © 2021, Springer Nature Switzerland AG. | |
dc.description.abstract | Successful speech recognition for children requires a large amount of training data with sufficient speaker variability. Collecting such a training database of children’s voices is challenging and very expensive for zero/low-resource languages like Punjabi. In this paper, the data scarcity issue of the low-resource language Punjabi is addressed through two levels of augmentation. First, the original training corpus is augmented by modifying the prosody parameters for pitch and speaking rate. Our results show that this augmentation improves system performance over the baseline system. Then, the augmented data is combined with the original data and used to train a text-to-speech (TTS) system. The resulting extended dataset is further augmented by generating children’s utterances with TTS synthesis and by sampling the language model with methods that increase acoustic and lexical diversity. The final speech recognition performance shows relative improvements of 50.10% with acoustic-diversity-based augmentation and 57.40% with language-diversity-based augmentation over the baseline system. | en |
dc.description.version | Peer reviewed | en |
dc.format.extent | 10 | |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Kadyan, V, Kathania, H, Govil, P & Kurimo, M 2021, Synthesis Speech Based Data Augmentation for Low Resource Children ASR. in A Karpov & R Potapova (eds), Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12997 LNAI, Springer, pp. 317-326, International Conference on Speech and Computer, Virtual, Online, 27/09/2021. https://doi.org/10.1007/978-3-030-87802-3_29 | en |
dc.identifier.doi | 10.1007/978-3-030-87802-3_29 | en_US |
dc.identifier.isbn | 9783030878016 | |
dc.identifier.issn | 0302-9743 | |
dc.identifier.issn | 1611-3349 | |
dc.identifier.other | PURE UUID: 177dd9c4-a6cd-4b16-ad05-31ac8b93033a | en_US |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/177dd9c4-a6cd-4b16-ad05-31ac8b93033a | en_US |
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85116381911&partnerID=8YFLogxK | |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/74623788/Synthesis_speech_based_data_augmentation_for_low_resource_children_ASR_Springer_Lecture_Notes_in_Computer_Science_6.pdf | en_US |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/110877 | |
dc.identifier.urn | URN:NBN:fi:aalto-2021111010048 | |
dc.language.iso | en | en |
dc.relation.ispartof | International Conference on Speech and Computer | en |
dc.relation.ispartofseries | Speech and Computer - 23rd International Conference, SPECOM 2021, Proceedings | en |
dc.relation.ispartofseries | pp. 317-326 | en |
dc.relation.ispartofseries | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 12997 LNAI | en |
dc.rights | openAccess | en |
dc.subject.keyword | Children speech recognition | en_US |
dc.subject.keyword | Low resource | en_US |
dc.subject.keyword | Prosody modification | en_US |
dc.subject.keyword | Speech synthesis | en_US |
dc.subject.keyword | Tacotron | en_US |
dc.title | Synthesis Speech Based Data Augmentation for Low Resource Children ASR | en |
dc.type | A4 Artikkeli konferenssijulkaisussa | fi |
dc.type.version | acceptedVersion | |