Multilingual TTS Accent Impressions for Accented ASR
| dc.contributor | Aalto-yliopisto | fi |
| dc.contributor | Aalto University | en |
| dc.contributor.author | Karakasidis, Georgios | en_US |
| dc.contributor.author | Robinson, Nathaniel | en_US |
| dc.contributor.author | Getman, Yaroslav | en_US |
| dc.contributor.author | Ogayo, Atieno | en_US |
| dc.contributor.author | Al-Ghezi, Ragheb | en_US |
| dc.contributor.author | Ayasi, Ananya | en_US |
| dc.contributor.author | Watanabe, Shinji | en_US |
| dc.contributor.author | Mortensen, David R. | en_US |
| dc.contributor.author | Kurimo, Mikko | en_US |
| dc.contributor.department | Department of Information and Communications Engineering | en |
| dc.contributor.editor | Ekštein, Kamil | en_US |
| dc.contributor.editor | Pártl, František | en_US |
| dc.contributor.editor | Konopík, Miloslav | en_US |
| dc.contributor.groupauthor | Speech Recognition | en |
| dc.contributor.organization | Department of Information and Communications Engineering | en_US |
| dc.contributor.organization | Carnegie Mellon University | en_US |
| dc.contributor.organization | Speech Recognition | en_US |
| dc.date.accessioned | 2024-01-17T08:28:33Z | |
| dc.date.available | 2024-01-17T08:28:33Z | |
| dc.date.embargo | info:eu-repo/date/embargoEnd/2024-08-23 | en_US |
| dc.date.issued | 2023 | en_US |
| dc.description | Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG. | |
| dc.description.abstract | Automatic Speech Recognition (ASR) for high-resource languages like English is often considered a solved problem. However, most high-resource ASR systems favor socioeconomically advantaged dialects. In the case of English, this leaves behind many L2 speakers and speakers of low-resource accents (a majority of English speakers). One way to mitigate this is to fine-tune a pre-trained English ASR model for a desired low-resource accent. However, collecting transcribed accented audio is costly and time-consuming. In this work, we present a method to produce synthetic L2-English speech via pre-trained text-to-speech (TTS) in an L1 language (target accent). This can be produced at a much larger scale and lower cost than authentic speech collection. We present initial experiments applying this augmentation method. Our results suggest that success of TTS augmentation relies on access to more than one hour of authentic training data and a diversity of target-domain prompts for speech synthesis. | en |
| dc.description.version | Peer reviewed | en |
| dc.format.extent | 11 | |
| dc.format.mimetype | application/pdf | en_US |
| dc.identifier.citation | Karakasidis, G, Robinson, N, Getman, Y, Ogayo, A, Al-Ghezi, R, Ayasi, A, Watanabe, S, Mortensen, D R & Kurimo, M 2023, Multilingual TTS Accent Impressions for Accented ASR. in K Ekštein, F Pártl & M Konopík (eds), Text, Speech, and Dialogue - 26th International Conference, TSD 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14102 LNAI, Springer, pp. 317-327, International Conference on Text, Speech, and Dialogue, Pilsen, Czech Republic, 04/09/2023. https://doi.org/10.1007/978-3-031-40498-6_28 | en |
| dc.identifier.doi | 10.1007/978-3-031-40498-6_28 | en_US |
| dc.identifier.isbn | 978-3-031-40497-9 | |
| dc.identifier.issn | 0302-9743 | |
| dc.identifier.issn | 1611-3349 | |
| dc.identifier.other | PURE UUID: b64f5c76-8e0c-4a99-b061-a63d15ebdac2 | en_US |
| dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/b64f5c76-8e0c-4a99-b061-a63d15ebdac2 | en_US |
| dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/133739889/Multilingual_TTS_Accent_Impressions_for_Accented_ASR_TSD2023.pdf | |
| dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/125859 | |
| dc.identifier.urn | URN:NBN:fi:aalto-202401171534 | |
| dc.language.iso | en | en |
| dc.relation.ispartof | International Conference on Text, Speech, and Dialogue | en |
| dc.relation.ispartofseries | Text, Speech, and Dialogue - 26th International Conference, TSD 2023, Proceedings | en |
| dc.relation.ispartofseries | pp. 317-327 | en |
| dc.relation.ispartofseries | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) ; Volume 14102 LNAI | en |
| dc.rights | openAccess | en |
| dc.subject.keyword | accented speech recognition | en_US |
| dc.subject.keyword | data augmentation | en_US |
| dc.subject.keyword | low-resource speech technologies | en_US |
| dc.subject.keyword | speech synthesis | en_US |
| dc.title | Multilingual TTS Accent Impressions for Accented ASR | en |
| dc.type | A4 Article in conference proceedings (A4 Artikkeli konferenssijulkaisussa) | fi |
| dc.type.version | acceptedVersion | |