Lombard speech synthesis using transfer learning in a Tacotron text-to-speech system
| dc.contributor | Aalto-yliopisto | fi |
| dc.contributor | Aalto University | en |
| dc.contributor.author | Bollepalli, Bajibabu | en_US |
| dc.contributor.author | Juvela, Lauri | en_US |
| dc.contributor.author | Alku, Paavo | en_US |
| dc.contributor.department | Department of Signal Processing and Acoustics | en |
| dc.contributor.groupauthor | Speech Communication Technology | en |
| dc.date.accessioned | 2020-01-02T13:52:20Z | |
| dc.date.available | 2020-01-02T13:52:20Z | |
| dc.date.issued | 2019 | en_US |
| dc.description.abstract | Currently, there is increasing interest in using attention-based sequence-to-sequence models, such as Tacotron, in text-to-speech (TTS) synthesis. These models are end-to-end, meaning that they learn both co-articulation and duration properties directly from text and speech. Since these models are entirely data-driven, they need large amounts of data to generate synthetic speech of good quality. However, for challenging speaking styles such as Lombard speech, it is difficult to record sufficiently large speech corpora. Therefore, we propose a transfer learning method to adapt a TTS system of normal speaking style to Lombard style. We also experiment with a WaveNet vocoder along with a traditional vocoder (WORLD) in the synthesis of Lombard speech. The subjective and objective evaluation results indicated that the proposed adaptation system coupled with the WaveNet vocoder clearly outperformed the conventional deep neural network-based TTS system in the synthesis of Lombard speech. | en |
| dc.description.version | Peer reviewed | en |
| dc.format.mimetype | application/pdf | en_US |
| dc.identifier.citation | Bollepalli, B, Juvela, L & Alku, P 2019, Lombard speech synthesis using transfer learning in a Tacotron text-to-speech system. in Proceedings of Interspeech. Interspeech - Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 2833-2837, Interspeech, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-1333 | en |
| dc.identifier.doi | 10.21437/Interspeech.2019-1333 | en_US |
| dc.identifier.issn | 2308-457X | |
| dc.identifier.other | PURE UUID: 092e3d15-009a-4474-a907-90a2cd93a677 | en_US |
| dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/092e3d15-009a-4474-a907-90a2cd93a677 | en_US |
| dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/38768852/ELEC_Bollepalli_Lombard_speech_INTERSPEECH.pdf | en_US |
| dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/41909 | |
| dc.identifier.urn | URN:NBN:fi:aalto-202001021020 | |
| dc.language.iso | en | en |
| dc.relation.ispartof | Interspeech | en |
| dc.relation.ispartofseries | Proceedings of Interspeech | en |
| dc.relation.ispartofseries | pp. 2833-2837 | en |
| dc.relation.ispartofseries | Interspeech - Annual Conference of the International Speech Communication Association | en |
| dc.rights | openAccess | en |
| dc.subject.keyword | Adaptation | en_US |
| dc.subject.keyword | Lombard speaking style | en_US |
| dc.subject.keyword | Tacotron | en_US |
| dc.subject.keyword | Text-To-Speech (TTS) | en_US |
| dc.title | Lombard speech synthesis using transfer learning in a Tacotron text-to-speech system | en |
| dc.type | A4 Artikkeli konferenssijulkaisussa | fi |
| dc.type.version | publishedVersion | |
Files
Original bundle
- Name: ELEC_Bollepalli_Lombard_speech_INTERSPEECH.pdf
- Size: 538.77 KB
- Format: Adobe Portable Document Format