Lombard speech synthesis using transfer learning in a Tacotron text-to-speech system

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Bollepalli, Bajibabu [en_US]
dc.contributor.author: Juvela, Lauri [en_US]
dc.contributor.author: Alku, Paavo [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.groupauthor: Speech Communication Technology [en]
dc.date.accessioned: 2020-01-02T13:52:20Z
dc.date.available: 2020-01-02T13:52:20Z
dc.date.issued: 2019 [en_US]
dc.description.abstract: Currently, there is increasing interest in using sequence-to-sequence models with attention, such as Tacotron, in text-to-speech (TTS) synthesis. These models are end-to-end, meaning that they learn both co-articulation and duration properties directly from text and speech. Since these models are entirely data-driven, they need large amounts of data to generate synthetic speech of good quality. However, in challenging speaking styles, such as Lombard speech, it is difficult to record sufficiently large speech corpora. Therefore, we propose a transfer learning method to adapt a TTS system from normal speaking style to Lombard style. We also experiment with a WaveNet vocoder along with a traditional vocoder (WORLD) in the synthesis of Lombard speech. The subjective and objective evaluation results indicated that the proposed adaptation system coupled with the WaveNet vocoder clearly outperformed the conventional deep neural network based TTS system in the synthesis of Lombard speech. [en]
dc.description.version: Peer reviewed [en]
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Bollepalli, B, Juvela, L & Alku, P 2019, Lombard speech synthesis using transfer learning in a Tacotron text-to-speech system. in Proceedings of Interspeech. Interspeech - Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 2833-2837, Interspeech, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-1333 [en]
dc.identifier.doi: 10.21437/Interspeech.2019-1333 [en_US]
dc.identifier.issn: 2308-457X
dc.identifier.other: PURE UUID: 092e3d15-009a-4474-a907-90a2cd93a677 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/092e3d15-009a-4474-a907-90a2cd93a677 [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/38768852/ELEC_Bollepalli_Lombard_speech_INTERSPEECH.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/41909
dc.identifier.urn: URN:NBN:fi:aalto-202001021020
dc.language.iso: en [en]
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: Proceedings of Interspeech [en]
dc.relation.ispartofseries: pp. 2833-2837 [en]
dc.relation.ispartofseries: Interspeech - Annual Conference of the International Speech Communication Association [en]
dc.rights: openAccess [en]
dc.subject.keyword: Adaptation [en_US]
dc.subject.keyword: Lombard speaking style [en_US]
dc.subject.keyword: Tacotron [en_US]
dc.subject.keyword: Text-To-Speech (TTS) [en_US]
dc.title: Lombard speech synthesis using transfer learning in a Tacotron text-to-speech system [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: publishedVersion

Files

Original bundle

Name: ELEC_Bollepalli_Lombard_speech_INTERSPEECH.pdf
Size: 538.77 KB
Format: Adobe Portable Document Format