Gelp: GAN-excited linear prediction for speech synthesis from mel-spectrogram
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal
View/Open full text file from the Research portal
Other link related to publication
Date
2019-01-01
Language
en
Pages
5
694-698
Series
Proceedings of Interspeech, Volume 2019-September, Interspeech - Annual Conference of the International Speech Communication Association
Abstract
Recent advances in neural network-based text-to-speech have reached human-level naturalness in synthetic speech. The present sequence-to-sequence models can directly map text to mel-spectrogram acoustic features, which are convenient for modeling, but present additional challenges for vocoding (i.e., waveform generation from the acoustic features). High-quality synthesis can be achieved with neural vocoders, such as WaveNet, but such autoregressive models suffer from slow sequential inference. Meanwhile, their existing parallel inference counterparts are difficult to train and require increasingly large model sizes. In this paper, we propose an alternative training strategy for a parallel neural vocoder utilizing generative adversarial networks, and integrate a linear predictive synthesis filter into the model. Results show that the proposed model achieves significant improvement in inference speed, while outperforming a WaveNet in copy-synthesis quality.
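The linear predictive synthesis filter mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only the classical all-pole filter that shapes an excitation signal with a spectral envelope, assuming the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p. The function names `lp_synthesis` and `lp_residual` and the coefficient values are hypothetical, chosen for illustration. In a GAN-excited setup the excitation would come from a trained generator network rather than from the unit impulse used here.

```python
def lp_synthesis(excitation, lp_coeffs):
    """All-pole synthesis filter 1/A(z) (hypothetical helper).

    Computes s[n] = e[n] - sum_k a[k] * s[n-k] for k = 1..p,
    where lp_coeffs = [a_1, ..., a_p] and the leading 1 of A(z) is implied.
    """
    out = []
    for n, e in enumerate(excitation):
        acc = float(e)
        for k, a in enumerate(lp_coeffs, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]  # feedback from past output samples
        out.append(acc)
    return out


def lp_residual(signal, lp_coeffs):
    """Inverse (analysis) filter A(z): recovers the excitation from a signal.

    Computes e[n] = s[n] + sum_k a[k] * s[n-k] for k = 1..p.
    """
    out = []
    for n, s in enumerate(signal):
        acc = float(s)
        for k, a in enumerate(lp_coeffs, start=1):
            if n - k >= 0:
                acc += a * signal[n - k]  # feedforward from past input samples
        out.append(acc)
    return out


# Illustrative values only: a first-order filter with a single pole at z = 0.5.
coeffs = [-0.5]
impulse = [1.0, 0.0, 0.0, 0.0]

speech_like = lp_synthesis(impulse, coeffs)    # decaying impulse response
recovered = lp_residual(speech_like, coeffs)   # inverse filtering undoes it
```

Because synthesis and inverse filtering are exact inverses of each other, a model of this kind can work on the spectrally flat excitation while the filter restores the spectral envelope.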
Keywords
GAN, Neural vocoder, Source-filter model, WaveNet
Citation
Juvela, L, Bollepalli, B, Yamagishi, J & Alku, P 2019, Gelp: GAN-excited linear prediction for speech synthesis from mel-spectrogram. in Proceedings of Interspeech. vol. 2019-September, Interspeech - Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 694-698, Interspeech, Graz, Austria, 15/09/2019. https://doi.org/10.21437/Interspeech.2019-2008