Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks
Access rights
openAccess
acceptedVersion
A4 Article in a conference publication
This publication is imported from the Aalto University research portal.
Date
2018-09-10
Language
en
Pages
5
Series
2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings, Volume 2018-April, pp. 5679-5683, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Abstract
This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications, such as ASR, but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural net. Second, the spectral envelope information contained in MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a generative adversarial network-based noise model to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech reconstruction can be obtained, given only MFCC information at test time.
Keywords
Excitation modeling, Generative adversarial networks, Mel-filterbank inversion, MFCC, Pitch prediction
Citation
Juvela, L, Bollepalli, B, Wang, X, Kameoka, H, Airaksinen, M, Yamagishi, J & Alku, P 2018, Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks. in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. vol. 2018-April, 8461852, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, United States, pp. 5679-5683, IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, Alberta, Canada, 15/04/2018. https://doi.org/10.1109/ICASSP.2018.8461852