Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Juvela, Lauri
dc.contributor.author Bollepalli, Bajibabu
dc.contributor.author Wang, Xin
dc.contributor.author Kameoka, Hirokazu
dc.contributor.author Airaksinen, Manu
dc.contributor.author Yamagishi, Junichi
dc.contributor.author Alku, Paavo
dc.date.accessioned 2018-12-10T10:11:27Z
dc.date.available 2018-12-10T10:11:27Z
dc.date.issued 2018-09-10
dc.identifier.citation Juvela, L, Bollepalli, B, Wang, X, Kameoka, H, Airaksinen, M, Yamagishi, J & Alku, P 2018, Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks. in 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings. vol. 2018-April, 8461852, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Institute of Electrical and Electronics Engineers, United States, pp. 5679-5683, IEEE International Conference on Acoustics, Speech, and Signal Processing, Calgary, Canada, 15/04/2018. DOI: 10.1109/ICASSP.2018.8461852 en
dc.identifier.isbn 978-1-5386-4659-5
dc.identifier.isbn 978-1-5386-4658-8
dc.identifier.issn 2379-190X
dc.identifier.other PURE UUID: 1d94f4f0-6d0b-42b3-8d3f-4851a6de8bf8
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/speech-waveform-synthesis-from-mfcc-sequences-with-generative-adversarial-networks(1d94f4f0-6d0b-42b3-8d3f-4851a6de8bf8).html
dc.identifier.other PURE LINK: http://www.scopus.com/inward/record.url?scp=85054230031&partnerID=8YFLogxK
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/28772297/ELEC_Juvela_et_al_IEEEE_ICASSP2018.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/34947
dc.description.abstract This paper proposes a method for generating speech from filterbank mel-frequency cepstral coefficients (MFCCs), which are widely used in speech applications such as automatic speech recognition (ASR) but are generally considered unusable for speech synthesis. First, we predict fundamental frequency and voicing information from MFCCs with an autoregressive recurrent neural network. Second, the spectral envelope information contained in the MFCCs is converted to all-pole filters, and a pitch-synchronous excitation model matched to these filters is trained. Finally, we introduce a noise model based on a generative adversarial network (GAN) to add a realistic high-frequency stochastic component to the modeled excitation signal. The results show that high-quality speech can be reconstructed given only MFCC information at test time. en
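The abstract's second step turns the spectral envelope carried by MFCCs into all-pole filters. This record does not describe how the paper does that, so the following is only a rough Python sketch of one common way to perform such a conversion, not the authors' method. It assumes an orthonormal DCT-II, natural-log filterbank energies, an HTK-style mel scale with filter centers spaced uniformly between 0 Hz and Nyquist, and plain linear interpolation as a crude mel-to-linear inversion (filter bandwidth normalization is ignored). The helper names and defaults (`mfcc_to_allpole`, 16 kHz, 40 mel bands, 512-point FFT, order 24) are hypothetical.

```python
import math


def mel(f_hz):
    # HTK-style mel scale (an assumption; the record does not specify one).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc_to_log_mel(mfcc, n_mels):
    """Invert an orthonormal DCT-II; truncated high-order coefficients are taken as zero."""
    log_mel = []
    for n in range(n_mels):
        s = mfcc[0] * math.sqrt(1.0 / n_mels)
        for k in range(1, len(mfcc)):
            s += mfcc[k] * math.sqrt(2.0 / n_mels) * math.cos(math.pi * k * (n + 0.5) / n_mels)
        log_mel.append(s)
    return log_mel


def interp(x, xs, ys):
    """Piecewise-linear interpolation with constant extrapolation at both ends."""
    if x <= xs[0]:
        return ys[0]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def autocorrelation(power, n_fft, max_lag):
    """Inverse DFT of a real, even power spectrum given as its n_fft // 2 + 1 one-sided bins."""
    half = n_fft // 2
    r = []
    for k in range(max_lag + 1):
        s = power[0] + power[half] * math.cos(math.pi * k)
        for j in range(1, half):
            s += 2.0 * power[j] * math.cos(2.0 * math.pi * k * j / n_fft)
        r.append(s / n_fft)
    return r


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order; returns (a, prediction error)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def mfcc_to_allpole(mfcc, sample_rate=16000, n_mels=40, n_fft=512, order=24):
    """Hypothetical helper: one MFCC frame -> all-pole envelope coefficients of 1/A(z)."""
    log_mel = mfcc_to_log_mel(mfcc, n_mels)
    lo, hi = mel(0.0), mel(sample_rate / 2.0)
    # Filter centers spaced uniformly on the mel scale (assumed filterbank layout).
    centers = [inv_mel(lo + (i + 1) * (hi - lo) / (n_mels + 1)) for i in range(n_mels)]
    # Crude mel-to-linear inversion: interpolate log energies, then exponentiate.
    power = [math.exp(interp(j * sample_rate / n_fft, centers, log_mel))
             for j in range(n_fft // 2 + 1)]
    r = autocorrelation(power, n_fft, order)
    return levinson_durbin(r, order)


a, err = mfcc_to_allpole([10.0, -3.0, 1.5, 0.5] + [0.0] * 9)
print(len(a), err > 0)
```

Because the interpolated power spectrum is strictly positive, its autocorrelation sequence is positive definite, so in exact arithmetic Levinson-Durbin yields reflection coefficients of magnitude below one and a stable all-pole filter 1/A(z).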
dc.format.extent 5
dc.format.extent 5679-5683
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartof IEEE International Conference on Acoustics, Speech, and Signal Processing en
dc.relation.ispartofseries 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings en
dc.relation.ispartofseries Volume 2018-April en
dc.relation.ispartofseries Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing en
dc.rights openAccess en
dc.subject.other Software en
dc.subject.other Signal Processing en
dc.subject.other Electrical and Electronic Engineering en
dc.subject.other 213 Electronic, automation and communications engineering, electronics en
dc.title Speech Waveform Synthesis from MFCC Sequences with Generative Adversarial Networks en
dc.type A4 Article in conference proceedings en
dc.description.version Peer reviewed en
dc.contributor.department School common, ELEC
dc.contributor.department Department of Signal Processing and Acoustics
dc.contributor.department National Institute of Informatics
dc.contributor.department Nippon Telegraph & Telephone
dc.subject.keyword Excitation modeling
dc.subject.keyword Generative adversarial networks
dc.subject.keyword Mel-filterbank inversion
dc.subject.keyword MFCC
dc.subject.keyword Pitch prediction
dc.subject.keyword Software
dc.subject.keyword Signal Processing
dc.subject.keyword Electrical and Electronic Engineering
dc.subject.keyword 213 Electronic, automation and communications engineering, electronics
dc.identifier.urn URN:NBN:fi:aalto-201812105962
dc.identifier.doi 10.1109/ICASSP.2018.8461852
dc.type.version acceptedVersion

