GlotNet-A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Juvela, Lauri
dc.contributor.author Bollepalli, Bajibabu
dc.contributor.author Tsiaras, Vassilis
dc.contributor.author Alku, Paavo
dc.date.accessioned 2019-04-02T06:58:00Z
dc.date.available 2019-04-02T06:58:00Z
dc.date.issued 2019-06-01
dc.identifier.citation Juvela, L., Bollepalli, B., Tsiaras, V. & Alku, P. 2019, 'GlotNet-A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis', IEEE/ACM Transactions on Audio, Speech, and Language Processing. https://doi.org/10.1109/TASLP.2019.2906484 en
dc.identifier.issn 2329-9290
dc.identifier.issn 2329-9304
dc.identifier.other PURE UUID: df20945e-f1e8-4959-a017-f1b9a0d6dbee
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/glotneta-raw-waveform-model-for-the-glottal-excitation-in-statistical-parametric-speech-synthesis(df20945e-f1e8-4959-a017-f1b9a0d6dbee).html
dc.identifier.other PURE LINK: http://www.scopus.com/inward/record.url?scp=85064621868&partnerID=8YFLogxK
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/32741491/ELEC_Juvela_GlotNet_TASL.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/37395
dc.description.abstract Recently, generative neural network models that operate directly on raw audio, such as WaveNet, have improved the state of the art in text-to-speech synthesis (TTS). Moreover, there is increasing interest in using these models as statistical vocoders for generating speech waveforms from various acoustic features. However, there is also a need to reduce model complexity without compromising synthesis quality. Previously, glottal pulseforms (i.e., time-domain waveforms corresponding to the source of the human voice production mechanism) have been successfully synthesized in TTS by glottal vocoders using straightforward deep feedforward neural networks. Therefore, it is natural to extend the glottal waveform modeling domain to use the more powerful WaveNet-like architecture. Furthermore, due to their inherent simplicity, glottal excitation waveforms permit scaling down the waveform generator architecture. In this study, we present a raw waveform glottal excitation model, called GlotNet, and compare its performance with the corresponding direct speech waveform model, WaveNet, using equivalent architectures. The models are evaluated as part of a statistical parametric TTS system. Listening test results show that both approaches are rated highly in voice similarity to the target speaker and obtain similar quality ratings with large models. Furthermore, when the model size is reduced, the quality degradation is less severe for GlotNet. en
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher IEEE
dc.relation.ispartofseries IEEE/ACM Transactions on Audio, Speech, and Language Processing en
dc.rights openAccess en
dc.subject.other 213 Electronic, automation and communications engineering, electronics en
dc.title GlotNet-A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis en
dc.type A1 Original article in a scientific journal en
dc.description.version Peer reviewed en
dc.contributor.department Department of Signal Processing and Acoustics
dc.contributor.department University of Crete
dc.subject.keyword Acoustics
dc.subject.keyword Vocoders
dc.subject.keyword Speech synthesis
dc.subject.keyword Computational modeling
dc.subject.keyword Hidden Markov models
dc.subject.keyword Neural networks
dc.subject.keyword Glottal source model
dc.subject.keyword text-to-speech
dc.subject.keyword WaveNet
dc.subject.keyword 213 Electronic, automation and communications engineering, electronics
dc.identifier.urn URN:NBN:fi:aalto-201904022526
dc.identifier.doi 10.1109/TASLP.2019.2906484
dc.type.version acceptedVersion


Files in this item

There are no open access files associated with this item.
