A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Airaksinen, Manu
dc.contributor.author Juvela, Lauri
dc.contributor.author Bollepalli, Bajibabu
dc.contributor.author Yamagishi, Junichi
dc.contributor.author Alku, Paavo
dc.date.accessioned 2018-10-24T09:39:58Z
dc.date.available 2018-10-24T09:39:58Z
dc.date.issued 2018-09
dc.identifier.citation Airaksinen, M, Juvela, L, Bollepalli, B, Yamagishi, J & Alku, P 2018, 'A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis', IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 26, no. 9, pp. 1658-1670. https://doi.org/10.1109/TASLP.2018.2835720 en
dc.identifier.issn 2329-9290
dc.identifier.issn 2329-9304
dc.identifier.other PURE UUID: 64fac994-c10e-42b2-b380-055654b97c05
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/a-comparison-between-straight-glottal-and-sinusoidal-vocoding-in-statistical-parametric-speech-synthesis(64fac994-c10e-42b2-b380-055654b97c05).html
dc.identifier.other PURE LINK: http://www.scopus.com/inward/record.url?scp=85046811905&partnerID=8YFLogxK
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/21729808/ELEC_airaksinen_et_al_Comparison_between_IEEETranOnASLP.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/34464
dc.description.abstract A vocoder is used to express a speech waveform with a controllable parametric representation that can be converted back into a speech waveform. Vocoders representing their main categories (mixed excitation, glottal, sinusoidal vocoders) were compared in this study with formal and crowd-sourced listening tests. Vocoder quality was measured within the context of analysis-synthesis as well as text-to-speech (TTS) synthesis in a modern statistical parametric speech synthesis framework. Furthermore, the TTS experiments were divided into synthesis with vocoder-specific features and synthesis with a shared envelope model, where the waveform generation method of the vocoders is mainly responsible for the quality differences. Finally, all of the tests included four distinct voices as a way to investigate the effect of different speakers on the synthesized speech quality. The obtained results suggest that the choice of the voice has a profound impact on the overall quality of the vocoder-generated speech, and the best vocoder for each voice can vary case by case. The single best-rated TTS system was obtained with the glottal vocoder GlottDNN using a male voice with low expressiveness. However, the results indicate that the sinusoidal vocoder PML (pulse model in log-domain) has the best overall performance across the performed tests. Finally, when controlling for the spectral models of the vocoders, the observed differences are similar to the baseline results. This indicates that the waveform generation method of a vocoder is essential for quality improvements. en
dc.format.extent 13
dc.format.extent 1658-1670
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartofseries IEEE/ACM Transactions on Audio Speech and Language Processing en
dc.relation.ispartofseries Volume 26, issue 9 en
dc.rights openAccess en
dc.subject.other Computer Science (miscellaneous) en
dc.subject.other Acoustics and Ultrasonics en
dc.subject.other Computational Mathematics en
dc.subject.other Electrical and Electronic Engineering en
dc.subject.other 113 Computer and information sciences en
dc.title A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis en
dc.type A1 Original article in a scientific journal en
dc.description.version Peer reviewed en
dc.contributor.department Department of Signal Processing and Acoustics
dc.contributor.department National Institute of Informatics
dc.subject.keyword Acoustics
dc.subject.keyword Predictive models
dc.subject.keyword Production
dc.subject.keyword Speech synthesis
dc.subject.keyword statistical parametric speech synthesis
dc.subject.keyword Transfer functions
dc.subject.keyword vocoder
dc.subject.keyword Vocoders
dc.subject.keyword Computer Science (miscellaneous)
dc.subject.keyword Acoustics and Ultrasonics
dc.subject.keyword Computational Mathematics
dc.subject.keyword Electrical and Electronic Engineering
dc.subject.keyword 113 Computer and information sciences
dc.identifier.urn URN:NBN:fi:aalto-201810245526
dc.identifier.doi 10.1109/TASLP.2018.2835720
dc.type.version acceptedVersion
