A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis

Field | Value | Language
dc.contributor | Aalto-yliopisto | fi
dc.contributor | Aalto University | en
dc.contributor.author | Airaksinen, Manu | en_US
dc.contributor.author | Juvela, Lauri | en_US
dc.contributor.author | Bollepalli, Bajibabu | en_US
dc.contributor.author | Yamagishi, Junichi | en_US
dc.contributor.author | Alku, Paavo | en_US
dc.contributor.department | Department of Signal Processing and Acoustics | en
dc.contributor.groupauthor | Speech Communication Technology | en
dc.contributor.organization | National Institute of Informatics | en_US
dc.date.accessioned | 2018-10-24T09:39:58Z |
dc.date.available | 2018-10-24T09:39:58Z |
dc.date.issued | 2018-09 | en_US
dc.description.abstract | A vocoder is used to express a speech waveform with a controllable parametric representation that can be converted back into a speech waveform. Vocoders representing the three main categories (mixed excitation, glottal, and sinusoidal vocoders) were compared in this study with formal and crowd-sourced listening tests. Vocoder quality was measured within the context of analysis-synthesis as well as text-to-speech (TTS) synthesis in a modern statistical parametric speech synthesis framework. Furthermore, the TTS experiments were divided into synthesis with vocoder-specific features and synthesis with a shared envelope model, where the waveform generation method of the vocoders is mainly responsible for the quality differences. In addition, all of the tests included four distinct voices in order to investigate the effect of different speakers on the synthesized speech quality. The obtained results suggest that the choice of the voice has a profound impact on the overall quality of the vocoder-generated speech, and that the best vocoder can vary from voice to voice. The single best-rated TTS system was obtained with the glottal vocoder GlottDNN using a male voice with low expressiveness. However, the results indicate that the sinusoidal vocoder PML (pulse model in log-domain) has the best overall performance across the performed tests. Finally, when the spectral models of the vocoders are controlled for, the observed differences are similar to the baseline results. This indicates that the waveform generation method of a vocoder is essential for quality improvements. | en
dc.description.version | Peer reviewed | en
dc.format.extent | 13 |
dc.format.extent | 1658-1670 |
dc.format.mimetype | application/pdf | en_US
dc.identifier.citation | Airaksinen, M, Juvela, L, Bollepalli, B, Yamagishi, J & Alku, P 2018, 'A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis', IEEE/ACM Transactions on Audio Speech and Language Processing, vol. 26, no. 9, pp. 1658-1670. https://doi.org/10.1109/TASLP.2018.2835720 | en
dc.identifier.doi | 10.1109/TASLP.2018.2835720 | en_US
dc.identifier.issn | 2329-9290 |
dc.identifier.issn | 2329-9304 |
dc.identifier.other | PURE UUID: 64fac994-c10e-42b2-b380-055654b97c05 | en_US
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/64fac994-c10e-42b2-b380-055654b97c05 | en_US
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85046811905&partnerID=8YFLogxK | en_US
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/21729808/ELEC_airaksinen_et_al_Comparison_between_IEEETranOnASLP.pdf | en_US
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/34464 |
dc.identifier.urn | URN:NBN:fi:aalto-201810245526 |
dc.language.iso | en | en
dc.relation.ispartofseries | IEEE/ACM Transactions on Audio Speech and Language Processing | en
dc.relation.ispartofseries | Volume 26, issue 9 | en
dc.rights | openAccess | en
dc.subject.keyword | Acoustics | en_US
dc.subject.keyword | Predictive models | en_US
dc.subject.keyword | Production | en_US
dc.subject.keyword | Speech synthesis | en_US
dc.subject.keyword | statistical parametric speech synthesis | en_US
dc.subject.keyword | Transfer functions | en_US
dc.subject.keyword | vocoder | en_US
dc.subject.keyword | Vocoders | en_US
dc.title | A comparison between STRAIGHT, glottal, and sinusoidal vocoding in statistical parametric speech synthesis | en
dc.type | A1 Alkuperäisartikkeli tieteellisessä aikakauslehdessä (A1 Original article in a scientific journal) | fi
dc.type.version | acceptedVersion |
