Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Seshadri, Shreyas [en_US]
dc.contributor.author: Juvela, Lauri [en_US]
dc.contributor.author: Yamagishi, Junichi [en_US]
dc.contributor.author: Räsänen, Okko [en_US]
dc.contributor.author: Alku, Paavo [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.groupauthor: Jorma Skyttä's Group [en]
dc.contributor.groupauthor: Speech Communication Technology [en]
dc.contributor.organization: Research Organization of Information and Systems, National Institute of Informatics [en_US]
dc.date.accessioned: 2019-06-03T14:14:39Z
dc.date.available: 2019-06-03T14:14:39Z
dc.date.issued: 2019-05-01 [en_US]
dc.description.abstract: Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting styles with varying vocal effort, and focus on conversion between normal and Lombard styles as a case study of this problem. We propose a parametric approach that uses the Pulse Model in Log domain (PML) vocoder to extract speech features. The CycleGAN maps these features from utterances in the source style to the corresponding features of the target speech. Finally, the mapped features are converted to a Lombard speech waveform with the PML vocoder. In subjective listening tests, the CycleGAN was compared with two other standard mapping methods used in conversion, and it performed best both in speech quality and in the magnitude of the perceptual change between the two styles. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Seshadri, S, Juvela, L, Yamagishi, J, Räsänen, O & Alku, P 2019, Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 8682648, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 6835-6839, IEEE International Conference on Acoustics, Speech, and Signal Processing, Brighton, United Kingdom, 12/05/2019. https://doi.org/10.1109/ICASSP.2019.8682648 [en]
dc.identifier.doi: 10.1109/ICASSP.2019.8682648 [en_US]
dc.identifier.isbn: 978-1-4799-8132-8
dc.identifier.isbn: 978-1-4799-8131-1
dc.identifier.issn: 1520-6149
dc.identifier.issn: 2379-190X
dc.identifier.other: PURE UUID: 7b747ee9-e78c-4a67-bc52-dbcf77767c88 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/7b747ee9-e78c-4a67-bc52-dbcf77767c88 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85069001234&partnerID=8YFLogxK
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/33983411/ELEC_Seshadri_Cycle_consistent_vocal_effort_ICASSP19.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/38298
dc.identifier.urn: URN:NBN:fi:aalto-201906033383
dc.language.iso: en [en]
dc.relation.ispartof: IEEE International Conference on Acoustics, Speech, and Signal Processing [en]
dc.relation.ispartofseries: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) [en]
dc.relation.ispartofseries: pp. 6835-6839 [en]
dc.relation.ispartofseries: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing [en]
dc.rights: openAccess [en]
dc.title: Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in a conference publication) [fi]
dc.type.version: acceptedVersion
