Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
Date
2019-05-01
Language
en
Pages
6835 - 6839
Series
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Abstract
Speaking style conversion (SSC) is the technology of converting natural speech signals from one style to another. In this study, we propose the use of cycle-consistent adversarial networks (CycleGANs) for converting styles with varying vocal effort, and focus on conversion between normal and Lombard styles as a case study of this problem. We propose a parametric approach that uses the Pulse Model in Log domain (PML) vocoder to extract speech features. These features are mapped using the CycleGAN from utterances in the source style to the corresponding features of target speech. Finally, the mapped features are converted to a Lombard speech waveform with the PML. The CycleGAN was compared in subjective listening tests with two other standard mapping methods used in conversion, and the CycleGAN was found to have the best performance in terms of speech quality and in terms of the magnitude of the perceptual change between the two styles.
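The cycle-consistency idea behind the mapping step can be illustrated with a minimal sketch. It assumes the commonly used L1 form of the cycle loss and omits the adversarial terms; this record does not state the paper's exact loss formulation or weights. The generators `G` (normal to Lombard) and `F` (Lombard to normal) are hypothetical linear stand-ins, and the random arrays stand in for vocoder feature frames, not real PML features.

```python
import numpy as np


def cycle_consistency_loss(x_normal, y_lombard, G, F):
    """L1 cycle loss: F(G(x)) should recover x, and G(F(y)) should recover y."""
    forward = np.mean(np.abs(F(G(x_normal)) - x_normal))
    backward = np.mean(np.abs(G(F(y_lombard)) - y_lombard))
    return forward + backward


# Hypothetical feature dimension and frame counts. The two sets have different
# lengths because the training data is non-parallel (no frame alignment).
dim = 4
rng = np.random.default_rng(0)
x_normal = rng.normal(size=(10, dim))
y_lombard = rng.normal(size=(12, dim))

# Hypothetical generators: an invertible upper-triangular linear map and its inverse.
W = 2.0 * np.eye(dim) + 0.5 * np.triu(np.ones((dim, dim)), k=1)
W_inv = np.linalg.inv(W)


def G(x):
    return x @ W        # normal -> Lombard (stand-in)


def F(y):
    return y @ W_inv    # Lombard -> normal (stand-in)


def identity(v):
    return v            # deliberately wrong inverse for comparison


loss_exact = cycle_consistency_loss(x_normal, y_lombard, G, F)
loss_mismatch = cycle_consistency_loss(x_normal, y_lombard, G, identity)
```

When `F` exactly inverts `G`, the loss is numerically zero; replacing `F` with a mapping that does not invert `G` makes the loss positive. This is the property that lets the two generators be trained without paired utterances.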
Citation
Seshadri, S, Juvela, L, Yamagishi, J, Räsänen, O & Alku, P 2019, Cycle-consistent adversarial networks for non-parallel vocal effort based speaking style conversion. in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)., 8682648, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 6835-6839, IEEE International Conference on Acoustics, Speech, and Signal Processing, Brighton, United Kingdom, 12/05/2019. https://doi.org/10.1109/ICASSP.2019.8682648