Collaborative Watermarking for Adversarial Speech Synthesis
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Juvela, Lauri | en_US |
dc.contributor.author | Wang, Xin | en_US |
dc.contributor.department | Department of Information and Communications Engineering | en |
dc.contributor.groupauthor | Speech Synthesis | en |
dc.contributor.organization | National Institute of Informatics | en_US |
dc.date.accessioned | 2024-06-20T08:17:44Z | |
dc.date.available | 2024-06-20T08:17:44Z | |
dc.date.issued | 2024-03-18 | en_US |
dc.description.abstract | Advances in neural speech synthesis have brought us technology that is not only close to human naturalness, but is also capable of instant voice cloning with little data, and is highly accessible with pre-trained models available. Naturally, the potential flood of generated content raises the need for synthetic speech detection and watermarking. Recently, considerable research effort in synthetic speech detection has been related to the Automatic Speaker Verification and Spoofing Countermeasure Challenge (ASVspoof), which focuses on passive countermeasures. This paper takes a complementary view to generated speech detection: a synthesis system should make an active effort to watermark the generated speech in a way that aids detection by another machine, but remains transparent to a human listener. We propose a collaborative training scheme for synthetic speech watermarking and show that a HiFi-GAN neural vocoder collaborating with the ASVspoof 2021 baseline countermeasure models consistently improves detection performance over conventional classifier training. Furthermore, we demonstrate how collaborative training can be paired with augmentation strategies for added robustness against noise and time-stretching. Finally, listening tests demonstrate that collaborative training has little adverse effect on perceptual quality of vocoded speech. | en |
dc.description.version | Peer reviewed | en |
dc.format.extent | 5 | |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Juvela, L & Wang, X 2024, Collaborative Watermarking for Adversarial Speech Synthesis. in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 11231-11235, IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Korea, Republic of, 14/04/2024. https://doi.org/10.1109/ICASSP48485.2024.10448134 | en |
dc.identifier.doi | 10.1109/ICASSP48485.2024.10448134 | en_US |
dc.identifier.isbn | 979-8-3503-4485-1 | |
dc.identifier.issn | 2379-190X | |
dc.identifier.other | PURE UUID: 56bbb60d-e09c-45c5-b465-eaf2f8aae759 | en_US |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/56bbb60d-e09c-45c5-b465-eaf2f8aae759 | en_US |
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85195387242&partnerID=8YFLogxK | en_US |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/146186254/TTS_adversial_watermarking_icassp2024-4.pdf | en_US |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/129028 | |
dc.identifier.urn | URN:NBN:fi:aalto-202406204614 | |
dc.language.iso | en | en |
dc.relation.ispartof | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | |
dc.relation.ispartof | pp. 11231-11235 | |
dc.relation.ispartof | IEEE International Conference on Acoustics, Speech and Signal Processing | en |
dc.relation.ispartofseries | Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing | en |
dc.rights | openAccess | en |
dc.subject.keyword | Training | en_US |
dc.subject.keyword | Vocoders | en_US |
dc.subject.keyword | Collaboration | en_US |
dc.subject.keyword | Watermarking | en_US |
dc.subject.keyword | Voice cloning | en_US |
dc.subject.keyword | Generated speech detection | en_US |
dc.subject.keyword | HiFi-GAN | en_US |
dc.subject.keyword | ASVspoof | en_US |
dc.title | Collaborative Watermarking for Adversarial Speech Synthesis | en |
dc.type | Conference article in proceedings | fi |
dc.type.version | acceptedVersion | |