Collaborative Watermarking for Adversarial Speech Synthesis

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Juvela, Lauri [en_US]
dc.contributor.author: Wang, Xin [en_US]
dc.contributor.department: Department of Information and Communications Engineering [en]
dc.contributor.groupauthor: Speech Synthesis [en]
dc.contributor.organization: National Institute of Informatics [en_US]
dc.date.accessioned: 2024-06-20T08:17:44Z
dc.date.available: 2024-06-20T08:17:44Z
dc.date.issued: 2024-03-18 [en_US]
dc.description.abstract: Advances in neural speech synthesis have brought us technology that is not only close to human naturalness, but is also capable of instant voice cloning with little data, and is highly accessible with pre-trained models available. Naturally, the potential flood of generated content raises the need for synthetic speech detection and watermarking. Recently, considerable research effort in synthetic speech detection has been related to the Automatic Speaker Verification Spoofing and Countermeasures Challenge (ASVspoof), which focuses on passive countermeasures. This paper takes a complementary view to generated speech detection: a synthesis system should make an active effort to watermark the generated speech in a way that aids detection by another machine, but remains transparent to a human listener. We propose a collaborative training scheme for synthetic speech watermarking and show that a HiFi-GAN neural vocoder collaborating with the ASVspoof 2021 baseline countermeasure models consistently improves detection performance over conventional classifier training. Furthermore, we demonstrate how collaborative training can be paired with augmentation strategies for added robustness against noise and time-stretching. Finally, listening tests demonstrate that collaborative training has little adverse effect on the perceptual quality of vocoded speech. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Juvela, L & Wang, X 2024, Collaborative Watermarking for Adversarial Speech Synthesis. in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 11231-11235, IEEE International Conference on Acoustics, Speech and Signal Processing, Seoul, Korea, Republic of, 14/04/2024. https://doi.org/10.1109/ICASSP48485.2024.10448134 [en]
dc.identifier.doi: 10.1109/ICASSP48485.2024.10448134 [en_US]
dc.identifier.isbn: 979-8-3503-4485-1
dc.identifier.issn: 2379-190X
dc.identifier.other: PURE UUID: 56bbb60d-e09c-45c5-b465-eaf2f8aae759 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/56bbb60d-e09c-45c5-b465-eaf2f8aae759 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85195387242&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/146186254/TTS_adversial_watermarking_icassp2024-4.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/129028
dc.identifier.urn: URN:NBN:fi:aalto-202406204614
dc.language.iso: en [en]
dc.relation.ispartof: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
dc.relation.ispartof: pp. 11231-11235
dc.relation.ispartof: IEEE International Conference on Acoustics, Speech and Signal Processing [en]
dc.relation.ispartofseries: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing [en]
dc.rights: openAccess [en]
dc.subject.keyword: Training [en_US]
dc.subject.keyword: Vocoders [en_US]
dc.subject.keyword: Collaboration [en_US]
dc.subject.keyword: Watermarking [en_US]
dc.subject.keyword: Voice cloning [en_US]
dc.subject.keyword: Generated speech detection [en_US]
dc.subject.keyword: HiFi-GAN [en_US]
dc.subject.keyword: ASVspoof [en_US]
dc.title: Collaborative Watermarking for Adversarial Speech Synthesis [en]
dc.type: Conference article in proceedings [fi]
dc.type.version: acceptedVersion