Preserving Speech Privacy in Interactions with Ad Hoc Sensor Networks

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2022-10-28
82 + app. 80 pages
Aalto University publication series DOCTORAL THESES, 145/2022
Speech is our main method of communication: it allows us to intuitively convey complex ideas and give our messages a deeper meaning than their lexical content alone. For example, we can stress specific words to add or remove emphasis from different parts of a sentence. It is therefore natural that voice user interfaces, which let us interact with our electronic devices using speech, are increasingly popular and expected to keep growing in the coming years. Any device with which we can interact using our voice can be considered a voice user interface, and they span a great variety of services, from telecommunication applications like Zoom or Skype to virtual assistants like Alexa or Siri.

However, to provide better services and more natural interactions, voice user interfaces gather and transmit a great amount of our speech data, usually without us being aware of it. If that data is misused, or an unauthorised party obtains it, the user's privacy is gravely violated. In an environment where multiple electronic devices can provide a voice user interface, collaboration between them as a wireless acoustic sensor network can improve the services they provide individually. It is therefore important to study applications that must send our voice to a remote party to provide their services and, more specifically, in a scenario where multiple devices can pick up the voices of multiple users, it is crucial to define which of these devices are actually allowed to record each user's speech. For example, if a user's voice leaks into another user's interaction and is thereby transmitted to a destination that they have not specifically authorised, the privacy of that user is violated.
As a solution, if our devices could perceive our privacy the same way we do, they could adapt the information they share to protect the users' personal data. To that end, we need to analyse how users perceive privacy in their spoken interactions, and from that analysis devise rules that our devices can follow when they provide a voice user interface. In this thesis we study methods to recognise when two devices are located in the same acoustic space based on the audio signals that they record. We show how acoustic fingerprints can be used to securely share audio information between devices and to estimate their physical proximity. We also generate a speech corpus of conversational scenarios to analyse the effect that the acoustic properties of the environment have on the level of privacy that we perceive. Finally, we develop source separation methods to remove the voice of interfering speakers in a multi-device scenario, thus protecting the privacy of external users.
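The acoustic-fingerprint idea mentioned above can be illustrated with a minimal sketch. The function names, band counts, and thresholding scheme below are illustrative assumptions in the style of classic binary audio fingerprinting (Haitsma-Kalker), not the exact method of the thesis: devices in the same acoustic space record similar audio and therefore produce fingerprints with a low bit error rate, while devices in different rooms produce largely uncorrelated bits.

```python
import numpy as np

def acoustic_fingerprint(signal, frame_len=1024, hop=512, n_bands=17):
    """Binary fingerprint from time-frequency band-energy differences
    (illustrative, Haitsma-Kalker style)."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len, hop):
        spec = np.abs(np.fft.rfft(signal[start:start + frame_len] * window)) ** 2
        # Collapse the spectrum into a few coarse energy bands.
        bands = np.array_split(spec, n_bands)
        frames.append(np.array([b.sum() for b in bands]))
    energy = np.array(frames)  # shape: (n_frames, n_bands)
    # One bit per (frame, band) pair: sign of the second-order
    # energy difference across time and frequency.
    diff = (energy[1:, 1:] - energy[1:, :-1]) - (energy[:-1, 1:] - energy[:-1, :-1])
    return diff > 0

def bit_error_rate(fp_a, fp_b):
    """Fraction of disagreeing fingerprint bits; ~0.5 for unrelated audio."""
    n = min(len(fp_a), len(fp_b))
    return float(np.mean(fp_a[:n] != fp_b[:n]))

# Demo: two "devices" in the same room record the same source with
# independent sensor noise; a third device records unrelated audio.
rng = np.random.default_rng(0)
source = rng.standard_normal(16000)
device_a = source + 0.05 * rng.standard_normal(16000)
device_b = source + 0.05 * rng.standard_normal(16000)
device_far = rng.standard_normal(16000)

ber_same = bit_error_rate(acoustic_fingerprint(device_a), acoustic_fingerprint(device_b))
ber_diff = bit_error_rate(acoustic_fingerprint(device_a), acoustic_fingerprint(device_far))
print(ber_same, ber_diff)  # same-room BER is much lower than unrelated BER
```

A proximity decision then reduces to thresholding the bit error rate; because the fingerprint discards the waveform itself, it can be exchanged or used in key agreement with less exposure of the raw audio.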
Supervising professor
Bäckström, Tom, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
voice user interface, experience of privacy, audio fingerprint, acoustic sensor networks
Other note
  • [Publication 1]: P. Pérez Zarazaga, T. Bäckström, S. Sigg. Acoustic Fingerprints for Access Management in Ad-Hoc Sensor Networks. IEEE Access, Volume 8, pp. 166083-166094, September 2020.
    DOI: 10.1109/ACCESS.2020.3022618
  • [Publication 2]: P. Pérez Zarazaga, S. Sigg, T. Bäckström. Robust and Responsive Acoustic Pairing of Devices Using Decorrelating Time-Frequency Modelling. In EUSIPCO 2019, pp. 1-5, A Coruña, Spain, September 2019.
    DOI: 10.23919/EUSIPCO.2019.8903125
  • [Publication 3]: P. Pérez Zarazaga, S. Das, T. Bäckström, V. V. Vidyadhara Raju, Anil Kumar Vuppala. Sound Privacy: A Conversational Speech Corpus for Quantifying the Experience of Privacy. In Interspeech 2019, pp. 3720-3724, Graz, Austria, October 2019.
  • [Publication 4]: P. Pérez Zarazaga, M. Bouafif Mansali, T. Bäckström, Z. Lachiri. Cancellation of Local Competing Speaker with Near-field Localization for Distributed Ad-Hoc Sensor Network. In Interspeech 2021, pp. 676-680, Brno, Czech Republic, October 2021.
    DOI: 10.21437/Interspeech.2021-1329
  • [Publication 5]: M. Bouafif Mansali, P. Pérez Zarazaga, T. Bäckström, Z. Lachiri. Speech Localization at Low Bitrates in Wireless Acoustics Sensor Networks. Frontiers in Signal Processing, Volume 2, 14 pages, March 2022.
    DOI: 10.3389/frsip.2022.800003
  • [Publication 6]: T. Bäckström, M. Bouafif Mansali, P. Pérez Zarazaga, M. Ranjit, S. Das, Z. Lachiri. PyAWNes-Codec: Speech and Audio Codec for Wireless Acoustic Sensor Networks. In EUSIPCO 2021, pp. 1090-1094, Dublin, Ireland, August 2021.
    DOI: 10.23919/EUSIPCO54536.2021.9616344
  • [Publication 7]: A. Leschanowski, S. Das, T. Bäckström, P. Pérez Zarazaga. Perception of Privacy Measured in the Crowd - Paired Comparison on the Effect of Background Noises. In Interspeech 2020, pp. 4651-4655, Shanghai, China, October 2020.
    DOI: 10.21437/Interspeech.2020-2299
  • [Publication 8]: S. Sigg, L. Ngu Nguyen, P. Pérez Zarazaga, T. Bäckström. Provable Consent for Voice User Interfaces. In 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), pp. 1-4, Austin, Texas, October 2020.
    DOI: 10.1109/PerComWorkshops48775.2020.9156182