Robust and Efficient Methods for Distributed Speech Processing - Perspectives on Coding, Enhancement and Privacy

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2021-11-26
Date
2021
Language
en
Pages
70 + app. 70
Series
Aalto University publication series DOCTORAL DISSERTATIONS, 152/2021
Abstract
Computers and technology are so deeply embedded in our lives today that people invest a considerable part of their day communicating with technology. Conventional modes of human-technology interaction have predominantly been device-centric, requiring users to be in the vicinity of the device. This can become cumbersome as the number of personal devices owned by an individual increases. A recent positive trend is the evolution towards user-centric modes of communication with technology, enabled by the growing use and adoption of speech user interfaces. Furthermore, developments in the field of virtual and ad hoc microphone networks and sensor technology are supporting this evolution. As a result, speech processing methods are moving towards a more distributed and collaborative approach. However, this has resulted in new challenges and technical problems in managing speech enhancement, coding and user privacy in acoustic sensor networks. The objectives of this thesis are two-fold: (i) to develop methods to enable the advancement of conventional speech coding for multiple microphones, and (ii) to understand the state of privacy in speech user interfaces. In the first part, we study and develop postfilters for coding with the final goal of advancing the postfilters to enable conventional speech and audio coding methods in distributed microphone networks. A primary requirement in sensor networks is to have systems and algorithms that are simple and robust. Therefore, we develop methods that do not need the transmission of any side information or inter-microphone communication, and the postfilters operate entirely at the decoder. To that end, we develop single-microphone postfilters that employ the envelope and harmonic models of speech. Following this, we advance these methods to develop a model-based postfilter for multi-microphone speech coding using conventional coding approaches.
Our experiments demonstrate that by incorporating speech models in the postfilters as proposed, the output signal quality is improved in comparison to other baseline postfiltering and enhancement approaches. The lack of user privacy considerations in the design of speech interfaces has had an adverse impact on their widespread adoption. Therefore, methods to enforce the privacy of users within the framework of speech interfaces are necessary and timely. In the second part of the thesis, we address how to instill smart speech interfaces with an intuitive understanding of user privacy preferences. Towards that end, we investigate the perception of privacy for people in noisy acoustic scenarios. The results indicate that individuals have an intuitive understanding of privacy in speech communication that is dependent on the acoustic scenarios among other factors. The insights from these studies can be further exploited by conditioning the privacy preferences on the sensed acoustic environment in a speech interface.
Description
Defence is held on 26.11.2021 12:00 – 15:00 Zoom: https://aalto.zoom.us/j/61255513284
Supervising professor
Bäckström, Tom, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
Thesis advisor
Bäckström, Tom, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
Keywords
speech, speech-coding, privacy, postfiltering, speech-interfaces
Parts
  • [Publication 1]: Sneha Das, Tom Bäckström. Postfiltering with Complex Spectral Correlations for Speech and Audio Coding. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Hyderabad, pp. 3538-3542, September 2018.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201812106350
    DOI: 10.21437/Interspeech.2018-1026
  • [Publication 2]: Sneha Das, Tom Bäckström. Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Hyderabad, pp. 3543-3547, September 2018.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201812106385
    DOI: 10.21437/Interspeech.2018-1027
  • [Publication 3]: Sneha Das, Tom Bäckström, Guillaume Fuchs. Fundamental Frequency Model for Postfiltering at Low Bitrates in a Transform-Domain Speech and Audio Codec. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Shanghai, China, pp. 2837-2841, October 2020.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202101251402
    DOI: 10.21437/Interspeech.2020-1067
  • [Publication 4]: Sneha Das, Tom Bäckström. Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks. The Journal of the Acoustical Society of America Express Letters, pp. 015206, January 2021.
    DOI: 10.1121/10.0003208
  • [Publication 5]: Sneha Das, Tom Bäckström. Postfiltering Using Source Modeling for Speech and Audio Coding in Ad Hoc Sensor Networks. Submitted to IEEE Access, 2021.
  • [Publication 6]: Pablo Perez Zarazaga, Sneha Das, Tom Bäckström, V. V. Vidyadhara Raju, Anil Kumar Vuppala. Sound Privacy: A Conversational Speech Corpus for Quantifying the Experience of Privacy. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Graz, Austria, pp. 3720-3724, September 2019.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201909255496
  • [Publication 7]: Anna Leschanowsky, Sneha Das, Tom Bäckström. Perception of Privacy Measured in the Crowd–Paired Comparison on the Effect of Background Noises. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Shanghai, China, pp. 4651-4654, October 2020.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202101251597
    DOI: 10.21437/Interspeech.2020-2299