Robust and Efficient Methods for Distributed Speech Processing - Perspectives on Coding, Enhancement and Privacy

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Bäckström, Tom, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
dc.contributor.author: Das, Sneha
dc.contributor.department: Signaalinkäsittelyn ja akustiikan laitos [fi]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.lab: Speech Interaction Technology [en]
dc.contributor.school: Sähkötekniikan korkeakoulu [fi]
dc.contributor.school: School of Electrical Engineering [en]
dc.contributor.supervisor: Bäckström, Tom, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
dc.date.accessioned: 2021-11-15T10:00:08Z
dc.date.available: 2021-11-15T10:00:08Z
dc.date.defence: 2021-11-26
dc.date.issued: 2021
dc.description: Defence is held on 26.11.2021 12:00 – 15:00 Zoom: https://aalto.zoom.us/j/61255513284
dc.description.abstract: Computers and technology are so deeply embedded in our lives today that people spend a considerable part of their day communicating with technology. Conventional modes of human-technology interaction have predominantly been device-centric, which requires users to be in the vicinity of the device. This can become cumbersome as the number of personal devices owned by an individual increases. A recent positive trend is the evolution towards user-centric modes of communication with technology, enabled by the growing adoption of speech user interfaces. Furthermore, developments in virtual and ad hoc microphone networks and in sensor technology are supporting this evolution. As a result, speech processing methods are moving towards a more distributed and collaborative approach. However, this has introduced new challenges and technical problems in managing speech enhancement, coding and user privacy in acoustic sensor networks. The objectives of this thesis are twofold: (1) to develop methods that advance conventional speech coding for multiple microphones, and (2) to understand the state of privacy in speech user interfaces. In the first part, we study and develop postfilters for coding, with the final goal of advancing the postfilters to enable conventional speech and audio coding methods in distributed microphone networks. A primary requirement in sensor networks is to have systems and algorithms that are simple and robust. Therefore, we develop methods that need neither the transmission of side information nor inter-microphone communication, and the postfilters operate entirely at the decoder. To that end, we develop single-microphone postfilters that employ the envelope and harmonic models of speech. Following this, we advance these methods to develop a model-based postfilter for multi-microphone speech coding using conventional coding approaches.
Our experiments demonstrate that incorporating speech models in the postfilters as proposed improves the output signal quality in comparison to baseline postfiltering and enhancement approaches. The lack of user privacy considerations in the design of speech interfaces has had an adverse impact on their widespread adoption. Therefore, methods to enforce the privacy of users within the framework of speech interfaces are necessary and timely. In the second part of the thesis, we address how to instill smart speech interfaces with an intuitive understanding of user privacy preferences. Towards that end, we investigate the perception of privacy for people in noisy acoustic scenarios. The results indicate that individuals have an intuitive understanding of privacy in speech communication that depends on the acoustic scenario, among other factors. The insights from these studies can be further exploited by conditioning the privacy preferences on the sensed acoustic environment in a speech interface. [en]
dc.format.extent: 70 + app. 70
dc.format.mimetype: application/pdf [en]
dc.identifier.isbn: 978-952-64-0576-6 (electronic)
dc.identifier.isbn: 978-952-64-0575-9 (printed)
dc.identifier.issn: 1799-4942 (electronic)
dc.identifier.issn: 1799-4934 (printed)
dc.identifier.issn: 1799-4934 (ISSN-L)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/110941
dc.identifier.urn: URN:ISBN:978-952-64-0576-6
dc.language.iso: en [en]
dc.opn: Christensen, Mads Græsbøll, Prof., Aalborg University, Denmark
dc.publisher: Aalto University [en]
dc.publisher: Aalto-yliopisto [fi]
dc.relation.haspart: [Publication 1]: Sneha Das, Tom Bäckström. Postfiltering with Complex Spectral Correlations for Speech and Audio Coding. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Hyderabad, pp. 3538-3542, September 2018. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201812106350. DOI: 10.21437/Interspeech.2018-1026
dc.relation.haspart: [Publication 2]: Sneha Das, Tom Bäckström. Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Hyderabad, pp. 3543-3547, September 2018. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201812106385. DOI: 10.21437/Interspeech.2018-1027
dc.relation.haspart: [Publication 3]: Sneha Das, Tom Bäckström, Guillaume Fuchs. Fundamental Frequency Model for Postfiltering at Low Bitrates in a Transform-Domain Speech and Audio Codec. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Shanghai, China, pp. 2837-2841, October 2020. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202101251402. DOI: 10.21437/Interspeech.2020-1067
dc.relation.haspart: [Publication 4]: Sneha Das, Tom Bäckström. Enhancement by postfiltering for speech and audio coding in ad-hoc sensor networks. The Journal of the Acoustical Society of America Express Letters, pp. 015206, January 2021. DOI: 10.1121/10.0003208
dc.relation.haspart: [Publication 5]: Sneha Das, Tom Bäckström. Postfiltering Using Source Modeling for Speech and Audio Coding in Ad Hoc Sensor Networks. Submitted to IEEE Access, 2021
dc.relation.haspart: [Publication 6]: Pablo Perez Zarazaga, Sneha Das, Tom Bäckström, V. V. Vidyadhara Raju, Anil Kumar Vuppala. Sound Privacy: A Conversational Speech Corpus for Quantifying the Experience of Privacy. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Graz, Austria, pp. 3720-3724, September 2019. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-201909255496.
dc.relation.haspart: [Publication 7]: Anna Leschanowsky, Sneha Das, Tom Bäckström. Perception of Privacy Measured in the Crowd–Paired Comparison on the Effect of Background Noises. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Shanghai, China, pp. 4651-4654, October 2020. Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202101251597. DOI: 10.21437/Interspeech.2020-2299
dc.relation.ispartofseries: Aalto University publication series DOCTORAL DISSERTATIONS [en]
dc.relation.ispartofseries: 152/2021
dc.rev: Gournay, Philippe, Adj. Prof., Université de Sherbrooke, Canada
dc.rev: Virtanen, Tuomas, Prof., Tampere University, Finland
dc.subject.keyword: speech [en]
dc.subject.keyword: speech-coding [en]
dc.subject.keyword: privacy [en]
dc.subject.keyword: postfiltering [en]
dc.subject.keyword: speech-interfaces [en]
dc.subject.other: Electrical engineering [en]
dc.title: Robust and Efficient Methods for Distributed Speech Processing - Perspectives on Coding, Enhancement and Privacy [en]
dc.type: G5 Artikkeliväitöskirja [fi]
dc.type.dcmitype: text [en]
dc.type.ontasot: Doctoral dissertation (article-based) [en]
dc.type.ontasot: Väitöskirja (artikkeli) [fi]
local.aalto.acrisexportstatus: checked 2021-11-29_1549
local.aalto.archive: yes
local.aalto.formfolder: 2021_11_14_klo_15_32
Files
Original bundle
Name: isbn9789526405766.pdf
Size: 9.88 MB
Format: Adobe Portable Document Format