Multi-Device Speech Enhancement for Privacy and Quality

School of Electrical Engineering | Master's thesis
Date
2022-06-13
Department
Major/Subject
Acoustics and Audio Technology
Mcode
ELEC3030
Degree programme
CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)
Language
en
Pages
72
Series
Abstract
The recent massive increase in the use of telecommunication services calls for substantial advances in speech enhancement algorithms. Noise suppression and speech separation methods improve speech quality and intelligibility in telecommunication systems and make listening more pleasant for all users. However, separating overlapped speech in real-time scenarios remains a challenge in terms of both quality and privacy. Recent approaches require additional prior information, such as a reference speech signal. In this work, we introduce Iso-Net, a multichannel, real-time targeted speech separation neural network. Given two input channels, one defined as the target and one as the interferer, Iso-Net generates an inverse mask from the interfering speakers, which is used to remove undesired speech content from the target input channel. This setup makes Iso-Net independent of any a priori information and allows it to run in real time. To evaluate the enhancement quality and validate the network architecture, we conduct a MUSHRA listening test and compute PESQ, SI-SNR, and mutual information. The experimental results show that Iso-Net reduces the mutual information from speech mixtures by 60% and improves the perceived speech quality by a factor of 2 between the original mixture and the Iso-Net output. Hence, Iso-Net can significantly improve user experience and privacy in applications such as phone calls, conferencing systems, and voice-controlled devices.
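As an illustration of one of the objective metrics named in the abstract, the following is a minimal sketch of the commonly used definition of scale-invariant signal-to-noise ratio (SI-SNR). The record does not include the thesis's evaluation code, so the function name `si_snr`, the `eps` stabilizer, and the pure-Python list representation of signals are assumptions made for illustration only.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB (hypothetical helper, not the thesis code).

    Both arguments are equal-length sequences of audio samples.
    """
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)

    # Remove the mean so a DC offset does not influence the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference: this is the "target" part.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]

    # Whatever is left after the projection is treated as noise.
    noise = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why the metric is suited to comparing a separated output against the clean target regardless of output gain.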
Description
Supervisor
Bäckström, Tom
Thesis advisor
Vali, Mohammad Hassan
Keywords
targeted speech separation, multi-device, speech enhancement, real-time, voice isolation, privacy-aware