Multi-device speech enhancement for privacy and quality

School of Electrical Engineering | Master's thesis

Mcode

ELEC3030

Language

en

Pages

72

Abstract

The recent surge in the use of telecommunication services calls for substantial advances in speech enhancement algorithms. Noise suppression and speech separation methods improve quality and intelligibility in telecommunication systems and make listening more pleasant for all users. However, separating overlapping speech in real time remains challenging in terms of both quality and privacy, and recent approaches require additional prior information such as a reference speech signal. In this work, we introduce Iso-Net, a multichannel, real-time targeted speech separation neural network. Given two input channels, one designated as the target and one as interfering, Iso-Net generates an inverse mask from the interfering speakers, which is used to remove undesired speech content from the target channel. This setup makes Iso-Net independent of any a priori information and allows it to run in real time. To evaluate enhancement quality and validate the network architecture, we conduct a MUSHRA listening test and compute PESQ, Si-SNR, and mutual information. The experimental results show that Iso-Net decreases the mutual information in speech mixtures by 60% while improving perceived speech quality by a factor of 2 relative to the original mixture. Hence, Iso-Net can significantly improve user experience and privacy in applications such as phone calls, conferencing systems, and voice-controlled devices.
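
Of the objective metrics named in the abstract, Si-SNR is the simplest to illustrate. The sketch below assumes the commonly used definition (zero-mean signals, projection of the estimate onto the reference, energy ratio in dB); the record does not state which formulation the thesis uses. The function name `si_snr`, the `eps` constant, and the toy sine signals are hypothetical and chosen only for illustration.

```python
import math


def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant SNR in dB (assumed common definition, not taken from the thesis)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    n = len(reference)
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to obtain the target component.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref) + eps
    scale = dot / ref_energy
    target = [scale * r for r in ref]
    # Whatever remains after the projection is counted as noise.
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(v * v for v in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


# Toy signals (hypothetical): a sine as the clean reference plus a weaker interferer.
reference = [math.sin(2 * math.pi * 5 * t / 200) for t in range(200)]
interferer = [0.1 * math.sin(2 * math.pi * 13 * t / 200) for t in range(200)]
mixture = [r + i for r, i in zip(reference, interferer)]
rescaled = [3.0 * m for m in mixture]

score = si_snr(mixture, reference)
score_rescaled = si_snr(rescaled, reference)
```

In this toy case the interferer carries one hundredth of the reference energy, so the score comes out at about 20 dB, and rescaling the estimate leaves it unchanged, which is the property the "scale-invariant" name refers to.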

Supervisor

Bäckström, Tom

Thesis advisor

Vali, Mohammad Hassan
