Multi-Device Speech Enhancement for Privacy and Quality

dc.contributorAalto-yliopistofi
dc.contributorAalto Universityen
dc.contributor.advisorVali, Mohammad Hassan
dc.contributor.authorRech, Silas
dc.contributor.schoolSähkötekniikan korkeakoulufi
dc.contributor.supervisorBäckström, Tom
dc.date.accessioned2022-06-19T17:02:05Z
dc.date.available2022-06-19T17:02:05Z
dc.date.issued2022-06-13
dc.description.abstractThe recent massive increase in the use of telecommunication services calls for major advances in speech enhancement algorithms. Noise suppression and speech separation methods improve quality and intelligibility in telecommunication systems and make listening more pleasant for all users. However, separating overlapped speech still poses a challenge in real-time scenarios in terms of quality and privacy. Recent approaches require additional prior information such as a reference speech signal. In this work, we introduce Iso-Net, a multichannel, real-time targeted speech separation neural network. With two input channels, one defined as target and one as interfering, Iso-Net generates an inverse mask from the interfering speakers, which is used to remove undesired speech content from the target input channel. This setup makes Iso-Net independent of any a priori information and allows it to run in real time. To evaluate the enhancement quality and validate the network architecture, we use a MUSHRA listening test together with PESQ, Si-SNR, and mutual information measurements. The experimental results show that Iso-Net decreases the mutual information from speech mixtures by 60%, while increasing the perceived speech quality between the original mixture and the Iso-Net output by a factor of 2. Hence, Iso-Net can significantly improve user experience and privacy in applications such as phone calls, conferencing systems, and voice-controlled devices.en
dc.format.extent72
dc.identifier.urihttps://aaltodoc.aalto.fi/handle/123456789/115164
dc.identifier.urnURN:NBN:fi:aalto-202206194005
dc.language.isoenen
dc.locationP1fi
dc.programmeCCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)fi
dc.programme.majorAcoustics and Audio Technologyfi
dc.programme.mcodeELEC3030fi
dc.subject.keywordtargeted speech separationen
dc.subject.keywordmulti-deviceen
dc.subject.keywordspeech enhancementen
dc.subject.keywordreal-timeen
dc.subject.keywordvoice isolationen
dc.subject.keywordprivacy-awareen
dc.titleMulti-Device Speech Enhancement for Privacy and Qualityen
dc.typeG2 Pro gradu, diplomityöfi
dc.type.ontasotMaster's thesisen
dc.type.ontasotDiplomityöfi
local.aalto.electroniconlyyes
local.aalto.openaccessno