Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech

Access rights

openAccess

A4 Article in a conference publication

Date

2023-09-04

Language

en

Pages

5

Series

31st European Signal Processing Conference Proceedings

Abstract

Speech is streamed at sample rates of 16 kHz or lower in many applications (e.g. VoIP, Bluetooth headsets). Extending its bandwidth can produce significant quality improvements. We introduce BBWEXNet, a lightweight neural network that performs blind bandwidth extension of speech from 16 kHz (wideband) to 48 kHz (fullband) in real time on a CPU. Our low-latency approach allows running the model with a maximum algorithmic delay of 16 ms, enabling end-to-end communication in streaming services and scenarios where the GPU is busy or unavailable. We propose a series of optimizations that take advantage of the U-Net architecture and of vector quantization methods commonly used in speech coding to produce a model whose performance is comparable to previous real-time solutions while approximately halving the memory footprint and computational cost. Moreover, we show that the model complexity can be further reduced with only a marginal impact on the perceived output quality.
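
The figures quoted in the abstract can be put in terms of samples with some simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function and constant names are hypothetical, and it assumes nothing about BBWEXNet beyond the 16 kHz input rate, the 48 kHz output rate, and the 16 ms maximum algorithmic delay stated above.

```python
# Illustrative arithmetic only; names are hypothetical, not from the paper.

WIDEBAND_HZ = 16_000   # input sample rate stated in the abstract
FULLBAND_HZ = 48_000   # output sample rate stated in the abstract
MAX_DELAY_MS = 16      # maximum algorithmic delay stated in the abstract


def samples_in(duration_ms: float, sample_rate_hz: int) -> int:
    """Number of samples spanned by duration_ms at sample_rate_hz."""
    return round(duration_ms * sample_rate_hz / 1000)


# Ratio between output and input sample rates.
upsampling_factor = FULLBAND_HZ // WIDEBAND_HZ

# The same 16 ms delay expressed in input and output samples.
delay_in_samples = samples_in(MAX_DELAY_MS, WIDEBAND_HZ)
delay_out_samples = samples_in(MAX_DELAY_MS, FULLBAND_HZ)

# Nyquist limits: the highest frequency each sample rate can represent.
input_nyquist_hz = WIDEBAND_HZ // 2
output_nyquist_hz = FULLBAND_HZ // 2

print(upsampling_factor, delay_in_samples, delay_out_samples)
print(input_nyquist_hz, output_nyquist_hz)
```

Under these assumptions, the 16 ms budget corresponds to 256 input samples or 768 output samples, and the extension has to synthesize the band between 8 kHz and 24 kHz that the wideband signal cannot carry.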

Keywords

bandwidth extension, speech processing, real-time, deep learning

Citation

Gómez Mellado, E, Vali, M & Bäckström, T 2023, Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech. in 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings. European Signal Processing Conference, European Association For Signal and Image Processing, pp. 31-35, European Signal Processing Conference, Helsinki, Finland, 04/09/2023. https://doi.org/10.23919/EUSIPCO58844.2023.10290072