Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.author | Gómez Mellado, Esteban | en_US |
dc.contributor.author | Vali, Mohammadhassan | en_US |
dc.contributor.author | Bäckström, Tom | en_US |
dc.contributor.department | Department of Information and Communications Engineering | en |
dc.contributor.groupauthor | Speech Interaction Technology | en |
dc.date.accessioned | 2023-12-11T09:52:50Z | |
dc.date.available | 2023-12-11T09:52:50Z | |
dc.date.issued | 2023-09-04 | en_US |
dc.description.abstract | Speech is streamed at 16 kHz or lower sample rates in many applications (e.g. VoIP, Bluetooth headsets). Extending its bandwidth can produce significant quality improvements. We introduce BBWEXNet, a lightweight neural network that performs blind bandwidth extension of speech from 16 kHz (wideband) to 48 kHz (fullband) in real time on a CPU. Our low-latency approach allows running the model with a maximum algorithmic delay of 16 ms, enabling end-to-end communication in streaming services and in scenarios where the GPU is busy or unavailable. We propose a series of optimizations that take advantage of the U-Net architecture and of vector quantization methods commonly used in speech coding, producing a model whose performance is comparable to previous real-time solutions while approximately halving the memory footprint and computational cost. Moreover, we show that the model complexity can be further reduced with only a marginal impact on the perceived output quality. | en |
dc.description.version | Peer reviewed | en |
dc.format.extent | 5 | |
dc.format.mimetype | application/pdf | en_US |
dc.identifier.citation | Gómez Mellado, E, Vali, M & Bäckström, T 2023, Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech. in 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings. European Signal Processing Conference, European Association for Signal and Image Processing, pp. 31-35, European Signal Processing Conference, Helsinki, Finland, 04/09/2023. https://doi.org/10.23919/EUSIPCO58844.2023.10290072 | en |
dc.identifier.doi | 10.23919/EUSIPCO58844.2023.10290072 | en_US |
dc.identifier.isbn | 978-94-645936-0-0 | |
dc.identifier.other | PURE UUID: d7d9d2b9-624e-420a-8973-d6d832b48f50 | en_US |
dc.identifier.other | PURE ITEMURL: https://research.aalto.fi/en/publications/d7d9d2b9-624e-420a-8973-d6d832b48f50 | en_US |
dc.identifier.other | PURE LINK: https://eagomez2.github.io/bbwexnet/ | en_US |
dc.identifier.other | PURE LINK: https://eurasip.org/eusipco-conferences/ | en_US |
dc.identifier.other | PURE LINK: http://www.scopus.com/inward/record.url?scp=85178334713&partnerID=8YFLogxK | en_US |
dc.identifier.other | PURE FILEURL: https://research.aalto.fi/files/129462038/LOW_COMPLEXITY_REAL_TIME_NEURAL_NETWORK_FOR_BLIND_BANDWIDTH_EXTENSION_OF_WIDEBAND_SPEECH.pdf | en_US |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/124898 | |
dc.identifier.urn | URN:NBN:fi:aalto-202312117266 | |
dc.language.iso | en | en |
dc.relation.ispartof | European Signal Processing Conference | en |
dc.relation.ispartofseries | 31st European Signal Processing Conference Proceedings | en |
dc.rights | openAccess | en |
dc.subject.keyword | bandwidth extension | en_US |
dc.subject.keyword | speech processing | en_US |
dc.subject.keyword | real-time | en_US |
dc.subject.keyword | deep learning | en_US |
dc.title | Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech | en |
dc.type | A4 Article in conference proceedings | en |
dc.type.version | acceptedVersion | |