Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Gómez Mellado, Esteban [en_US]
dc.contributor.author: Vali, Mohammadhassan [en_US]
dc.contributor.author: Bäckström, Tom [en_US]
dc.contributor.department: Department of Information and Communications Engineering [en]
dc.contributor.groupauthor: Speech Interaction Technology [en]
dc.date.accessioned: 2023-12-11T09:52:50Z
dc.date.available: 2023-12-11T09:52:50Z
dc.date.issued: 2023-09-04 [en_US]
dc.description.abstract: Speech is streamed at 16 kHz or lower sample rates in many applications (e.g. VoIP, Bluetooth headsets). Extending its bandwidth can produce significant quality improvements. We introduce BBWEXNet, a lightweight neural network that performs blind bandwidth extension of speech from 16 kHz (wideband) to 48 kHz (fullband) in real time on a CPU. Our low-latency approach allows running the model with a maximum algorithmic delay of 16 ms, enabling end-to-end communication in streaming services and in scenarios where the GPU is busy or unavailable. We propose a series of optimizations that take advantage of the U-Net architecture and of vector quantization methods commonly used in speech coding to produce a model whose performance is comparable to previous real-time solutions while approximately halving the memory footprint and computational cost. Moreover, we show that the model complexity can be further reduced with a marginal impact on the perceived output quality. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Gómez Mellado, E, Vali, M & Bäckström, T 2023, Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech. in 31st European Signal Processing Conference, EUSIPCO 2023 - Proceedings. European Signal Processing Conference, European Association For Signal and Image Processing, pp. 31-35, European Signal Processing Conference, Helsinki, Finland, 04/09/2023. https://doi.org/10.23919/EUSIPCO58844.2023.10290072 [en]
dc.identifier.doi: 10.23919/EUSIPCO58844.2023.10290072 [en_US]
dc.identifier.isbn: 978-94-645936-0-0
dc.identifier.other: PURE UUID: d7d9d2b9-624e-420a-8973-d6d832b48f50 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/d7d9d2b9-624e-420a-8973-d6d832b48f50 [en_US]
dc.identifier.other: PURE LINK: https://eagomez2.github.io/bbwexnet/ [en_US]
dc.identifier.other: PURE LINK: https://eurasip.org/eusipco-conferences/ [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85178334713&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/129462038/LOW_COMPLEXITY_REAL_TIME_NEURAL_NETWORK_FOR_BLIND_BANDWIDTH_EXTENSION_OF_WIDEBAND_SPEECH.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/124898
dc.identifier.urn: URN:NBN:fi:aalto-202312117266
dc.language.iso: en [en]
dc.relation.ispartof: European Signal Processing Conference [en]
dc.relation.ispartofseries: 31st European Signal Processing Conference Proceedings [en]
dc.rights: openAccess [en]
dc.subject.keyword: bandwidth extension [en_US]
dc.subject.keyword: speech processing [en_US]
dc.subject.keyword: real-time [en_US]
dc.subject.keyword: deep learning [en_US]
dc.title: Low-complexity Real-time Neural Network for Blind Bandwidth Extension of Wideband Speech [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: acceptedVersion
