Lossless neural coding for multi-channel audio

School of Electrical Engineering | Master's thesis

Language

en

Pages

50

Abstract

Lossless audio coding is becoming increasingly relevant in audio and data management. It refers to the bit-exact storage of audio or speech data in a compressed form, often at a significantly smaller file size. The technique has gained significance in recent years with the rise of automatic speech recognition systems. These systems are often highly sensitive to noise, which can cause critical misrecognition later on. The high volume of this data, however, makes uncompressed storage impractical, because it increases hardware requirements and therefore running costs. Precise yet efficient preservation of this data is therefore crucial. Traditional audio codecs based on linear prediction or discrete cosine transforms can achieve significant reductions in data size while remaining lossless. These methods, however, adapt poorly to the complex patterns that audio recordings often contain. Exploiting the redundancy in these patterns is key to achieving compression ratios that yield considerably smaller files. This thesis explores the feasibility of neural audio coding techniques that can better adapt to these patterns and compares them with current state-of-the-art methods. The techniques explored include a lossless layer around lossy neural codecs, neural predictors, and a neural entropy coder. The results compare the proposed methods and evaluate their feasibility for speech and general audio compression.
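As a rough illustration of the traditional linear-prediction approach the abstract mentions, the sketch below predicts each integer sample from the two before it and stores only the prediction error. It is not the thesis's method: the fixed second-order predictor, the function names `encode` and `decode`, and the sine test signal are all assumptions chosen for the example. In a complete codec the residuals would then be passed to an entropy coder, which is omitted here.

```python
import math


def predict(history):
    """Fixed second-order predictor (an assumed, illustrative choice):
    extrapolate a straight line through the last two samples."""
    if len(history) < 2:
        return 0  # warm-up: no usable history, so the sample is stored as-is
    return 2 * history[-1] - history[-2]


def encode(samples):
    """Turn integer PCM samples into integer prediction residuals."""
    residuals = []
    for n, x in enumerate(samples):
        residuals.append(x - predict(samples[:n]))
    return residuals


def decode(residuals):
    """Rebuild the original samples exactly from the residuals."""
    samples = []
    for r in residuals:
        samples.append(r + predict(samples))
    return samples


# Hypothetical test signal: a 440 Hz tone sampled at 16 kHz, rounded to integers.
signal = [round(10000 * math.sin(2 * math.pi * 440 * n / 16000)) for n in range(1024)]

residuals = encode(signal)
restored = decode(residuals)

# Lossless: every sample is recovered bit-exactly.
assert restored == signal

# The residuals are much smaller in magnitude than the samples,
# which is what makes them cheaper to entropy-code.
print(sum(abs(x) for x in signal), sum(abs(r) for r in residuals))
```

Because the encoder and decoder apply the same integer predictor to the same history, the reconstruction is exact. A neural predictor of the kind the abstract describes would, presumably, take the place of `predict`, subject to the same requirement that both sides compute identical predictions.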

Supervisor

Bäckström, Tom

Thesis advisor

Gómez, Esteban
