Loudspeaker Modelling with Recurrent Neural Networks

School of Electrical Engineering | Master's thesis

Date

2023-08-21

Major/Subject

Acoustics and Audio Technology

Mcode

ELEC3030

Degree programme

CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)

Language

en

Pages

57

Abstract

Digital twins of loudspeakers are useful assets for fine-tuning during the design and manufacturing phases. They can serve as an alternative to real-time measurement for the objective evaluation of adjustments made by digital signal processing. Binaural loudspeaker models could introduce a more repeatable framework for subjective listening and provide flexibility for remote work by reducing the need for physical devices. Neural networks are a well-proven tool for system identification of audio hardware devices. This thesis project focuses on creating a digital twin of a multimedia stereo loudspeaker system, using a stereo audio waveform as the input and a binaural recording of the system's playback as the target waveform for Recurrent Neural Network (RNN) training. The RNN architecture is inspired by the current state-of-the-art method for single-channel audio effects modelling and is adapted for the stereo waveform use case. Firstly, the RNN model is tested with different synthesized target data that simulate the real recorded data. This approach allows us to estimate which properties are the most challenging for the RNN to learn. Secondly, the experiments are run with a real recorded, time-aligned dataset, and the RNN's performance is objectively evaluated by the Error-to-Signal Ratio (ESR). In the current state-of-the-art method for single-channel audio modelling, the initial hidden state of the RNN is computed by using no-gradient startup inference to accumulate the hidden state over the first few hundred samples of the training sequence. The thesis project proposes a new method called Discontinuous Sequence Training (DISCO). The method prepares the training dataset according to the RNN architecture's sequence-length hyper-parameter and the system's impulse response length, so that the initial hidden state can be initialized correctly without additional pre-training inference.
By modifying only the dataset, DISCO matches the training and inference precision achieved with hidden state initialization in the current state-of-the-art method for black-box modelling with RNNs.
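
The abstract names the Error-to-Signal Ratio (ESR) as the objective metric without giving its formula. As a minimal sketch, assuming the definition commonly used in audio black-box modelling (energy of the prediction error divided by energy of the target), it could be computed as below. The function name, the `eps` guard, and the single-channel list input are illustrative assumptions; the thesis may use a variant, for example with pre-emphasis filtering or per-channel averaging for the binaural target.

```python
def error_to_signal_ratio(target, prediction, eps=1e-12):
    """Energy of the error divided by energy of the target (assumed ESR definition).

    target, prediction: equal-length sequences of float samples (one channel).
    eps: small constant (an assumption) that avoids division by zero on silence.
    """
    if len(target) != len(prediction):
        raise ValueError("target and prediction must have the same length")
    # Sum of squared sample-wise differences between recording and model output.
    error_energy = sum((t - p) ** 2 for t, p in zip(target, prediction))
    # Sum of squared samples of the recorded target signal.
    signal_energy = sum(t ** 2 for t in target)
    return error_energy / (signal_energy + eps)


if __name__ == "__main__":
    recorded = [0.5, -0.25, 0.125, 0.0]
    modelled = [0.5, -0.20, 0.100, 0.0]
    print(error_to_signal_ratio(recorded, modelled))
```

Under this definition, a perfect prediction gives an ESR of 0, and a model that outputs only silence gives an ESR close to 1, so lower values indicate a closer match to the recording.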

Supervisor

Schlecht, Sebastian Jiro

Thesis advisor

Schlecht, Sebastian Jiro

Keywords

loudspeaker modelling, digital twin, system identification, deep learning, stereo modelling, DISCO sequence training
