Machine-learning-based estimation of room acoustic parameters

School of Electrical Engineering | Master's thesis
Date
2018-12-10
Department
Major/Subject
Acoustics and Audio Technology
Mcode
ELEC3030
Degree programme
CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013)
Language
en
Pages
72+6
Series
Abstract
Traditional methods to study sound propagation inside rooms can be divided into two approaches: geometrical models and wave-based models. In the former, sound is analyzed as rays, giving a valid approximation for high frequencies while failing to model certain wave effects such as diffraction or interference. The latter finds solutions to the wave equation, providing better accuracy at the cost of much higher computational complexity. This thesis presents a proof of concept for a novel machine learning method to estimate a set of typical room acoustics parameters using only geometrical information as input features. First, a room acoustics dataset composed of real-world acoustical measurements is analyzed and processed using microphone array encoding techniques to extract room impulse responses and acoustical absorption area for multiple directions. The dataset is explored to identify correlations between features and general properties, including a low-dimensional representation for visualization. The proposed method uses geometrical features as input for a neural network model that estimates room acoustics parameters, such as reverberation time (T60) and early decay time (EDT). For reverberation time, this model is evaluated against the Sabine method, and the results show much higher accuracy, especially at low frequencies. The method is then expanded to include input features for the locations of the source and microphone, where it also achieves high performance. Furthermore, a hyperparameter optimization procedure using random search reveals three main findings. First, a large range of neural network architectures, even with very few trainable parameters, achieve high performance. Second, the depth of the models has little influence on the results. Third, the benefit of increasing the number of training examples for a single loudspeaker saturates after around 100 examples.
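For readers unfamiliar with the Sabine baseline mentioned in the abstract, the following is a minimal sketch of the classical Sabine formula, T60 = 0.161 V / A, where V is the room volume in cubic metres and A is the total absorption area in square metres (the sum of each surface area times its absorption coefficient). The function name, the example room, and the absorption coefficients are illustrative assumptions and are not taken from the thesis.

```python
def sabine_t60(volume_m3, surfaces):
    """Estimate reverberation time (seconds) with Sabine's formula.

    volume_m3: room volume in cubic metres.
    surfaces:  iterable of (area_m2, absorption_coefficient) pairs.
    """
    # Total absorption area A = sum of S_i * alpha_i (square metres).
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    if absorption_area <= 0:
        raise ValueError("total absorption area must be positive")
    # 0.161 s/m is the usual Sabine constant for metric units.
    return 0.161 * volume_m3 / absorption_area


# Hypothetical 5 m x 4 m x 3 m room with made-up absorption coefficients.
example_surfaces = [
    (20.0, 0.10),  # floor
    (20.0, 0.60),  # ceiling
    (54.0, 0.05),  # four walls combined
]
example_t60 = sabine_t60(5 * 4 * 3, example_surfaces)
print(f"Sabine T60 estimate: {example_t60:.2f} s")
```

In practice the absorption coefficients are frequency dependent, so such an estimate is typically computed separately for each frequency band.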
Description
Supervisor
Pulkki, Ville
Thesis advisor
McCormack, Leo
Keywords
room acoustics, room impulse response, machine learning, neural networks, microphone array, data analysis
Other note
Citation