Domain-aware deep learning for room acoustics: Parameter estimation, localization, and source separation

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2026-02-17
Electronic archive copy is available via Aalto Thesis Database.

Language

en

Pages

101 + app. 51

Series

Aalto University publication series Doctoral Theses, 40/2026

Abstract

Acoustics fundamentally shape how sound is produced, transmitted, and perceived, influencing both human experience and machine performance. While acoustic properties can in principle be measured in physical environments, such measurements are costly, impractical at scale, and unavailable in virtual or augmented environments. This makes computational approaches, especially domain-aware methods that integrate machine learning with knowledge of sound propagation and signal processing, essential for modeling, mitigating, and exploiting acoustics across applications in spatial audio rendering, scene analysis, and machine listening. This thesis investigates deep learning methods for room acoustics and spatial audio tasks, with acoustics alternately treated as the prediction target, as an interfering factor, or as a source of supervisory structure. First, we address acoustic parameter estimation: from geometric representations of scenes, from measured room impulse responses, and at the level of entire two-dimensional floorplans informed by calibration signals. These approaches contribute new feature representations, architectures, and multimodal formulations that make estimation more accurate and efficient. Second, we study robustness in sound event localization and detection by proposing Spatial Mixup, a data augmentation technique that modifies directional loudness patterns in ambisonic recordings to improve generalization. Finally, we introduce a weakly supervised framework for source separation of sounds produced by machines that leverages spatial location as a supervisory signal, enabling separation when isolated ground-truth data is unavailable. Together, these works demonstrate how domain-aware learning, which is grounded in both digital signal processing (DSP) and physical knowledge of sound propagation, can advance the analysis and rendering of acoustics. They also highlight ongoing challenges, including limited dataset diversity, difficulties in generalization to real-world environments, and the need for architectures that explicitly capture acoustic structure. By combining DSP insights with flexible learning frameworks, this thesis contributes toward more robust, interpretable, and perceptually grounded machine listening systems.

Description

Supervising professor

Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland

Thesis advisor

Ilin, Alexander, Dr. Sc., System 2 AI, Finland

Parts

  • [Publication 1]: Ricardo Falcón Pérez, Georg Götz, and Ville Pulkki. Spherical Maps of Acoustic Properties as Feature Vectors in Machine-Learning-Based Estimation of Acoustic Parameters. Journal of the Audio Engineering Society, Volume 69, Number 9, pp. 632–643, September 2021.
    DOI: 10.17743/jaes.2021.0011
  • [Publication 2]: Georg Götz, Ricardo Falcón Pérez, Sebastian J. Schlecht, and Ville Pulkki. Neural Network for multi-exponential sound energy decay analysis. Journal of the Acoustical Society of America, Volume 152, Number 2, pp. 942–953, August 2022.
    DOI: 10.1121/10.0013416
  • [Publication 3]: Ricardo Falcón Pérez, Ruohan Gao, Gregor Mueckl, Sebastia V. Amengual Gari, and Ishwarya Ananthabhotla. Scene-Wide Acoustic Parameter Estimation. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 5 pages, October 2025.
  • [Publication 4]: Ricardo Falcón Pérez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, and Yuki Mitsufuji. Spatial Mixup: Directional Loudness Modification as Data Augmentation for Sound Event Localization and Detection. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 431–435, May 2022.
  • [Publication 5]: Ricardo Falcón Pérez, Gordon Wichern, François G. Germain, and Jonathan Le Roux. Location as Supervision for Weakly Supervised Multi-Channel Source Separation of Machine Sounds. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 1–5, October 2023.
    DOI: 10.1109/WASPAA58266.2023.10248128