Domain-aware deep learning for room acoustics: Parameter estimation, localization, and source separation

dc.contributor: Aalto University
dc.contributor.advisor: Ilin, Alexander, Dr. Sc., System 2 AI, Finland
dc.contributor.author: Falcón Pérez, Ricardo
dc.contributor.department: Department of Information and Communications Engineering
dc.contributor.school: School of Electrical Engineering
dc.contributor.supervisor: Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland
dc.date.accessioned: 2026-02-16T10:00:25Z
dc.date.available: 2026-02-16T10:00:25Z
dc.date.defence: 2026-02-17
dc.date.issued: 2026
dc.description.abstract: Acoustics fundamentally shape how sound is produced, transmitted, and perceived, influencing both human experience and machine performance. Although acoustic properties can in principle be measured in physical environments, such measurements are costly, impractical at scale, and unavailable in virtual or augmented environments. Computational approaches are therefore essential for modeling, mitigating, and exploiting acoustics in applications such as spatial audio rendering, scene analysis, and machine listening. This is especially true of domain-aware methods that integrate machine learning with knowledge of sound propagation and signal processing. This thesis investigates deep learning methods for room acoustics and spatial audio tasks, treating acoustics alternately as the prediction target, as an interfering factor, or as a source of supervisory structure. First, we address acoustic parameter estimation in three settings: from geometric representations of scenes, from measured room impulse responses, and across entire two-dimensional floorplans informed by calibration signals. These approaches contribute new feature representations, architectures, and multimodal formulations that make estimation more accurate and efficient. Second, we study robustness in sound event localization and detection by proposing Spatial Mixup, a data augmentation technique that modifies directional loudness patterns in ambisonic recordings to improve generalization. Finally, we introduce a weakly supervised framework for separating machine sounds that uses spatial location as a supervisory signal, enabling separation when isolated ground-truth data is unavailable. Together, these works demonstrate how domain-aware learning, grounded in both digital signal processing (DSP) and physical knowledge of sound propagation, can advance the analysis and rendering of acoustics.
They also highlight ongoing challenges, including limited dataset diversity, difficulty generalizing to real-world environments, and the need for architectures that explicitly capture acoustic structure. By combining DSP insights with flexible learning frameworks, this thesis contributes toward more robust, interpretable, and perceptually grounded machine listening systems.
dc.description.accessibilityfeature: structural navigation
dc.format.extent: 101 + app. 51
dc.format.mimetype: application/pdf
dc.identifier.isbn: 978-952-64-2987-8 (electronic)
dc.identifier.isbn: 978-952-64-2988-5 (printed)
dc.identifier.issn: 1799-4942 (electronic)
dc.identifier.issn: 1799-4934 (printed)
dc.identifier.issn: 1799-4934 (ISSN-L)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/143166
dc.identifier.urn: URN:ISBN:978-952-64-2987-8
dc.language.iso: en
dc.opn: Bello, Juan Pablo, Prof., New York University, USA
dc.publisher: Aalto University
dc.relation.haspart: [Publication 1] Ricardo Falcón Pérez, Georg Götz, and Ville Pulkki. Spherical Maps of Acoustic Properties as Feature Vectors in Machine-Learning-Based Estimation of Acoustic Parameters. Journal of the Audio Engineering Society, Volume 69, Number 9, pp. 632–643, September 2021. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202109299383. DOI: 10.17743/jaes.2021.0011
dc.relation.haspart: [Publication 2] Georg Götz, Ricardo Falcón Pérez, Sebastian J. Schlecht, and Ville Pulkki. Neural network for multi-exponential sound energy decay analysis. Journal of the Acoustical Society of America, Volume 152, Number 2, pp. 942–953, August 2022. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202209145604. DOI: 10.1121/10.0013416
dc.relation.haspart: [Publication 3] Ricardo Falcón Pérez, Ruohan Gao, Gregor Mueckl, Sebastia V. Amengual Gari, and Ishwarya Ananthabhotla. Scene-Wide Acoustic Parameter Estimation. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 5 pages, October 2025.
dc.relation.haspart: [Publication 4] Ricardo Falcón Pérez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, and Yuki Mitsufuji. Spatial Mixup: Directional Loudness Modification as Data Augmentation for Sound Event Localization and Detection. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 431–435, May 2022.
dc.relation.haspart: [Publication 5] Ricardo Falcón Pérez, Gordon Wichern, François G. Germain, and Jonathan Le Roux. Location as Supervision for Weakly Supervised Multi-Channel Source Separation of Machine Sounds. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 1–5, October 2023. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202310256665. DOI: 10.1109/WASPAA58266.2023.10248128
dc.relation.ispartofseries: Aalto University publication series Doctoral Theses
dc.relation.ispartofseries: 40/2026
dc.rev: Bello, Juan Pablo, Prof., New York University, USA
dc.rev: Serizel, Romain, Prof., Université de Lorraine, France
dc.subject.keyword: room acoustics
dc.subject.keyword: sound source separation
dc.subject.keyword: sound event localization and detection
dc.subject.keyword: acoustic parameter estimation
dc.subject.other: Communication
dc.subject.other: Information systems
dc.title: Domain-aware deep learning for room acoustics: Parameter estimation, localization, and source separation
dc.type: G5 Article-based doctoral dissertation
dc.type.dcmitype: text
dc.type.ontasot: Doctoral dissertation (article-based)
local.aalto.acrisexportstatus: checked 2026-02-19_1344
local.aalto.archive: yes
local.aalto.formfolder: 2026_02_16_klo_10_53
local.aalto.infra: Aalto Acoustics Lab
local.aalto.infra: Science-IT

Files

Original bundle

Name: isbn9789526429878.pdf
Size: 7.33 MB
Format: Adobe Portable Document Format