Spatial post-filtering for speech enhancement and source separation
School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2026-01-16
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
63 + app. 52
Series
Aalto University publication series Doctoral Theses, 18/2026
Abstract
Microphone arrays with limited spatial resolution, such as compact and low-order configurations, present challenges for spatial audio processing tasks including sound source separation and interference suppression. Traditional beamforming methods alone may lack the robustness and adaptability required for practical audio scenarios, particularly under constrained sensor arrangements. Further, despite advancements in spatial audio, most common spatial recording devices only utilize low-order, often first-order, directional signals. This thesis addresses these challenges by developing spatial post-filtering techniques inspired by the Cross-Pattern Coherence (CroPaC) algorithm, aiming to enhance the performance of omnidirectional and low-order directional microphone arrays. First, we formulate a CroPaC-based post-filter estimated from low-order beamformer outputs. Even when the array departs from ideal geometric assumptions, simulations in multi-speaker conditions demonstrate consistent suppression of interference and background noise. Second, we investigate a space-domain variant (SD-CroPaC) for linear arrays with a focus on speech enhancement. Using single- and dual-line uniform linear arrays (ULAs), common beamforming ambiguities (e.g., front–back) can be suppressed. A combination that uses two post-filters yields stronger separation than either filter alone while preserving target speech. Third, building on these insights, we propose a non-linear combination strategy that optimizes the merging of multiple CroPaC post-filters estimated from low-order directional signals. This approach improves spatial selectivity and hence interferer suppression, and outperforms a higher-order CroPaC baseline under the same conditions. Finally, we generalize coherence-driven post-filtering to distributed omnidirectional microphones by introducing aggregated pairwise similarity measures. This results in a soft mask that remains effective when sensors are spatially separated, with positional errors, and in a reverberant acoustic environment. Collectively, these contributions significantly advance spatial audio processing techniques, providing robust and practical solutions to improve spatial selectivity and interference suppression capabilities of low-order and distributed microphone array systems, with applications to portable recorders, headsets, smartphones, and distributed omnidirectional microphone systems.
Supervising professor
Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland
Thesis advisor
Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland
Parts
- [Publication 1]: Stefan Wirler and Ville Pulkki. Spatial post-filter estimation based on low-order beamformers. In Proceedings of the 24th International Congress on Acoustics (ICA 2022), Gyeongju, South Korea, October 2022.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202509237315
- [Publication 2]: Stefan Wirler, Vasileios Bountourakis and Ville Pulkki. Space-domain cross-pattern coherence post-filter for speech enhancement with linear microphone arrays. In Proceedings of the 154th Audio Engineering Society Convention, Espoo, Finland, Paper 10652, May 2023.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202305313513
- [Publication 3]: Stefan Wirler, Nils Meyer-Kahlen and Ville Pulkki. Enhancing Spatial Post-Filters through Non-Linear Combinations. In Proceedings of the 157th Audio Engineering Society Convention, New York, United States of America, Paper 10182, September 2024.
- [Publication 4]: Stefan Wirler and Ville Pulkki. Spatially Selective Sound Capture Based on Aggregated Pairwise Similarity Measures. The Journal of the Audio Engineering Society (JAES), 73 (11), 747–759, November 2025.
  Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202512299643
  DOI: 10.17743/jaes.2022.0228