Parametric spatial audio processing utilising compact microphone arrays

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2017-11-10
84 + app. 78
Aalto University publication series DOCTORAL DISSERTATIONS, 197/2017
This dissertation focuses on the development of novel parametric spatial audio techniques using compact microphone arrays. Compact arrays are of special interest since they can be adapted to fit in portable devices, opening the possibility of exploiting the potential of immersive spatial audio algorithms in our daily lives. The techniques developed in this thesis consider the use of signal processing algorithms adapted for human listeners, thus exploiting the capabilities and limitations of human spatial hearing. The findings of this research are in the following three areas of spatial audio processing: directional filtering, spatial audio reproduction, and direction of arrival estimation.
In directional filtering, two novel algorithms have been developed based on the cross-pattern coherence (CroPaC). The method exploits the directional responses of two different types of beamformers by using their cross-spectrum to estimate a soft masker. The soft masker provides a probability-like parameter that indicates whether there is sound present in specific directions. It is then used as a post-filter to provide further suppression of directionally distributed noise at the output of a beamformer. The performance of these algorithms represents a significant improvement over previous state-of-the-art methods.
In parametric spatial audio reproduction, an algorithm is developed for multi-channel loudspeaker and headphone rendering. Current limitations in spatial audio reproduction are related to high inter-channel coherence, which is common in signal-independent systems, or to time-frequency artefacts in parametric systems. The developed algorithm addresses these limitations by utilising two sets of beamformers. The first set, the analysis beamformers, is used to estimate a set of perceptually relevant sound-field parameters, such as the separate channel energies, inter-channel time differences and inter-channel coherences of the target-output-setup signals. The directionality of the analysis beamformers is defined so that it follows that of typical loudspeaker panning functions and, for headphone reproduction, that of the head-related transfer functions (HRTFs). The directionality of the second set of beamformers, which provide high audio quality, is then enhanced with the parametric information derived from the analysis beamformers. Listening tests confirm the perceptual benefit of this type of processing.
In direction of arrival (DOA) estimation, histogram analysis of beamforming-based and active-intensity-based DOA estimators is proposed. Numerical simulations and experiments with prototype and commercial microphone arrays show that the accuracy of DOA estimation is improved.
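The soft-masker idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the normalisation by the mean beamformer energy, the half-wave rectification and the absence of temporal averaging are all assumptions made here for illustration. The inputs are assumed to be complex time-frequency (STFT) outputs of two different beamformers steered towards the same direction.

```python
import numpy as np

def soft_mask(a, b, floor=0.0, eps=1e-12):
    """Illustrative soft masker from two beamformer outputs.

    a, b : complex arrays of identical shape (frames x frequency bins),
           assumed to be STFT-domain outputs of two different beamformer
           types steered towards the same look direction.
    floor : lower bound of the mask (hypothetical parameter that limits
            the maximum attenuation).
    """
    # Real part of the cross-spectrum, evaluated per time-frequency tile.
    cross = np.real(np.conj(a) * b)
    # Mean energy of the two signals, used here as the normaliser.
    energy = 0.5 * (np.abs(a) ** 2 + np.abs(b) ** 2)
    # Negative values (out-of-phase signals) are set to zero.
    mask = np.maximum(cross, 0.0) / (energy + eps)
    # Keep the mask inside [floor, 1] so it behaves like a soft gain.
    return np.clip(mask, floor, 1.0)

def apply_post_filter(beam, mask):
    """Scale each time-frequency tile of a beamformer output by the mask."""
    return mask * beam
```

With this normalisation the mask is close to 1 where the two beamformer outputs agree in phase and level, and 0 where they are in anti-phase. A practical system would typically smooth the cross-spectrum and energy estimates over time before forming the mask.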
Supervising professor
Pulkki, Ville, Prof., Aalto University, Department of Signal Processing and Acoustics, Finland
Keywords
spatial audio, directional filtering, perceptual sound reproduction, microphone arrays
  • [Publication 1]: Symeon Delikaris-Manias and Ville Pulkki. Cross pattern coherence algorithm for spatial filtering applications utilizing microphone arrays. IEEE Transactions on Audio, Speech, and Language Processing, Volume 21, issue 11, pages 2356–2367, November 2013.
    DOI: 10.1109/TASL.2013.2277928
  • [Publication 2]: Symeon Delikaris-Manias, Juha Vilkamo, and Ville Pulkki. Signal-dependent spatial filtering based on weighted-orthogonal beamformers in the spherical harmonic domain. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 24, issue 9, pages 1511–1523, April 2016.
    DOI: 10.1109/TASLP.2016.2560523
  • [Publication 3]: Juha Vilkamo and Symeon Delikaris-Manias. Perceptual reproduction of spatial sound using loudspeaker-signal-domain parametrization. IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume 23, issue 10, pages 1660–1669, June 2015.
    DOI: 10.1109/TASLP.2015.2443977
  • [Publication 4]: Symeon Delikaris-Manias, Juha Vilkamo, and Ville Pulkki. Parametric binaural rendering utilising compact microphone arrays. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, pages 629–633, 19–24 April 2015.
    DOI: 10.1109/ICASSP.2015.7178045
  • [Publication 5]: Symeon Delikaris-Manias, Despoina Pavlidi, Ville Pulkki, and Athanasios Mouchtaris. 3D localization of multiple audio sources utilizing 2D DOA histograms. In 24th European Signal Processing Conference (EUSIPCO), Budapest, Hungary, pages 1473–1477, 29 August–2 September 2016.
    DOI: 10.1109/EUSIPCO.2016.7760493
  • [Publication 6]: Symeon Delikaris-Manias, Despoina Pavlidi, Athanasios Mouchtaris, and Ville Pulkki. DOA estimation with histogram analysis of spatially constrained intensity vectors. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, USA, pages 526–530, 5–9 March 2017.
    DOI: 10.1109/ICASSP.2017.7952211
  • [Publication 7]: Leo McCormack, Symeon Delikaris-Manias, and Ville Pulkki. Parametric acoustic camera for real-time sound capture, analysis and tracking. In International Conference on Digital Audio Effects (DAFx-17), Edinburgh, UK, 5–9 September 2017.