Parametric reproduction of microphone array recordings

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2023-05-11
Pages: 44 + app. 80
Aalto University publication series DOCTORAL THESES, 60/2023
This thesis includes five publications that describe technologies for recording, analysing, manipulating, and reproducing spatial sound scenes. These technologies address many of the challenges associated with developing systems capable of delivering high-quality audio in virtual reality and augmented hearing contexts. The technologies detailed herein operate on microphone array signals that have been transformed into the time-frequency domain. By adopting an assumed sound-field model, an input sound scene may be parameterised and decomposed, which permits the optional manipulation and subsequent reproduction of the sound scene over an arbitrary playback setup. This type of processing often yields a degree of playback flexibility and perceived spatial accuracy that would otherwise be unattainable with signal-independent, non-parametric alternatives. The first contribution of this thesis concerns the parameterisation and rendering of microphone array room impulse responses, such that the spatial characteristics of a measured space may be imparted onto a monophonic input signal and reproduced over a target loudspeaker setup. The second contribution explores a parametric method for converting microphone array signals into the popular Ambisonics format, with specific emphasis on microphone arrays mounted on irregular/non-spherical geometries, such as head-worn devices, which may find application in future augmented reality contexts. The third contribution also concerns a head-worn microphone array, but one that instead uses microphones sensitive to ultrasonic frequencies. The intention is for ultrasonic sound sources to be captured by the array and then pitch-shifted down to the audible range, while being spatialised in the same direction from which the sound arrived.
The fourth contribution explores a number of spatial audio effects and sound-field modification tools, which take Ambisonic signals as input and are built on a parametric rendering framework. The final contribution concerns the use of a distributed arrangement of multiple Ambisonic receivers, which may be used to capture the sound scene from multiple perspectives. Subsequent analysis and decomposition of the sound scene into its individual components enables reproduction at different positions, thus allowing a listener to navigate through the recorded sound scene.
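The parametric processing summarised in the abstract starts by estimating sound-field parameters from time-frequency-domain array signals. As a minimal sketch, assuming the classic DirAC-style model (a single plane wave plus diffuse sound per time-frequency tile) and first-order Ambisonic input in ACN channel order with SN3D normalisation, the direction of arrival can be taken from a pseudo-intensity vector and the diffuseness from the ratio of its temporally averaged norm to the averaged signal energy. The function names `encode_foa` and `dirac_analysis` are hypothetical, and this is not presented as the method of any enclosed publication; the publications also consider higher-order and multi-directional variants.

```python
import numpy as np


def encode_foa(signal, azi, ele):
    """Encode complex time-frequency coefficients of a plane wave into
    first-order Ambisonics (assumed ACN order W, Y, Z, X; SN3D normalisation)."""
    return np.stack([
        signal,
        signal * np.sin(azi) * np.cos(ele),
        signal * np.sin(ele),
        signal * np.cos(azi) * np.cos(ele),
    ])


def dirac_analysis(foa_tf):
    """Estimate direction of arrival and diffuseness per frequency band.

    foa_tf: complex array of shape (4, n_bands, n_frames), ACN/SN3D order.
    Returns azimuth [rad], elevation [rad] and diffuseness in [0, 1],
    each of shape (n_bands,).
    """
    w, y, z, x = foa_tf
    # Pseudo-intensity vector; with this encoding it points towards the source.
    intensity = np.real(np.conj(w) * np.stack([x, y, z]))  # (3, bands, frames)
    # Energy density (up to a constant); equals |s|^2 for a single plane wave.
    energy = 0.5 * np.sum(np.abs(foa_tf) ** 2, axis=0)     # (bands, frames)
    # Temporal averaging over frames.
    i_mean = intensity.mean(axis=-1)  # (3, bands)
    e_mean = energy.mean(axis=-1)     # (bands,)
    azi = np.arctan2(i_mean[1], i_mean[0])
    ele = np.arctan2(i_mean[2], np.hypot(i_mean[0], i_mean[1]))
    diffuseness = 1.0 - np.linalg.norm(i_mean, axis=0) / np.maximum(e_mean, 1e-12)
    return azi, ele, np.clip(diffuseness, 0.0, 1.0)


# Usage: a single plane wave arriving from azimuth 60 deg, elevation 20 deg.
rng = np.random.default_rng(0)
s = rng.standard_normal((1, 500)) + 1j * rng.standard_normal((1, 500))
azi, ele, psi = dirac_analysis(encode_foa(s, np.deg2rad(60), np.deg2rad(20)))
# Expect azimuth ~ 60 deg, elevation ~ 20 deg and diffuseness ~ 0.
print(np.rad2deg(azi), np.rad2deg(ele), psi)
```

For a single plane wave the pseudo-intensity norm equals the energy, so the diffuseness is zero; for an isotropic diffuse field the averaged intensity tends to zero and the diffuseness approaches one. A renderer could then use these parameters to route each tile's direct part to the estimated direction and its diffuse part to decorrelated outputs.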
Supervising professor
Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland
Thesis advisors
Politis, Archontis, Prof., Tampere University, Finland
Pulkki, Ville, Prof., Aalto University, Finland
Keywords: spatial audio, array signal processing
Other note
  • [Publication 1]: Leo McCormack, Ville Pulkki, Archontis Politis, Oliver Scheuregger and Marton Marschall. Higher-order spatial impulse response rendering: Investigating the perceived effects of spherical order, dedicated diffuse rendering, and frequency resolution. Journal of the Audio Engineering Society (JAES), vol. 68, no. 5, pp. 338–354, May 2020.
    DOI: 10.17743/jaes.2020.0026
  • [Publication 2]: Leo McCormack, Archontis Politis, Raimundo Gonzalez, Tapio Lokki and Ville Pulkki. Parametric Ambisonic Encoding of Arbitrary Microphone Arrays. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, June 2022.
    DOI: 10.1109/TASLP.2022.3182857
  • [Publication 3]: Ville Pulkki, Leo McCormack and Raimundo Gonzalez. Superhuman spatial hearing technology for ultrasonic frequencies. Scientific Reports, vol. 11, art. no. 11608, June 2021.
    DOI: 10.1038/s41598-021-90829-9
  • [Publication 4]: Leo McCormack, Archontis Politis and Ville Pulkki. Parametric Spatial Audio Effects Based on the Multi-Directional Decomposition of Ambisonic Sound Scenes. In Proceedings of the 24th International Conference on Digital Audio Effects (DAFx20in21), September 2021.
  • [Publication 5]: Leo McCormack, Archontis Politis, Thomas McKenzie, Christoph Hold and Ville Pulkki. Object-Based Six-Degrees-of-Freedom Rendering of Sound Scenes Captured with Multiple Ambisonic Receivers. Journal of the Audio Engineering Society (JAES), vol. 70, no. 5, pp. 355–372, May 2022.
    DOI: 10.17743/JAES.2022.0010