Parametric reproduction of microphone array recordings

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2023-05-11
Date
2023
Language
en
Pages
44 + app. 80
Series
Aalto University publication series DOCTORAL THESES, 60/2023
Abstract
This thesis comprises five publications describing technologies for recording, analysing, manipulating, and reproducing spatial sound scenes. Together, they confront many of the challenges associated with developing systems capable of delivering high-quality audio within virtual reality and augmented hearing contexts. The technologies detailed herein operate upon microphone array signals that have been transformed into the time-frequency domain. Through the adoption of an assumed sound-field model, an input sound scene may be parameterised and decomposed, which permits its optional manipulation and subsequent reproduction over an arbitrary playback setup. This type of processing often leads to a degree of playback flexibility and perceived spatial accuracy that would otherwise be unattainable when using signal-independent and non-parametric alternatives. The first contribution of this thesis concerns the parameterisation and rendering of microphone array room impulse responses, such that the spatial characteristics of a measured space may be imparted onto a monophonic input signal and reproduced over a target loudspeaker setup. The second contribution explores a parametric method for converting microphone array signals into the popular Ambisonics format, placing specific emphasis on microphone arrays that are mounted onto irregular or non-spherical geometries, such as head-worn devices, which may find application within future augmented reality contexts. The third contribution also concerns a head-worn microphone array, but one that instead utilises microphones sensitive to ultrasonic frequencies. The intention is for ultrasonic sound sources to be captured by the array and then pitch-shifted down to the audible range, while being spatialised in the direction from which the sound arrived.
A number of spatial audio effects and sound-field modification tools are then explored in the fourth contribution; these take Ambisonic signals as input and involve the use of a parametric rendering framework. The final contribution concerns the use of a distributed arrangement of multiple Ambisonic receivers, which may be used to capture the sound scene from multiple perspectives. Subsequent analysis and decomposition of the sound scene into its individual components enables reproduction at different positions, thus allowing a listener to navigate through the recorded sound scene.
Description
Supervising professor
Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland
Thesis advisor
Politis, Archontis, Prof., Tampere University, Finland
Pulkki, Ville, Prof., Aalto University, Finland
Keywords
spatial audio, array signal processing
Parts
  • [Publication 1]: Leo McCormack, Ville Pulkki, Archontis Politis, Oliver Scheuregger and Marton Marschall. Higher-order spatial impulse response rendering: Investigating the perceived effects of spherical order, dedicated diffuse rendering, and frequency resolution. Journal of the Audio Engineering Society (JAES), vol. 68, no. 5, pp. 338–354, May 2020.
    DOI: 10.17743/jaes.2020.0026
  • [Publication 2]: Leo McCormack, Archontis Politis, Raimundo Gonzalez, Tapio Lokki and Ville Pulkki. Parametric Ambisonic Encoding of Arbitrary Microphone Arrays. IEEE Transactions on Audio, Speech and Language Processing, vol. 30, June 2022.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202208104642
    DOI: 10.1109/TASLP.2022.3182857
  • [Publication 3]: Ville Pulkki, Leo McCormack and Raimundo Gonzalez. Superhuman spatial hearing technology for ultrasonic frequencies. Scientific Reports, 11, 11608, June 2021.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202106167396
    DOI: 10.1038/s41598-021-90829-9
  • [Publication 4]: Leo McCormack, Archontis Politis and Ville Pulkki. Parametric Spatial Audio Effects Based on the Multi-Directional Decomposition of Ambisonic Sound Scenes. In Proceedings of the 24th International Conference on Digital Audio Effects (DAFx20in21), September 2021.
  • [Publication 5]: Leo McCormack, Archontis Politis, Thomas McKenzie, Christoph Hold and Ville Pulkki. Object-Based Six-Degrees-of-Freedom Rendering of Sound Scenes Captured with Multiple Ambisonic Receivers. Journal of the Audio Engineering Society (JAES), vol. 70, no. 5, pp. 355–372, May 2022.
    Full text in Acris/Aaltodoc: http://urn.fi/URN:NBN:fi:aalto-202206224151
    DOI: 10.17743/JAES.2022.0010