A Parametric Spatial Audio Compression Codec for Higher-Order Ambisonics

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Politis, Archontis, Prof., Tampere University, Finland
dc.contributor.advisor: Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland
dc.contributor.author: Hold, Christoph
dc.contributor.department: Informaatio- ja tietoliikennetekniikan laitos [fi]
dc.contributor.department: Department of Information and Communications Engineering [en]
dc.contributor.lab: Communication Acoustics [en]
dc.contributor.school: Sähkötekniikan korkeakoulu [fi]
dc.contributor.school: School of Electrical Engineering [en]
dc.contributor.supervisor: Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland
dc.date.accessioned: 2024-10-22T09:00:35Z
dc.date.available: 2024-10-22T09:00:35Z
dc.date.defence: 2024-11-01
dc.date.issued: 2024
dc.description.abstract: Spatial audio has the potential to revolutionize how we consume music and other audio content by enabling an immersive audio experience. The technology and entertainment industries have therefore recently adapted their services and begun delivering spatial audio formats. Higher-order Ambisonics (HOA), which represents the audio scene in the spherical harmonic domain (SHD), offers various benefits as a spatial audio format, notably independence from the recording and reproduction setup. However, a critical challenge remains: high-quality spatial audio content is largely inaccessible because of the number of audio channels and the amount of data it requires. Audio codecs can reduce the technical challenges of distribution and storage. Although the demand for high-channel-count spatial audio continues to rise, traditional multichannel codecs fall short of delivering the performance required for HOA. Akin to parametric audio coding, model-based parametric spatial audio techniques can be adapted for perceptual spatial audio coding. Such techniques can parameterize the input scene in a perceptually meaningful and compact way. This parameterization allows signal-dependent processing, such as directional optimizations and informed upmixing, which overcomes typical challenges of signal-independent processing. This work proposes a spatial audio codec for HOA using parametric Directional Audio Coding (DirAC). First, a modified spherical harmonic transform strategy is developed that enables analysis, modification, and reconstruction of HOA signals. A subsequent study explores a compression strategy that achieves perfect reconstruction of the low-order SHD components and parameterized resynthesis of the higher-order SHD components, establishing the perceptual effectiveness of this duality. Furthermore, an SHD post-processing step is derived that leverages the input parameterization to improve the codec output by matching it to target signal properties.
Finally, this work introduces an HOA audio codec based on these theoretical foundations. The experimental results demonstrate significant improvements over traditional multichannel audio codecs, highlighting the potential of the proposed codec to deliver high-quality spatial audio, and advocate for including input parameterization as side information in order to avoid coding excessive channel counts. The implemented codec achieves excellent perceptual quality ratings while reducing the transport data to only a few percent of the input audio data. In conclusion, this research advances the state of the art in spatial audio coding and supports further development of spatial audio codecs for delivering HOA, making the HOA format and its benefits more accessible and thus enabling wider adoption in various media applications. [en]
dc.format.extent: 88 + app. 42
dc.format.mimetype: application/pdf [en]
dc.identifier.isbn: 978-952-64-2079-0 (electronic)
dc.identifier.isbn: 978-952-64-2078-3 (printed)
dc.identifier.issn: 1799-4942 (electronic)
dc.identifier.issn: 1799-4934 (printed)
dc.identifier.issn: 1799-4934 (ISSN-L)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/131296
dc.identifier.urn: URN:ISBN:978-952-64-2079-0
dc.language.iso: en [en]
dc.opn: Skoglund, Jan, Dr., Google LLC, USA
dc.publisher: Aalto University [en]
dc.publisher: Aalto-yliopisto [fi]
dc.relation.haspart: [Publication 1]: Christoph Hold, Sebastian J. Schlecht, Archontis Politis, Ville Pulkki. Spatial Filter Bank in the Spherical Harmonic Domain: Reconstruction and Application. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2021. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202201261488. DOI: 10.1109/WASPAA52581.2021.9632709
dc.relation.haspart: [Publication 2]: Christoph Hold, Ville Pulkki, Archontis Politis, Leo McCormack. Compression of Higher-Order Ambisonic Signals Using Directional Audio Coding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 651-665, 2024. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202401041155. DOI: 10.1109/TASLP.2023.3328284
dc.relation.haspart: [Publication 3]: Christoph Hold, Leo McCormack, Archontis Politis, Ville Pulkki. Optimizing Higher-Order Directional Audio Coding with Adaptive Mixing and Energy Matching for Ambisonic Compression and Upmixing. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2023. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202310186520. DOI: 10.1109/WASPAA58266.2023.10248179
dc.relation.haspart: [Publication 4]: Christoph Hold, Leo McCormack, Archontis Politis, Ville Pulkki. Perceptually-Motivated Spatial Audio Codec for Higher-Order Ambisonics Compression. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202410096659. DOI: 10.1109/ICASSP48485.2024.10447577
dc.relation.ispartofseries: Aalto University publication series DOCTORAL THESES [en]
dc.relation.ispartofseries: 220/2024
dc.rev: Peters, Nils, Prof., Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
dc.rev: Schultz, Frank, Dr., University of Rostock, Germany
dc.subject.keyword: spatial audio [en]
dc.subject.keyword: audio coding [en]
dc.subject.other: Communication [en]
dc.subject.other: Computer science [en]
dc.title: A Parametric Spatial Audio Compression Codec for Higher-Order Ambisonics [en]
dc.type: G5 Artikkeliväitöskirja [fi]
dc.type.dcmitype: text [en]
dc.type.ontasot: Doctoral dissertation (article-based) [en]
dc.type.ontasot: Väitöskirja (artikkeli) [fi]
local.aalto.acrisexportstatus: checked 2024-11-01_0911
local.aalto.archive: yes
local.aalto.formfolder: 2024_10_22_klo_10_25
local.aalto.infra: Aalto Acoustics Lab

Files

Original bundle

Name: isbn9789526420790.pdf
Size: 2.82 MB
Format: Adobe Portable Document Format