A Parametric Spatial Audio Compression Codec for Higher-Order Ambisonics
dc.contributor | Aalto-yliopisto | fi |
dc.contributor | Aalto University | en |
dc.contributor.advisor | Politis, Archontis, Prof., Tampere University, Finland | |
dc.contributor.advisor | Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland | |
dc.contributor.author | Hold, Christoph | |
dc.contributor.department | Informaatio- ja tietoliikennetekniikan laitos | fi |
dc.contributor.department | Department of Information and Communications Engineering | en |
dc.contributor.lab | Communication Acoustics | en |
dc.contributor.school | Sähkötekniikan korkeakoulu | fi |
dc.contributor.school | School of Electrical Engineering | en |
dc.contributor.supervisor | Pulkki, Ville, Prof., Aalto University, Department of Information and Communications Engineering, Finland | |
dc.date.accessioned | 2024-10-22T09:00:35Z | |
dc.date.available | 2024-10-22T09:00:35Z | |
dc.date.defence | 2024-11-01 | |
dc.date.issued | 2024 | |
dc.description.abstract | Spatial audio has the potential to revolutionize how we consume music and other audio content by enabling an immersive audio experience. Accordingly, the technology and entertainment industries have recently adapted their services and begun delivering spatial audio formats. Higher-order Ambisonics (HOA), which represents the audio scene in the spherical harmonic domain (SHD), offers various benefits as a spatial audio format, notably the independence of the recording and reproduction setups. However, a critical challenge remains: high-quality spatial audio content is largely inaccessible because of the number of audio channels and the amount of data it requires. Audio codecs can successfully reduce the technical burden of distribution and storage. Yet, while the demand for high channel-count spatial audio continues to rise, traditional multichannel codecs fall short of the performance required for HOA. Akin to parametric audio coding, model-based parametric spatial audio techniques can be adapted for perceptual spatial audio coding. Such techniques may parameterize the input scene in a perceptually meaningful and compact way. This parameterization allows signal-dependent processing, such as directional optimizations and informed upmixing, overcoming typical challenges of signal-independent processing. This work proposes a spatial audio codec for HOA using parametric Directional Audio Coding (DirAC). First, a modified spherical harmonic transform strategy is developed that enables analysis, modification, and reconstruction of HOA signals. The subsequent study explores a compression strategy that achieves perfect reconstruction of the low-order SHD components and parameterized resynthesis of the higher-order SHD components, establishing the perceptual effectiveness of this duality. 
Furthermore, an SHD post-processing stage is derived that leverages the input parameterization to improve the codec output by matching target signal properties. Finally, this work introduces an HOA audio codec based on these theoretical foundations. The experimental results demonstrate significant improvements over traditional multichannel audio codecs, highlighting the potential of the proposed codec to deliver high-quality spatial audio and advocating for the inclusion of input parameterization as side information to avoid coding excessive channel counts. The implemented codec achieves excellent perceptual quality ratings while reducing the transport data to only a few percent of the input audio data. In conclusion, this research advances the state of the art in spatial audio coding and paves the way for further development of spatial audio codecs for delivering HOA, making the HOA format and its benefits more accessible and thus enabling wider adoption in various media applications. | en |
dc.format.extent | 88 + app. 42 | |
dc.format.mimetype | application/pdf | en |
dc.identifier.isbn | 978-952-64-2079-0 (electronic) | |
dc.identifier.isbn | 978-952-64-2078-3 (printed) | |
dc.identifier.issn | 1799-4942 (electronic) | |
dc.identifier.issn | 1799-4934 (printed) | |
dc.identifier.issn | 1799-4934 (ISSN-L) | |
dc.identifier.uri | https://aaltodoc.aalto.fi/handle/123456789/131296 | |
dc.identifier.urn | URN:ISBN:978-952-64-2079-0 | |
dc.language.iso | en | en |
dc.opn | Skoglund, Jan, Dr., Google LLC, USA | |
dc.publisher | Aalto University | en |
dc.publisher | Aalto-yliopisto | fi |
dc.relation.haspart | [Publication 1]: Christoph Hold, Sebastian J. Schlecht, Archontis Politis, Ville Pulkki. Spatial Filter Bank in the Spherical Harmonic Domain: Reconstruction and Application. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2021. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202201261488. DOI: 10.1109/WASPAA52581.2021.9632709 | |
dc.relation.haspart | [Publication 2]: Christoph Hold, Ville Pulkki, Archontis Politis, Leo McCormack. Compression of Higher-Order Ambisonic Signals Using Directional Audio Coding. IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 651-665, 2024. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202401041155. DOI: 10.1109/TASLP.2023.3328284 | |
dc.relation.haspart | [Publication 3]: Christoph Hold, Leo McCormack, Archontis Politis, Ville Pulkki. Optimizing Higher-Order Directional Audio Coding with Adaptive Mixing and Energy Matching for Ambisonic Compression and Upmixing. In IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), Oct 2023. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202310186520. DOI: 10.1109/WASPAA58266.2023.10248179 | |
dc.relation.haspart | [Publication 4]: Christoph Hold, Leo McCormack, Archontis Politis, Ville Pulkki. Perceptually-Motivated Spatial Audio Codec for Higher-Order Ambisonics Compression. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2024. Full text in Acris/Aaltodoc: https://urn.fi/URN:NBN:fi:aalto-202410096659. DOI: 10.1109/ICASSP48485.2024.10447577 | |
dc.relation.ispartofseries | Aalto University publication series DOCTORAL THESES | en |
dc.relation.ispartofseries | 220/2024 | |
dc.rev | Peters, Nils, Prof., Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany | |
dc.rev | Schultz, Frank, Dr., University of Rostock, Germany | |
dc.subject.keyword | spatial audio | en |
dc.subject.keyword | audio coding | en |
dc.subject.other | Communication | en |
dc.subject.other | Computer science | en |
dc.title | A Parametric Spatial Audio Compression Codec for Higher-Order Ambisonics | en |
dc.type | G5 Artikkeliväitöskirja | fi |
dc.type.dcmitype | text | en |
dc.type.ontasot | Doctoral dissertation (article-based) | en |
dc.type.ontasot | Väitöskirja (artikkeli) | fi |
local.aalto.acrisexportstatus | checked 2024-11-01_0911 | |
local.aalto.archive | yes | |
local.aalto.formfolder | 2024_10_22_klo_10_25 | |
local.aalto.infra | Aalto Acoustics Lab | |
Files
Original bundle
- Name: isbn9789526420790.pdf
- Size: 2.82 MB
- Format: Adobe Portable Document Format
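As a rough illustration of two ideas in the abstract (the channel count that makes HOA expensive to transmit, and the kind of compact scene parameterization DirAC provides), the following is a minimal Python sketch. It assumes first-order input in ACN channel order (W, Y, Z, X) with N3D normalization, approximates expectations by averaging over a few STFT frames of a single frequency bin, and uses the hypothetical helper names `hoa_channel_count` and `dirac_analysis`. It shows the classic first-order DirAC analysis (direction plus diffuseness per time-frequency bin), not the higher-order DirAC method developed in the thesis and its publications.

```python
import math
from typing import Sequence, Tuple


def hoa_channel_count(order: int) -> int:
    """Channels needed for a full-sphere HOA signal of the given order: (N + 1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def dirac_analysis(
    foa: Sequence[Sequence[complex]],
) -> Tuple[Tuple[float, ...], float]:
    """First-order DirAC-style analysis for ONE time-frequency bin (illustrative sketch).

    Assumptions (not taken from the thesis):
      * `foa` holds 4 channels in ACN order (W, Y, Z, X), N3D-normalized,
        each a list of STFT coefficients of the same bin over a few frames;
      * expectations are approximated by averaging over those frames.
    Returns (unit vector pointing toward the source, diffuseness in [0, 1]).
    """
    w, y, z, x = foa
    n = len(w)
    if n == 0 or any(len(ch) != n for ch in foa):
        raise ValueError("all four channels need the same, non-zero length")
    dipole_gain = math.sqrt(3.0)  # N3D gain of the first-order components

    def mean_cross(a: Sequence[complex], b: Sequence[complex]) -> float:
        # Frame-averaged Re{conj(a) * b}
        return sum((ak.conjugate() * bk).real for ak, bk in zip(a, b)) / n

    # Pseudo-intensity vector (x, y, z); for SH-domain input it points toward the source.
    intensity = tuple(mean_cross(w, ch) / dipole_gain for ch in (x, y, z))
    # Energy proxy: omni power plus N3D-compensated dipole power, halved.
    energy = sum(
        abs(wk) ** 2 + (abs(xk) ** 2 + abs(yk) ** 2 + abs(zk) ** 2) / 3.0
        for wk, xk, yk, zk in zip(w, x, y, z)
    ) / (2 * n)
    if energy <= 0.0:
        return (0.0, 0.0, 0.0), 1.0
    norm = math.sqrt(sum(c * c for c in intensity))
    # Diffuseness: 0 for a single plane wave, 1 for an isotropic diffuse field.
    diffuseness = min(1.0, max(0.0, 1.0 - norm / energy))
    direction = tuple(c / norm for c in intensity) if norm > 0.0 else (0.0, 0.0, 0.0)
    return direction, diffuseness
```

For instance, `hoa_channel_count(7)` returns 64, against 4 channels at first order. A single plane wave arriving along the +y axis yields a direction of about (0, 1, 0) and a diffuseness near 0, whereas four mutually orthogonal, equal-energy channels (a crude stand-in for an isotropic diffuse field) yield a diffuseness of 1.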