Solving Audio Inverse Problems with a Diffusion Model
Loading...
Access rights
openAccess
acceptedVersion
URL
Journal Title
Journal ISSN
Volume Title
A4 Artikkeli konferenssijulkaisussa
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Date
Major/Subject
Mcode
Degree programme
Language
en
Pages
5
Series
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Abstract
This paper presents CQT-Diff, a data-driven generative audio model that can, once trained, be used for solving various different audio inverse problems in a problem-agnostic setting. CQT-Diff is a neural diffusion model with an architecture that is carefully constructed to exploit pitch-equivariant symmetries in music. This is achieved by preconditioning the model with an invertible Constant-Q Transform (CQT), whose logarithmically-spaced frequency axis represents pitch equivariance as translation equivariance. The proposed method is evaluated with solo piano music, using objective and subjective metrics in three different and varied tasks: audio bandwidth extension, inpainting, and declipping. The results show that CQT-Diff outperforms the compared baselines and ablations in audio bandwidth extension and, without retraining, delivers competitive performance against modern baselines in audio inpainting and declipping. This work represents the first diffusion-based general framework for solving inverse problems in audio processing.Description
Other note
Citation
Moliner Juanpere, E, Lehtinen, J & Välimäki, V 2023, Solving Audio Inverse Problems with a Diffusion Model. in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)., 10095637, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, pp. 1-5, IEEE International Conference on Acoustics, Speech, and Signal Processing, Rhodes Island, Greece, 04/06/2023. https://doi.org/10.1109/ICASSP49357.2023.10095637