Audio Decomposition for Time Stretching

School of Electrical Engineering | Doctoral thesis (article-based) | Defence date: 2024-05-24
Date
2024
Language
en
Pages
59 + app. 51
Series
Aalto University publication series DOCTORAL THESES, 114/2024
Abstract
Time-scale modification is a common audio signal processing task that changes the duration of a sound without altering its frequency content. This thesis explores transients and noise in the context of audio processing and investigates the use of sound decomposition to improve the quality of time scaling for both normal and extreme stretching factors. Traditional time-stretching methods often introduce artifacts, such as phasiness and transient smearing, especially when the stretching factor is large. To address this issue, this thesis introduces an improved method for decomposing sounds into their constituent sine, transient, and noise components, so that a different processing technique can be applied to each class separately. This allows for better preservation of transient features, even at extreme stretching factors, and improves the perceived quality of time-stretched audio signals compared to traditional methods. This thesis also presents an alternative audio-visual evaluation method for audio decomposition based on an interactive audio player application, in which the individual sinusoidal, transient, and noise classes are accessible through a graphical user interface. The application aims to compensate for the shortcomings of misused objective metrics. It promotes experimentation with the decomposition process by letting users observe how varying each spectral component affects the original sound and compare different methods against each other, evaluating the separation quality both audibly and visually. This thesis also discusses the motivation for using the sines-transients-noise decomposition in time stretching by analyzing the performance drop that a well-known time-scale modification method suffers due to incorrect handling of transients and noise. The work shows that adopting the proposed three-way decomposition within the framework of that method improves its time-stretching quality.
The noise component is typically overlooked by conventional time-scale modification methods. This thesis introduces a novel hybrid design that uses a deep learning model to generate a high-quality stretched noise component even for extreme stretching factors, that is, when the sound is slowed down by more than a factor of four, as happens in slow-motion sports videos or in the synthesis of ambient music. Finally, a simple and effective solution named noise morphing is described, which produces state-of-the-art results across a wide range of audio inputs and stretching factors.
Supervising professor
Välimäki, Vesa, Prof., Aalto University, Department of Information and Communications Engineering, Finland
Thesis advisor
Välimäki, Vesa, Prof., Aalto University, Department of Information and Communications Engineering, Finland
Keywords
audio effects, audio signal processing, transient analysis, noise, time-scale modification, time-frequency analysis
Parts
  • [Publication 1]: Leonardo Fierro, Vesa Välimäki. Enhanced Fuzzy Decomposition of Sound Into Sines, Transients, and Noise. Journal of the Audio Engineering Society, Vol. 71, 7/8, pp. 468-480, July 2023.
    DOI: 10.17743/jaes.2022.0077
  • [Publication 2]: Leonardo Fierro, Vesa Välimäki. SiTraNo: A MATLAB App for Sines-Transients-Noise Decomposition of Audio Signals. In Proceedings of the International Conference on Digital Audio Effects (DAFx 2021), Vienna, Austria, September 2021.
  • [Publication 3]: Leonardo Fierro, Vesa Välimäki. Towards Objective Evaluation of Audio Time Stretching Methods. In Proceedings of the Sound and Music Computing Conference (SMC 2020), Torino, Italy, May 2020.
  • [Publication 4]: Leonardo Fierro, Alec Wright, Vesa Välimäki, Matti Hämäläinen. Extreme Audio Time Stretching Using Neural Synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes, Greece, June 2023.
    DOI: 10.1109/ICASSP49357.2023.10094738
  • [Publication 5]: Eloi Moliner, Leonardo Fierro, Alec Wright, Vesa Välimäki, Matti S. Hämäläinen. Noise Morphing for Audio Time Stretching. IEEE Signal Processing Letters, Early Access, April 2024.
    DOI: 10.1109/LSP.2024.3386118