Browsing by Author "Välimäki, Vesa, Prof., Aalto University, Department of Information and Communications Engineering, Finland"
Item Audio Decomposition for Time Stretching (Aalto University, 2024)
Fierro, Leonardo; Välimäki, Vesa, Prof., Aalto University, Department of Information and Communications Engineering, Finland; Informaatio- ja tietoliikennetekniikan laitos; Department of Information and Communications Engineering; Aalto Acoustics Lab, Audio Signal Processing group; Sähkötekniikan korkeakoulu; School of Electrical Engineering
Time-scale modification is a common audio signal processing task that involves changing the duration of a sound without altering its frequency content. This thesis explores transient and noise sounds in the context of audio processing and investigates the use of sound decomposition to improve the quality of time scaling for normal and extreme stretching factors. Traditional time-stretching methods often introduce artifacts, such as phasiness and transient smearing, especially when the stretching factor is large. To address this issue, this thesis introduces an improved method for decomposing sounds into their constituent sine, transient, and noise components, so that a different processing technique can be applied to each class separately. This allows for better preservation of transient features, even at extreme stretching factors, and improves the perceived quality of time-stretched audio signals compared to traditional methods. This thesis also presents an alternative audio-visual evaluation method for audio decomposition using an interactive audio player application, in which a graphical user interface gives access to the individual sinusoidal, transient, and noise classes.
This application aims to compensate for the shortcomings of misused objective metrics and encourages experimentation with the sound decomposition process: users can observe how varying each spectral component affects the original sound and compare different methods against each other, evaluating the separation quality both audibly and visually. This thesis also discusses the motivation behind the use of the sines-transient-noise decomposition for time stretching by analyzing the performance drop in a well-known time-scale modification method caused by incorrect transient and noise handling. This work shows that adopting the proposed three-way decomposition within the framework of that method improves its time-stretching quality. The noise component is typically overlooked by conventional time-scale modification methods. This thesis introduces a novel hybrid design that uses a deep learning model to generate a high-quality stretched noise component even for extreme stretching factors, where the sound is slowed down by more than a factor of four, as happens in slow-motion sports videos or the synthesis of ambient music. Finally, a simple and effective solution named noise morphing is described, producing state-of-the-art results across a wide range of audio inputs and stretching factors.

Item Neural Modelling of Audio Effects (Aalto University, 2023)
Wright, Alec; Välimäki, Vesa, Prof., Aalto University, Department of Information and Communications Engineering, Finland; Informaatio- ja tietoliikennetekniikan laitos; Department of Information and Communications Engineering; Audio Signal Processing; Sähkötekniikan korkeakoulu; School of Electrical Engineering
Neural networks and other machine learning based approaches to audio effects processing have become increasingly popular in recent years.
This thesis focuses on the design and training of neural network architectures for the emulation of specific analog audio devices from data. The digital emulation of analog audio devices is commonly known as virtual analog, and popular effects processing devices for virtual analog modelling include guitar amplifiers, distortion pedals, time-varying effects, and compressors. Whilst analytical methods based on circuit analysis are capable of producing realistic, efficient and accurate models of devices, these approaches are limited by the fact that creating a model of a specific device is time-consuming and requires expert knowledge. In contrast, neural network based methods allow for greater automation in the modelling process, and can be applied relatively easily to a range of devices as long as sufficient data is available. This thesis proposes a number of neural network based methods for audio effects modelling, and shows that they achieve excellent perceptual emulation quality. The proposed models include convolutional, recurrent and differentiable digital signal processing based architectures. There is a focus on models with low computational cost and low latency, such that they are suitable for real-time processing as part of a music production workflow. Methods for modelling Low-Frequency Oscillator (LFO) modulated time-varying effects, compressors, guitar amplifiers and distortion pedals are proposed. In addition to the neural network architectures themselves, this thesis also provides practical details and methods for training the models. This includes the proposal and validation of a novel perceptually motivated pre-emphasis filter, used to model non-linear audio effects processing. Additionally, a pruning method is applied and shown to achieve a significant reduction in model size and inference cost for guitar amplifier and distortion effects modelling.
Finally, this thesis presents a novel method for the task of modelling non-linear audio effects processing when paired training data is unavailable. This allows for complex non-linear effects processing to be emulated from recordings, whilst requiring no knowledge of the specific devices used to create the recording.