Browsing by Author "Valimaki, Vesa"
Now showing 1 - 10 of 10
Item Audibility of Group-Delay Equalization (IEEE Advancing Technology for Humanity, 2021)
Liski, Juho; Makivirta, Aki; Valimaki, Vesa; Dept Signal Process and Acoust; Audio Signal Processing; Genelec Oy
This paper discusses the audibility of group-delay variations. Previous research has found limits of audibility as a function of frequency for different test signals, but extracting a group-delay tolerance that would help audio reproduction system designers is difficult. This study considers four critical test signals, three synthetic and one recorded, modified with digital allpass filters. The signals are filtered to produce a positive or negative group-delay peak covering the most sensitive frequency range from 500 Hz to 4 kHz, without changing the delay at other frequencies. ABX listening tests using headphones reveal the audibility thresholds for each signal. The perception is highly dependent on the signal, and the unit impulse and pink impulse are the most critical test signals. Negative group-delay variations are more easily audible than positive ones. The smallest mean thresholds, obtained with the pink impulse, were -0.56 ms for the negative group delay and 0.64 ms for the positive group delay. These thresholds are smaller than those obtained in previous studies. A synthetic hi-hat sound decaying 60 dB in 80 ms hides a positive group-delay variation. The variation is more difficult to hear in a recorded castanet sound than in the most critical synthetic signals. This work demonstrates how the group-delay response of headphones and loudspeakers can be perceptually tested and leads to a better understanding of how audio systems should be equalized to avoid audible group-delay distortion.
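
As a rough illustration of the allpass-based group-delay manipulation described above, the following Python sketch places a localized group-delay bump in the sensitive band using a single second-order allpass section and prints its peak excess delay. It is not the paper's filter design; the sample rate, center frequency, and pole radius are assumed values.

    # Illustrative sketch only (not the paper's allpass design): a single
    # second-order allpass section creates a localized group-delay bump
    # between 500 Hz and 4 kHz. Sample rate, center frequency, and pole
    # radius are assumed values.
    import numpy as np
    from scipy.signal import group_delay

    fs = 48000      # sample rate in Hz (assumed)
    f_c = 2000      # center of the group-delay bump in Hz (assumed)
    r = 0.97        # pole radius; closer to 1 gives a taller, narrower bump

    theta = 2 * np.pi * f_c / fs
    a = np.array([1.0, -2 * r * np.cos(theta), r ** 2])  # denominator
    b = a[::-1]                                          # reversed coefficients -> allpass

    w, gd = group_delay((b, a), w=4096, fs=fs)           # gd is in samples
    gd_ms = 1000 * gd / fs
    print(f"peak excess group delay: {gd_ms.max() - gd_ms.min():.2f} ms "
          f"near {w[np.argmax(gd_ms)]:.0f} Hz")
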
Item Augmented/Mixed Reality Audio for Hearables: Sensing, control, and rendering (IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2022-05-01)
Gupta, Rishabh; He, Jianjun; Ranjan, Rishabh; Gan, Woon Seng; Klein, Florian; Schneiderwind, Christian; Neidhardt, Annika; Brandenburg, Karlheinz; Valimaki, Vesa; Dept Signal Process and Acoust; Audio Signal Processing; Birla Institute of Technology and Science; Nanyang Technological University; Ilmenau University of Technology; Universität für Musik und darstellende Kunst Graz
Augmented or mixed reality (AR/MR) is emerging as one of the key technologies in the future of computing. Audio cues are critical for maintaining a high degree of realism, social connection, and spatial awareness in AR/MR applications such as education and training, gaming, remote work, and virtual social gatherings that transport the user to an alternate world, the so-called metaverse. Motivated by the wide variety of AR/MR listening experiences delivered over hearables, this article systematically reviews the integration of fundamental and advanced signal processing techniques for AR/MR audio to equip researchers and engineers in the signal processing community for the next wave of AR/MR.

Item BEHM-GAN: Bandwidth Extension of Historical Music using Generative Adversarial Networks (IEEE, 2023)
Moliner, Eloi; Valimaki, Vesa; Dept Signal Process and Acoust; Audio Signal Processing
Audio bandwidth extension aims to expand the spectrum of bandlimited audio signals. Although this topic has been broadly studied during recent years, the particular problem of extending the bandwidth of historical music recordings remains an open challenge. This paper proposes a method for the bandwidth extension of historical music using generative adversarial networks (BEHM-GAN) as a practical solution to this problem. The proposed method works with the complex spectrogram representation of audio and, thanks to a dedicated regularization strategy, can effectively extend the bandwidth of out-of-distribution real historical recordings. The BEHM-GAN is designed to be applied as a second step after denoising the recording to suppress any additive disturbances, such as clicks and background noise. We train and evaluate the method using solo piano classical music. The proposed method outperforms the compared baselines in both objective and subjective experiments. The results of a formal blind listening test show that BEHM-GAN significantly increases the perceptual sound quality of early-20th-century gramophone recordings. For several items, there is a substantial improvement in the mean opinion score after enhancing historical recordings with the proposed bandwidth-extension algorithm. This study represents a relevant step toward data-driven music restoration in real-world scenarios.

Item Bounded-Magnitude Discrete Fourier Transform [Tips & Tricks] (IEEE, 2023-05-01)
Schlecht, Sebastian J.; Valimaki, Vesa; Habets, Emanuel A.P.; Department of Art and Media; Department of Information and Communications Engineering; Audio Signal Processing; Friedrich-Alexander University Erlangen-Nürnberg
Analyzing the magnitude response of a finite-length sequence is a ubiquitous task in signal processing. However, the discrete Fourier transform (DFT) provides only discrete sampling points of the response characteristic. This work introduces bounds on the magnitude response, which can be efficiently computed without additional zero padding. The proposed bounds can be used for more informative visualization and indicate whether additional frequency resolution or zero padding is required.
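
The sketch below illustrates the problem these bounds address rather than the bounds themselves: the magnitude sampled at the DFT bins can miss peaks lying between the bins, which are approximated here by heavy zero padding. The sequence length and padding factor are arbitrary choices.

    # Illustrates the issue the bounds address (not the proposed bound itself):
    # the N-point DFT samples the magnitude response only at N frequencies, so
    # peaks between the bins -- approximated here by heavy zero padding -- can
    # exceed the bin values.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(16)                      # arbitrary length-16 sequence

    mag_bins = np.abs(np.fft.rfft(x))                # magnitude at the DFT bins only
    mag_dense = np.abs(np.fft.rfft(x, n=16 * 64))    # dense reference via zero padding

    print(f"largest magnitude at the DFT bins : {mag_bins.max():.3f}")
    print(f"largest magnitude between the bins: {mag_dense.max():.3f}")
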
Item Decorrelation in Feedback Delay Networks (IEEE, 2023)
Schlecht, Sebastian J.; Fagerström, Jon; Valimaki, Vesa; Department of Information and Communications Engineering; Department of Art and Media; Audio Signal Processing
The feedback delay network (FDN) is a popular filter structure for generating artificial spatial reverberation. A common requirement for multichannel late reverberation is that the output signals are well decorrelated, as too high a correlation can lead to poor reproduction of the source image and to uncontrolled coloration. This article presents an analysis of the multichannel correlation induced by FDNs. It is shown that the correlation depends primarily on the feedforward paths, while the long reverberation tail produced by the recursive path does not contribute to the inter-channel correlation. The impact of the feedback matrix type, size, and delays on the inter-channel correlation is demonstrated. The results show that small FDNs with a few feedback channels tend to have a high inter-channel correlation, and that the use of a filter feedback matrix significantly improves the decorrelation, often leading to the lowest inter-channel correlation among the tested cases. The findings of this work support the practical design of multichannel artificial reverberators for immersive audio applications.

Item HRTF Interpolation using a Spherical Neural Process Meta-Learner (IEEE, 2024)
Thuillier, Etienne; Jin, Craig; Valimaki, Vesa; Department of Information and Communications Engineering; Audio Signal Processing; University of Sydney
Several individualization methods have recently been proposed to estimate a subject's Head-Related Transfer Function (HRTF) using convenient input modalities such as anthropometric measurements or pinnae photographs. There is a need to adaptively correct the estimation error committed by such methods using a few data points sampled from the subject's HRTF, acquired through acoustic measurements or perceptual feedback. To facilitate this, we introduce a Convolutional Conditional Neural Process meta-learner specialized in HRTF error interpolation. In particular, the model includes a Spherical Convolutional Neural Network component to accommodate the spherical geometry of HRTF data. It also exploits potential symmetries between the HRTF's left and right channels about the median plane. In this work, we evaluate the proposed model's performance purely in terms of time-aligned spectrum interpolation, under a simplified setup where a generic population-mean HRTF forms the initial estimate prior to correction instead of an individualized one. The trained model achieves up to a 3-dB relative error reduction compared to state-of-the-art interpolation methods despite being trained using only 85 subjects. This improvement translates into nearly a halving of the data point count required to achieve comparable accuracy, in particular from 50 to 28 points to reach an average of -20 dB relative error per interpolated feature. Moreover, we show that the trained model provides well-calibrated uncertainty estimates. Accordingly, such estimates could inform the sequential decision problem of acquiring as few correcting HRTF data points as needed to meet a desired level of HRTF individualization accuracy.

Item Late-Reverberation Synthesis using Interleaved Velvet-Noise Sequences (IEEE Advancing Technology for Humanity, 2021)
Valimaki, Vesa; Prawda, Karolina; Dept Signal Process and Acoust; Audio Signal Processing
This paper proposes a novel algorithm for simulating the late part of room reverberation. A well-known fact is that a room impulse response sounds similar to exponentially decaying filtered noise some time after its beginning. The algorithm proposed here employs several velvet-noise sequences in parallel and combines them so that their non-zero samples never occur at the same time. Each velvet-noise sequence is driven by the same input signal but is filtered with its own feedback filter, which has the same delay-line length as the velvet-noise sequence. The resulting response is sparse and consists of filtered noise that decays approximately exponentially with a given frequency-dependent reverberation time profile. We show via a formal listening test that four interleaved branches are sufficient to produce a smooth, high-quality response. The outputs of the branches connected in different combinations produce decorrelated output signals for multichannel reproduction. The proposed method is compared with a state-of-the-art delay-based reverberation method and its advantages are pointed out. The computational load of the method is 60% smaller than that of a comparable existing method, the feedback delay network. The proposed method is well suited to the synthesis of diffuse late reverberation in audio and music production.
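
For background on the velvet-noise building block, the following sketch generates a plain velvet-noise sequence with one +/-1 impulse per grid period, at a random position and with a random sign. The interleaving of branches and the per-branch feedback filters of the proposed reverberator are not reproduced here, and the sample rate and impulse density are assumed values.

    # Minimal velvet-noise generator (background only; the interleaving and the
    # per-branch feedback filters of the proposed reverberator are not included).
    import numpy as np

    def velvet_noise(length, fs=44100, density=2000, seed=0):
        """Sparse +/-1 sequence with on average `density` impulses per second."""
        rng = np.random.default_rng(seed)
        grid = fs / density                  # average impulse spacing in samples
        seq = np.zeros(length)
        m = 0
        while True:
            # one impulse per grid period, at a random position, with a random sign
            k = int(round(m * grid + rng.random() * (grid - 1)))
            if k >= length:
                break
            seq[k] = 1.0 if rng.random() < 0.5 else -1.0
            m += 1
        return seq

    vn = velvet_noise(length=44100)          # one second at the assumed sample rate
    print(int(np.count_nonzero(vn)), "impulses in 1 s of velvet noise")
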
Item Modal Excitation in Feedback Delay Networks (IEEE, 2024-09-23)
Schlecht, Sebastian J.; Scerbo, Matteo; Sena, Enzo De; Valimaki, Vesa; Department of Information and Communications Engineering; Department of Art and Media; Virtual Acoustics; Audio Signal Processing; University of Erlangen–Nuremberg
Feedback delay networks (FDNs) are used in audio processing and synthesis. The modal shapes of the system describe the modal excitation by the input and output signals. Previously, the Ehrlich-Aberth method was used to find the modes of large FDNs. Here, the method is extended to compute the corresponding eigenvectors, which indicate the modal shapes. In particular, the computational complexity of the proposed analysis method does not depend on the delay-line lengths and is thus suitable for large FDNs, such as artificial reverberators. We show the relation between the compact generalized eigenvectors in the delay state space and the spatially extended modal shapes in the state space. We illustrate this method with an example FDN in which the suggested modal excitation control does not increase the computational cost. The modal shapes can help optimize the input and output gains. This letter shows how selecting the input and output points along the delay lines of an FDN adjusts the spectral shape of the system output.

Item Noise Morphing for Audio Time Stretching (IEEE, 2024)
Moliner Juanpere, Eloi; Fierro, Leonardo; Wright, Alec; Hamalainen, Matti S.; Valimaki, Vesa; Department of Information and Communications Engineering; Audio Signal Processing; Nokia
This letter introduces an innovative method to enhance the quality of audio time stretching by precisely decomposing a sound into sines, transients, and noise and by improving the processing of the latter component. While there are established methods for time-stretching sines and transients with high quality, the manipulation of the noise or residual component has lacked robust solutions in prior research. The proposed method combines sound decomposition with previous techniques for audio spectral resynthesis. The time-stretched noise component is achieved by morphing its time-interpolated spectral magnitude with a white-noise excitation signal. This method stands out for its simplicity, efficiency, and audio quality. The results of a subjective experiment confirm the superiority of this approach over current state-of-the-art methods across all evaluated stretch factors. The proposed technique excels particularly in extreme stretching scenarios, where the improvement is substantial. The proposed method holds promise for a wide range of applications in slow-motion media content, such as music or sports video production.
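
A minimal sketch of the noise-morphing idea, not the authors' implementation: the STFT magnitude of the noise component is interpolated along time to the stretched length and imposed on a white-noise excitation whose STFT frames are flattened to unit magnitude. The frame length, sample rate, and stretch factor below are assumed values.

    # Hedged sketch of the noise-morphing idea (not the authors' implementation):
    # interpolate the noise component's STFT magnitude along time and impose it
    # on a white-noise excitation whose STFT frames are flattened to unit magnitude.
    import numpy as np
    from scipy.signal import stft, istft

    def stretch_noise(noise, factor, fs=44100, nperseg=1024):
        _, _, Z = stft(noise, fs=fs, nperseg=nperseg)
        mag = np.abs(Z)                                   # shape: (freqs, frames)
        n_out = int(round(mag.shape[1] * factor))
        t_in = np.linspace(0.0, 1.0, mag.shape[1])
        t_out = np.linspace(0.0, 1.0, n_out)
        mag_str = np.stack([np.interp(t_out, t_in, row) for row in mag])

        rng = np.random.default_rng(0)
        excitation = rng.standard_normal(int(len(noise) * factor) + 4 * nperseg)
        _, _, E = stft(excitation, fs=fs, nperseg=nperseg)
        n = min(n_out, E.shape[1])
        E = E[:, :n] / (np.abs(E[:, :n]) + 1e-12)         # unit-magnitude excitation frames
        _, y = istft(mag_str[:, :n] * E, fs=fs, nperseg=nperseg)
        return y

    noise = np.random.default_rng(1).standard_normal(44100)   # stand-in "noise" component
    print(len(stretch_noise(noise, factor=2.0)), "samples after a 2x stretch")
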
Item Two-Stage Attenuation Filter for Artificial Reverberation (IEEE, 2024-01-10)
Valimaki, Vesa; Prawda, Karolina; Schlecht, Sebastian J.; Department of Information and Communications Engineering; Department of Art and Media; Audio Signal Processing
Delay networks are a common parametric method to synthesize the late part of room reverberation. A delay network consists of several feedback loops, each containing a delay line and an attenuation filter; every loop approximates the same frequency-dependent decay rate by setting its loop gain appropriately. A remaining challenge is the design of the attenuation filters over a wide frequency range based on a measured room impulse response. This letter proposes a novel two-stage attenuation filter structure that sharpens the design. The first stage is a low-order pre-filter that approximates the overall shape and determines the decay at the two ends of the frequency range, namely at dc and at the Nyquist limit. The second filter, an equalizer, fine-tunes the gain at different frequencies, such as on one-third-octave bands. It is shown that the proposed design is more accurate and robust than previous methods. A design example applying the proposed method to an interleaved velvet-noise reverberator is also presented. The proposed two-stage attenuation filter is a step toward a realistic parametric simulation of measured room impulse responses.
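
For context, the sketch below computes the per-pass attenuation target that any such attenuation filter must approximate, using the standard relation between loop delay, sample rate, and reverberation time. The delay length, sample rate, and T60 profile are assumed example values; the two-stage filter itself is not implemented.

    # Attenuation target that the filter must approximate (standard relation, not
    # the two-stage design itself): a loop delay of L samples decaying with
    # reverberation time T60(f) needs a gain of -60*L/(fs*T60(f)) dB per pass.
    import numpy as np

    fs = 48000                                        # sample rate in Hz (assumed)
    L = 1024                                          # delay-line length in samples (assumed)
    freqs = np.array([125, 500, 2000, 8000, 16000])   # band centers in Hz (example)
    t60 = np.array([2.0, 1.8, 1.4, 0.8, 0.4])         # reverberation times in s (example)

    target_dB = -60.0 * L / (fs * t60)
    for f, g in zip(freqs.tolist(), target_dB.tolist()):
        print(f"{f:6d} Hz: target attenuation {g:7.3f} dB per pass")
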