Browsing by Author "Elvander, Filip"
Now showing 1 - 20 of 23
- Blind Audio Bandwidth Extension: A Diffusion-Based Zero-Shot Approach
A1 Original article in a scientific journal (2024-11-27) Moliner Juanpere, Eloi; Elvander, Filip; Välimäki, Vesa
Audio bandwidth extension involves the realistic reconstruction of high-frequency spectra from bandlimited observations. In cases where the lowpass degradation is unknown, such as in restoring historical audio recordings, this becomes a blind problem. This paper introduces a novel method called BABE (Blind Audio Bandwidth Extension) that addresses the blind problem in a zero-shot setting, leveraging the generative priors of a pre-trained unconditional diffusion model. During the inference process, BABE utilizes a generalized version of diffusion posterior sampling, where the degradation operator is unknown but parametrized and inferred iteratively. The performance of the proposed method is evaluated using objective and subjective metrics, and the results show that BABE surpasses state-of-the-art blind bandwidth extension baselines and achieves competitive performance compared to informed methods when tested with synthetic data. Moreover, BABE exhibits robust generalization capabilities when enhancing real historical recordings, effectively reconstructing the missing high-frequency content while maintaining coherence with the original recording. Subjective preference tests confirm that BABE significantly improves the audio quality of historical music recordings. Examples of historical recordings restored with the proposed method are available on the companion webpage: http://research.spa.aalto.fi/publications/papers/ieee-taslp-babe/
- Compression of room impulse responses for compact storage and fast low-latency convolution
A1 Original article in a scientific journal (2024-12) Jälmby, Martin; Elvander, Filip; van Waterschoot, Toon
Room impulse responses (RIRs) are used in several applications, such as augmented reality and virtual reality. These applications require a large number of RIRs to be convolved with audio under strict latency constraints. In this paper, we consider the compression of RIRs in conjunction with fast time-domain convolution. We consider three different methods of RIR approximation for the purpose of RIR compression and compare them to state-of-the-art compression. The methods are evaluated using several standard objective quality measures, both channel-based and signal-based. We also propose a novel low-rank-based algorithm for fast time-domain convolution and show how the convolution can be carried out without the need to decompress the RIR. Numerical simulations are performed using RIRs of different lengths, recorded in three different rooms. It is shown that compression using low-rank approximation is a very compelling alternative to the state-of-the-art Opus compression, as it performs as well as or better than Opus on all but one considered measure, with the added benefit of being amenable to fast time-domain convolution.
- Data Analysis and Pre-Processing for Digital Twin Development, Predictive Modeling of Missing Variables at Viikinmäki Wastewater Treatment Plant
School of Electrical Engineering | Master's thesis (2025-02-22) Kiran, Anmol
In light of the upcoming EU Urban Wastewater Treatment updates requiring energy neutrality for medium and large wastewater treatment plants by 2024, our team at DIGICARBA is designing a digital replica of the Viikinmäki WWTP. While soft sensors are being developed and assessed in a simulation environment, data from online physical sensors is often incomplete or of low quality. This thesis introduces a data pre-processing tool to identify and correct errors in the dataset, as well as a predictive tool to fill gaps in critical effluent variables. Together, these tools enhance data quality and availability, supporting improved carbon balance by tracking greenhouse gas emissions and promoting sustainable resource use in wastewater treatment technologies. The motivation for this research was the unreliability of online data, which is often unclean and inconsistent, affecting value forecasting in the simulation environment. This study aimed to substitute online data with lab data when strong correlation was found, allowing the more reliable lab data to be used in the simulation environment. Thus, laboratory data was analysed for correlation and its potential use in the digital twin model. The data was provided by HSY. There were two main types of datasets: online data, collected from physical sensors mounted at various stages of the wastewater treatment plant, and lab data, collected from the same plant in the laboratory under the supervision of industry experts. In the data pre-processing pipeline, time series analysis was conducted for both online and lab data. The data was visualized and examined for missing or NaN values, followed by suitable imputation.
Visualization also helped detect outliers, which were identified using the interquartile range (IQR) method and principal component analysis (PCA). Once outliers were removed, missing values were imputed using the PCA method. In the second part, a predictive model for NH4N and COD SS was designed. For this purpose, ordinary least squares (OLS) was implemented as a baseline for other machine learning models. This method did not capture the effect of varying certain process parameters. Hence, an autoregressive model with exogenous variables (ARX) was implemented, which not only improved the RMSE but also captured the impact of varying essential elements.
- Diffusion-Based Generative Equalizer for Music Restoration
A4 Article in conference proceedings (2024) Moliner Juanpere, Eloi; Turunen, Maija; Elvander, Filip; Välimäki, Vesa
This paper presents a novel approach to audio restoration, focusing on the enhancement of low-quality music recordings, and in particular historical ones. Building upon a previous algorithm called BABE, or Blind Audio Bandwidth Extension, we introduce BABE-2, which presents a series of improvements. This research broadens the concept of bandwidth extension to generative equalization, a task that, to the best of our knowledge, has not been previously addressed for music restoration. BABE-2 is built around an optimization algorithm utilizing priors from diffusion models, which are trained or fine-tuned using a curated set of high-quality music tracks. The algorithm simultaneously performs two critical tasks: estimation of the filter degradation magnitude response and hallucination of the restored audio. The proposed method is objectively evaluated on historical piano recordings, showing an enhancement over the prior version. The method yields similarly impressive results in rejuvenating the works of renowned vocalists Enrico Caruso and Nellie Melba. This research represents an advancement in the practical restoration of historical music. Historical music restoration examples are available at: research.spa.aalto.fi/publications/papers/dafx-babe2/.
- Direction-of-arrival and power spectral density estimation using a single directional microphone and group-sparse optimization
A1 Original article in a scientific journal (2023-12) Tengan, Elisa; Dietzen, Thomas; Elvander, Filip; van Waterschoot, Toon
In this paper, two approaches are proposed for estimating the direction of arrival (DOA) and power spectral density (PSD) of stationary point sources by using a single, rotating, directional microphone. These approaches are based on a method previously presented by the authors, in which point source DOAs were estimated by using a broadband signal model and solving a group-sparse optimization problem, where the number of observations made by the rotating directional microphone can be lower than the number of candidate DOAs in an angular grid. The DOA estimation is followed by the estimation of the sources’ PSDs through the solution of an overdetermined least squares problem. The first approach proposed in this paper includes the use of an additional nonnegativity constraint on the residual noise term when solving the group-sparse optimization problem and is referred to as the Group Lasso Least Squares (GL-LS) approach. The second proposed approach, in addition to the new nonnegativity constraint, employs a narrowband signal model when building the linear system of equations used for formulating the group-sparse optimization problem, where the DOAs and PSDs can be jointly estimated by iterative, group-wise reweighting. This is referred to as the Group-Lasso with l1-reweighting (GL-L1) approach. Both proposed approaches are implemented using the alternating direction method of multipliers (ADMM), and their performance is evaluated through simulations in which different setup conditions are considered, ranging from different types of model mismatch to variations in the acoustic scene and microphone directivity pattern.
The results obtained show that in a scenario involving a microphone response mismatch between observed data and the signal model used, having the additional nonnegativity constraint on the residual noise can improve the DOA estimation for the case of GL-LS and the PSD estimation for the case of GL-L1. Moreover, the GL-L1 approach can present an advantage over GL-LS in terms of DOA estimation performance in scenarios with low SNR or where multiple sources are closely located to each other. Finally, it is shown that having the least squares PSD re-estimation step is beneficial in most scenarios, such that GL-LS outperformed GL-L1 in terms of PSD estimation errors.
- Estimating Inharmonic Signals with Optimal Transport Priors
A4 Article in conference proceedings (2023-06-10) Elvander, Filip
In this work, we consider the problem of estimating the frequency content of inharmonic signals, i.e., sinusoidal mixtures whose components are close to forming a harmonic set. Intuitively, exploiting this closeness should lead to increased estimation performance as compared to unstructured estimation. Earlier approaches to this problem have relied on parametric descriptions of the inharmonicity, stochastic representations, or have resorted to misspecified estimation by ignoring the inharmonicity. Herein, we propose to use a penalized maximum-likelihood framework, where the regularizer is constructed based on optimal mass transport theory, promoting estimates that are close to harmonic in a spectral sense. This leads to an estimator that forms a smooth path between the unstructured maximum-likelihood estimator (MLE) and a misspecified MLE (MMLE), as determined by a regularization parameter. In numerical illustrations, we show that the proposed estimator worst-case dominates the MLE and MMLE, thereby allowing for robust estimation when the inharmonicity level is unknown.
- Estimation of Impulse Responses for a Moving Source Using Optimal Transport Regularization
A4 Article in conference proceedings (2024-04-19) Sundström, David; Elvander, Filip; Jakobsson, Andreas
The estimation of impulse responses (IRs) is fundamental to various audio applications, including active noise control, telecommunication, and sound zone control. Despite its long history, estimating impulse responses remains challenging when dealing with short signals and with signals having poor spectral excitation. However, in many applications the source is moving, such that one has access to several input-output signal pairs corresponding to closely spaced source positions. Intuitively, exploiting this spatial proximity when jointly estimating the full set of IRs should allow for improved estimation performance. In this work, we propose to leverage the information shared between the closely spaced source positions by means of an optimal transport regularizer when estimating IRs from noisy input-output relations. In particular, the proposed transport formulation allows for modeling shifts in time delays, corresponding to the IR filter taps, caused by the spatial displacement. The method is validated through numerical experiments using a real voice recording as input signal, demonstrating its superior performance in this challenging scenario.
- Fast Low-Latency Convolution by Low-Rank Tensor Approximation
A4 Article in conference proceedings (2023-06-10) Jälmby, Martin; Elvander, Filip; Waterschoot, Toon van
In this paper, we consider fast time-domain convolution, exploiting low-rank properties of an impulse response (IR). This reduces the computational complexity, speeding up the convolution without introducing latency. Previous work has considered a truncated singular value decomposition (SVD) of a two-dimensional matricization, or reshaping, of the IR. We here build upon this idea by providing an algorithm for convolution with a three-dimensional tensorization of the IR. We provide simulations using real-life acoustic room impulse responses (RIRs) of various lengths, convolving them with music as well as speech signals. The proposed algorithm is shown to outperform the comparable existing algorithm in terms of signal quality degradation for all considered scenarios, without increasing the computational complexity or the memory usage.
- Financial signal modeling with factor models and optimization of a market-neutral portfolio
School of Electrical Engineering | Bachelor's thesis (2024-05-26) Pennanen, Olli-Pekka
This thesis reviews and evaluates the applicability of factor models for estimating expected stock returns and constructing a market-neutral portfolio. Factor models are widely used to evaluate the risks and returns of assets. Although factor models are thoroughly described in the literature, their application in portfolio optimization is often presented inadequately. The aim of this thesis is to investigate the modeling of market signals with factor models. Additionally, these factor models are used in the construction of a market-neutral portfolio. The market-neutral weights for the portfolio are computed with an optimization algorithm presented in the thesis. This thesis focuses only on the U.S. market and the stocks included in the Standard and Poor’s 500 index fund. The factor models employed in this thesis are the single-factor model and the multifactor model. The multifactor model incorporates the Fama-French three-factor model factors. This thesis indicates that the factor models can generally assess the expected returns of individual stocks fairly well. Additionally, the thesis shows that a market-neutral portfolio can be constructed using factor models. However, the assessment of portfolio returns appears to be insufficient with these models. This thesis suggests that the reliability of factor models can be improved with continuous reassessment and adaptation to changing market conditions. The first section of the thesis is an introduction. The second section presents previous research and theory related to the thesis. The third section describes the methods used for the data collection, the creation of the factor models, and the construction of the market-neutral portfolio. The fourth section presents the results and provides tools for evaluating them. The fifth and final section is a brief summary of the work and its conclusions.
- Impulse Response Interpolation Using Optimal Transport
A4 Article in conference proceedings (2024-04-01) Sundström, David; Elvander, Filip; Jakobsson, Andreas
The spatial impulse response (IR) interpolation problem is of general interest, e.g., for imaging of subsurface structures based on seismic waves, rendering of audio and radar IRs, as well as for numerous spatial audio applications. A commonly used model represents the occurring reflections as equivalent source positions, often determined using a sparse reconstruction framework employing spatial dictionaries. However, in the presence of calibration errors, such spatial dictionaries tend to inaccurately represent the actual propagation, limiting the practical use of these methods. Instead of explicitly assuming an equivalent source model, we here introduce a trade-off between minimizing the distance to an equivalent source model and fitting the data by means of a multi-marginal optimal transport problem. The proposed method is evaluated on real acoustic IRs, illustrating its preferable performance.
- Low-Rank Room Impulse Response Estimation
A1 Original article in a scientific journal (2023) Jälmby, Martin; Elvander, Filip; Waterschoot, Toon Van
In this paper, we consider low-rank estimation of room impulse responses (RIRs). Inspired by a physics-driven room-acoustical model, we propose an estimator of RIRs that promotes a low-rank structure for a matricization, or reshaping, of the estimated RIR. This low-rank prior acts as a regularizer for the inverse problem of estimating an RIR from input-output observations, preventing overfitting and improving estimation accuracy. As directly enforcing a low rank of the estimate results in an NP-hard problem, we consider two different relaxations, one using the nuclear norm and one using the recently introduced concept of quadratic envelopes. Both relaxations allow for implementing the proposed estimator using a first-order algorithm with convergence guarantees. When evaluated on both synthetic and recorded RIRs, it is shown that under noisy output conditions, or when the spectral excitation of the input signal is poor, the proposed estimator outperforms comparable existing methods. The performance of the two low-rank relaxation methods is similar, but the quadratic envelope has the benefit of superior robustness to the choice of regularization hyperparameter when the signal-to-noise ratio is unknown. The performance of the proposed method is compared to that of ordinary least squares, Tikhonov least squares, as well as the Cramér-Rao lower bound (CRLB).
- Multi-channel Low-rank Convolution of Jointly Compressed Room Impulse Responses
A1 Original article in a scientific journal (2024) Jalmby, Martin; Elvander, Filip; van Waterschoot, Toon
The room impulse response (RIR) describes the response of a room to an acoustic excitation signal and models the acoustic channel between a point source and receiver. RIRs are used in a wide range of applications, e.g., virtual reality. In such an application, the availability of closely spaced RIRs and the capability to achieve low latency are imperative to provide an immersive experience. However, representing a complete acoustic environment using a fine grid of RIRs is prohibitive from a storage point of view, and without exploiting spatial proximity, acoustic rendering becomes computationally expensive. We therefore propose two methods for the joint compression of multiple RIRs, based on the generalized low-rank approximation of matrices (GLRAM), for the purpose of efficiently storing RIRs and allowing for low-latency convolution. We show how one of the components of the GLRAM decomposition is virtually invariant to the change of position of the source throughout the room and how this can be exploited in the modeling and convolution. In simulations, we show that this offers high compression with less quality degradation than comparable benchmark methods.
- Multi-Frequency Tracking via Group-Sparse Optimal Transport
A1 Original article in a scientific journal (2024-05-24) Haasler, Isabel; Elvander, Filip
In this work, we introduce an optimal transport framework for inferring power distributions over both spatial location and temporal frequency. Recently, it has been shown that optimal transport is a powerful tool for estimating spatial spectra that change smoothly over time. In this work, we consider the tracking of the spatio-temporal spectrum corresponding to a small number of moving broad-band signal sources. Typically, such tracking problems are addressed by treating the spatio-temporal power distribution in a frequency-by-frequency manner, allowing the use of well-understood models for narrow-band signals. This, however, leads to decreased target resolution due to inefficient use of the available information. We propose an extension of the optimal transport framework that exploits information from several frequencies simultaneously by estimating a spatio-temporal distribution penalized by a group-sparsity regularizer. This approach finds a spatial spectrum that changes smoothly over time and, at each time instance, has a small support that is similar across frequencies. To the best of the authors’ knowledge, this is the first formulation combining optimal transport and sparsity for solving inverse problems. As is shown on simulated and real data, our method can successfully track targets in scenarios where information from separate frequency bands alone is insufficient.
- Multi-Source Direction-of-Arrival Estimation using Group-Sparse Fitting of Steered Response Power Maps
Conference article in proceedings (2023) Tengan, Elisa; Dietzen, Thomas; Elvander, Filip; Van Waterschoot, Toon
In this paper, a method is proposed for estimating the direction of arrival (DOA) of multiple broadband sound sources by solving a group-sparse optimization problem. A steered response power (SRP) map is modeled using power spectral densities (PSDs) defined on an overcomplete grid of candidate DOAs. The source DOAs are then estimated as the directions corresponding to the largest peaks of the frequency-averaged PSDs. The proposed optimization problem is iteratively solved using the alternating direction method of multipliers (ADMM), and simulation results show that the proposed method overall outperforms the frequency-domain sparse iterative covariance-based estimation (SPICE) method and performs better than or similar to the conventional SRP-PHAT method for varying levels of noise and reverberation.
- Multi-Source Direction-of-Arrival Estimation Using Steered Response Power and Group-Sparse Optimization
A1 Original article in a scientific journal (2024) Tengan, Elisa; Dietzen, Thomas; Elvander, Filip; Waterschoot, Toon van
In this paper, a method is proposed for estimating the direction of arrival (DOA) of multiple broadband sound sources. This is achieved through the solution of a group-sparse optimization problem, which models an observed broadband steered response power (SRP) map as a linear function of power spectral densities (PSDs), corresponding to a set of candidate DOAs, and forming a PSD vector. Given the assumption of spatial sparsity, the estimation of the source DOAs is then accomplished by identifying peaks in the resulting spatial power density, i.e., the estimated direction-specific PSDs integrated over frequency. The motivation behind the proposed method lies in its potential to reveal more distinct peaks in the estimated spatial power density than those directly observed in the broadband SRP map, which can benefit the robustness of DOA estimation when multiple sources need to be distinguished under varying acoustic conditions. An implementation of the proposed method using the alternating direction method of multipliers (ADMM) is presented, and the DOA estimation performance is evaluated with both simulated and experimental data. Results show that, especially in reverberant scenarios, the proposed method presents an advantage in locating closely spaced sources when compared to the conventional SRP-PHAT, the group-sparse iterative covariance-based estimation (GSPICE) method, and the wideband MUSIC method with geometric averaging. Furthermore, it is observed that for a compact microphone array, the proposed method overall maintained its performance even when using SRP maps computed with grid resolutions that are lower than the sampling requirements of the broadband SRP function.
Finally, results obtained with experimental data showed the validity and applicability of the proposed method in a practical meeting room environment.
- Optimal sensor placement for the spatial reconstruction of sound fields
A1 Original article in a scientific journal (2024-12) Verburg, Samuel A.; Elvander, Filip; van Waterschoot, Toon; Fernandez-Grande, Efren
The estimation of sound fields over space is of interest in sound field control and analysis, spatial audio, room acoustics and virtual reality. Sound fields can be estimated from a number of measurements distributed over space, yet this remains a challenging problem due to the large experimental effort required. In this work, we investigate sensor distributions that are optimal for estimating sound fields. Such optimization is valuable as it can greatly reduce the number of measurements required. The sensor positions are optimized with respect to the parameters describing a sound field, or the pressure reconstructed in the area of interest, by finding the positions that minimize the Bayesian Cramér-Rao bound (BCRB). The optimized distributions are investigated in a numerical study as well as with measured room impulse responses. We observe a reduction in the number of measurements of approximately 50% when the sensor positions are optimized for reconstructing the sound field, compared with random distributions. The results indicate that optimizing the sensor positions is also valuable when the vector of parameters is sparse, especially compared with random sensor distributions, which are often adopted in sparse array processing in acoustics.
- Optimal Transport Based Impulse Response Interpolation in the Presence of Calibration Errors
A1 Original article in a scientific journal (2024) Sundstrom, David; Elvander, Filip; Jakobsson, Andreas
Acoustic impulse responses (IRs) are widely used to model sound propagation between two points in space. Being a point-to-point description, IRs are generally estimated based on input-output pairs for source and sensor positions of interest. Alternatively, the IR at an arbitrary location in space may be constructed using interpolation techniques, thus alleviating the need for densely sampling the space. The resulting IR interpolation problem is of general interest, e.g., for imaging of subsurface structures based on seismic waves, rendering of audio and radar IRs, as well as for numerous spatial audio applications. A commonly used model represents the acoustic reflections as image sources, often determined using a sparse reconstruction framework employing spatial dictionaries. However, in the presence of calibration errors, such spatial dictionaries tend to inaccurately represent the actual propagation, limiting the use of these methods in practical applications. Instead of explicitly assuming an image source model, we here introduce a trade-off between minimizing the distance to an image source model and fitting the data by means of a multi-marginal optimal transport problem. The proposed method is evaluated on the early part of real acoustic IRs from the MeshRIR data set, illustrating its preferable performance as compared to state-of-the-art spatial dictionary-based IR interpolation approaches.
- Optimizing the Position of Sensors for Characterizing Acoustic Fields
A4 Article in conference proceedings (2023) Verburg, Samuel A.; Elvander, Filip; van Waterschoot, Toon; Fernandez-Grande, Efren
Characterizing acoustic fields over space is required in sound field analysis, spatial audio, as well as several applications within room acoustics and virtual reality. In order to measure a sound field over medium or large volumes, a large number of sensors have to be distributed over space. In this study, we investigate optimal distributions of sensors for capturing acoustic fields in space. The positions are selected to maximize the sampled information and minimize the uncertainty in the reconstructed field. We show that the proposed optimization substantially reduces the number of measurements in comparison to uniform or randomized distributions. The proposed optimal selection procedure can also be significant for other data-scarce applications.
- Robust Multi-Pitch Estimation via Optimal Transport Clustering
A4 Article in conference proceedings (2025-03-07) Björkman, Anton; Elvander, Filip
In this work, we consider the multi-pitch estimation problem, i.e., to estimate multiple sets of harmonically related sinusoids from noisy measurements. We propose to phrase this as a clustering problem with indirect measurements, where we simultaneously infer the spectral content of the signal and group its power into a small set of harmonic structures. The grouping is enforced using a regularization function building on optimal transport theory. The resulting estimator is formulated in terms of the solution of a convex optimization problem, and we present an efficient algorithm implementing the estimator. In numerical experiments, we show that the proposed estimator displays competitive performance as compared to the state-of-the-art. In particular, the proposed estimator is shown to be highly robust to inharmonicities, i.e., deviations from perfect harmonicity.
- Robust signal and noise covariance matrix estimation using Riemannian optimization
A4 Article in conference proceedings (2024) Brunnström, Jesper; Moonen, Marc; Elvander, Filip
Covariance matrix estimation for a noise-contaminated signal is a common signal processing task, where the covariance matrix of the desired signal together with the noise covariance matrix are estimated from two sets of data, i.e., noise-only data and noise-contaminated signal data. The estimation problem can be posed as a constrained optimization problem, for which a closed-form solution exists in terms of the generalized eigenvalue decomposition if the cost function is chosen to be the Euclidean distance with prewhitening. However, the problem is difficult to solve for other cost functions, which makes it difficult to incorporate prior knowledge or adapt to new applications. In this paper, the optimization problem is recast as an unconstrained optimization problem on a Riemannian manifold, an approach that has not yet been investigated for this particular problem. Using the constructed manifold, a robust covariance matrix estimator based on Tyler’s M-estimator is derived, which uses noise-only data and noise-contaminated signal data. In particular, the robust estimator is useful for impulsive noise, a situation where conventional estimators commonly perform poorly. The effectiveness of the proposed robust estimator under heavy-tailed noise is demonstrated, compared against conventional methods on synthesized data as well as in an audio noise-reduction application.
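The last abstract builds its robust estimator on Tyler's M-estimator. For readers unfamiliar with that estimator, here is a minimal sketch of the classical single-dataset Tyler fixed-point iteration, written in Python with NumPy. It is not the paper's Riemannian signal-plus-noise estimator, which jointly uses noise-only and noise-contaminated data. The sketch assumes real-valued, zero-mean samples with no all-zero rows. The function name `tyler_m_estimator` and its parameters are hypothetical.

```python
import numpy as np


def tyler_m_estimator(X, n_iter=200, tol=1e-8):
    """Classical Tyler's M-estimator of scatter for zero-mean data (sketch).

    X has shape (n, p), one real-valued sample per row (no all-zero rows).
    Returns a symmetric p x p matrix normalized to trace p, since Tyler's
    estimator only determines the scatter matrix up to a scale factor.
    """
    n, p = X.shape
    sigma = np.eye(p)
    for _ in range(n_iter):
        # Squared Mahalanobis distance x_i^T Sigma^{-1} x_i of every sample.
        inv = np.linalg.inv(sigma)
        d = np.einsum("ij,jk,ik->i", X, inv, X)
        # Weighted outer-product sum: sample i is weighted by 1 / d_i,
        # so very large (impulsive) samples are strongly down-weighted.
        new = (p / n) * (X.T / d) @ X
        # Fix the scale ambiguity by normalizing the trace to p.
        new *= p / np.trace(new)
        converged = np.linalg.norm(new - sigma, "fro") < tol
        sigma = new
        if converged:
            break
    return sigma


if __name__ == "__main__":
    # Heavy-tailed (Cauchy-like) elliptical samples with shape diag(1.6, 0.4).
    rng = np.random.default_rng(0)
    true_shape = np.diag([1.6, 0.4])
    Z = rng.standard_normal((5000, 2)) @ np.linalg.cholesky(true_shape).T
    X = Z / np.sqrt(rng.chisquare(1.0, size=5000))[:, None]
    print("Tyler estimate:\n", tyler_m_estimator(X))
    print("Sample covariance:\n", np.cov(X, rowvar=False))
```

Each sample is weighted by the inverse of its Mahalanobis distance, so a few impulsive, very large samples cannot dominate the result as they do in the sample covariance. On heavy-tailed data like the demo above, the two printed matrices should therefore differ markedly.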