A Variational Y-Autoencoder for Disentangling Gesture and Material of Interaction Sounds
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal
View/Open full text file from the Research portal
Other link related to publication
Date
2022-08-15
Language
en
Pages
205-214 (10 pages)
Series
AES International Conference on Audio for Virtual and Augmented Reality, AVAR 2022, Proceedings of the AES International Conference, Volume 2022-August
Abstract
Appropriate sound effects are an important aspect of immersive virtual experiences. Particularly in mixed reality scenarios, it may be desirable to change the acoustic properties of a naturally occurring interaction sound (e.g., the sound of a metal spoon scraping a wooden bowl) to a sound matching the characteristics of the corresponding interaction in the virtual environment (e.g., using wooden tools in a porcelain bowl). In this paper, we adapt the concept of a Y-Autoencoder (YAE) to the domain of sound effect analysis and synthesis. The YAE model makes it possible to disentangle the gesture and material properties of sound effects with a weakly supervised training strategy, where only an identifier label for the material in each training example is given. We show that such a model makes it possible to resynthesize sound effects after exchanging the material label of an encoded example, and to obtain perceptually meaningful synthesis results with relatively low computational effort. By introducing a variational regularization for the encoded gesture, as well as an adversarial loss, we can further use the model to generate new and varying sound effects with the material characteristics of the training data, while the analyzed audio signal can originate from interactions with unknown materials.
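The disentanglement idea in the abstract can be illustrated with a compact sketch. The following PyTorch snippet is not the authors' model: all names, layer sizes, and dimensionalities (VariationalYAutoencoder, n_bins, z_dim, emb_dim) are illustrative assumptions, the encoder and decoder are plain MLPs over spectrogram frames rather than the networks used in the paper, and the adversarial loss mentioned in the abstract is omitted. It shows only the core mechanism: a variational gesture code, a learned material embedding indexed by the weak material label, and material exchange at synthesis time.

import torch
import torch.nn as nn

class VariationalYAutoencoder(nn.Module):
    """Minimal sketch (hypothetical architecture): the encoder produces a
    variational gesture code, the material is represented by an embedding
    looked up from its identifier label, and the decoder reconstructs a
    spectrogram frame from the concatenation of both."""
    def __init__(self, n_bins=513, z_dim=16, n_materials=8, emb_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_bins, 256), nn.ReLU(),
            nn.Linear(256, 2 * z_dim),          # mean and log-variance
        )
        self.material_emb = nn.Embedding(n_materials, emb_dim)
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + emb_dim, 256), nn.ReLU(),
            nn.Linear(256, n_bins),
        )

    def forward(self, x, material_id):
        mu, logvar = self.encoder(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        e = self.material_emb(material_id)
        x_hat = self.decoder(torch.cat([z, e], dim=-1))
        return x_hat, mu, logvar

# Toy training step: reconstruction loss plus a KL term that acts as the
# variational regularization on the gesture code.
model = VariationalYAutoencoder()
x = torch.rand(4, 513)                 # batch of magnitude-spectrogram frames
m = torch.randint(0, 8, (4,))          # weak supervision: material labels only
x_hat, mu, logvar = model(x, m)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
loss = nn.functional.mse_loss(x_hat, x) + 0.01 * kl

# Material exchange at synthesis time: keep the encoded gesture,
# decode with a different material embedding.
with torch.no_grad():
    mu, _ = model.encoder(x).chunk(2, dim=-1)        # gesture code (mean)
    e_new = model.material_emb(torch.full((4,), 3))  # some other material id
    x_swapped = model.decoder(torch.cat([mu, e_new], dim=-1))

Keeping the gesture code of an analyzed sound while replacing the material embedding mirrors the material-exchange resynthesis described in the abstract.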
Description
Funding Information: During this project, Simon Schwär was supported by a fellowship within the IFI programme of the German Academic Exchange Service (DAAD). The International Audio Laboratories Erlangen are a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and the Fraunhofer Institute for Integrated Circuits IIS. Publisher Copyright: © 2022 Audio Engineering Society. All rights reserved.
Citation
Schwär, S., Müller, M. & Schlecht, S. J. 2022, 'A Variational Y-Autoencoder for Disentangling Gesture and Material of Interaction Sounds', in AES International Conference on Audio for Virtual and Augmented Reality, AVAR 2022, Proceedings of the AES International Conference, vol. 2022-August, Audio Engineering Society, pp. 205-214, AES International Conference on Audio for Virtual and Augmented Reality, Redmond, Washington, United States, 15/08/2022. <http://www.aes.org/e-lib/browse.cfm?elib=21853>