Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
Authors
Vali, M; Bäckström, T
Date
2023
Language
en
Pages
5
306-310
Series
Proceedings of Interspeech Conference, Volume 2023-August, Interspeech
Abstract
Vector quantized variational autoencoders (VQ-VAE) are well-known deep generative models, which map input data to a latent space that is used for data generation. Such latent spaces are unstructured and can thus be difficult to interpret. Some earlier approaches have introduced a structure to the latent space through supervised learning by defining data labels as latent variables. In contrast, we propose an unsupervised technique incorporating space-filling curves into vector quantization (VQ), which yields an arranged form of latent vectors such that adjacent elements in the VQ codebook refer to similar content. We applied this technique to the latent codebook vectors of a VQ-VAE, which encode the phonetic information of a speech signal in a voice conversion task. Our experiments show there is a clear arrangement in latent vectors representing speech phones, which clarifies what phone each latent vector corresponds to and facilitates other detailed interpretations of latent vectors.
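The property the abstract describes, that adjacent codebook indices refer to similar content, can be illustrated with a toy example. The following is a minimal sketch and not the authors' implementation: it assumes a 2-D codebook on a 4 x 4 lattice, uses a simple snake (boustrophedon) traversal as a stand-in for a space-filling curve, and the names quantize, adjacent_gap, and snake_order are hypothetical helpers introduced only for illustration.

```python
import math


def quantize(x, codebook):
    """Plain VQ: return the index of the codebook vector nearest to x."""
    return min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))


def adjacent_gap(codebook):
    """Mean Euclidean distance between consecutive codebook entries."""
    gaps = [math.dist(a, b) for a, b in zip(codebook, codebook[1:])]
    return sum(gaps) / len(gaps)


def snake_order(n):
    """Points of an n x n lattice visited row by row, alternating direction.

    Assumption: this snake traversal is only a simple stand-in for a
    space-filling curve; it is not the curve used in the paper.
    """
    pts = []
    for row in range(n):
        cols = range(n) if row % 2 == 0 else range(n - 1, -1, -1)
        pts.extend((float(c), float(row)) for c in cols)
    return pts


# Toy codebook whose index order follows the curve.
ordered = snake_order(4)
# The same vectors in a scrambled index order (even indices, then odd).
scrambled = ordered[::2] + ordered[1::2]

gap_ordered = adjacent_gap(ordered)      # neighbours in index are neighbours in space
gap_scrambled = adjacent_gap(scrambled)  # neighbours in index are farther apart

# Quantizing an input returns an index whose neighbours are similar vectors.
idx = quantize((2.2, 0.9), ordered)
print(gap_ordered, gap_scrambled, idx, ordered[idx])
```

Both codebooks contain identical vectors, so they quantize inputs to the same points; only the curve-ordered one gives the index itself a meaning, since a small change in index corresponds to a small change in the vector.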
Keywords
Interpretable latent space, phonetic analysis, space-filling curves, vector quantization, voice conversion
Citation
Vali, M & Bäckström, T 2023, Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion. in Proceedings of Interspeech Conference. vol. 2023-August, Interspeech, International Speech Communication Association (ISCA), pp. 306-310, Interspeech, Dublin, Ireland, 20/08/2023. https://doi.org/10.21437/Interspeech.2023-1549