Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Vali, Mohammadhassan [en_US]
dc.contributor.author: Bäckström, Tom [en_US]
dc.contributor.department: Department of Information and Communications Engineering [en]
dc.contributor.groupauthor: Speech Interaction Technology [en]
dc.date.accessioned: 2023-10-04T06:09:13Z
dc.date.available: 2023-10-04T06:09:13Z
dc.date.issued: 2023 [en_US]
dc.description.abstract: Vector quantized variational autoencoders (VQ-VAE) are well-known deep generative models, which map input data to a latent space that is used for data generation. Such latent spaces are unstructured and can thus be difficult to interpret. Some earlier approaches have introduced a structure to the latent space through supervised learning by defining data labels as latent variables. In contrast, we propose an unsupervised technique incorporating space-filling curves into vector quantization (VQ), which yields an arranged form of latent vectors such that adjacent elements in the VQ codebook refer to similar content. We applied this technique to the latent codebook vectors of a VQ-VAE, which encode the phonetic information of a speech signal in a voice conversion task. Our experiments show there is a clear arrangement in latent vectors representing speech phones, which clarifies what phone each latent vector corresponds to and facilitates other detailed interpretations of latent vectors. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.extent: 306-310
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Vali, M & Bäckström, T 2023, Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion. in Proceedings of Interspeech Conference. vol. 2023-August, Interspeech, International Speech Communication Association (ISCA), pp. 306-310, Interspeech, Dublin, Ireland, 20/08/2023. https://doi.org/10.21437/Interspeech.2023-1549 [en]
dc.identifier.doi: 10.21437/Interspeech.2023-1549 [en_US]
dc.identifier.issn: 1990-9772
dc.identifier.issn: 2308-457X
dc.identifier.other: PURE UUID: 4a4f85b7-d6d3-43a1-8d50-12cb1bc9dc48 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/4a4f85b7-d6d3-43a1-8d50-12cb1bc9dc48 [en_US]
dc.identifier.other: PURE LINK: http://www.scopus.com/inward/record.url?scp=85171567814&partnerID=8YFLogxK [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/112051987/interspeech.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/123799
dc.identifier.urn: URN:NBN:fi:aalto-202310046155
dc.language.iso: en [en]
dc.publisher: International Speech Communication Association
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: Proceedings of Interspeech Conference [en]
dc.relation.ispartofseries: Volume 2023-August [en]
dc.relation.ispartofseries: Interspeech [en]
dc.rights: openAccess [en]
dc.subject.keyword: Interpretable latent space [en_US]
dc.subject.keyword: phonetic analysis [en_US]
dc.subject.keyword: space-filling curves [en_US]
dc.subject.keyword: vector quantization [en_US]
dc.subject.keyword: voice conversion [en_US]
dc.title: Interpretable Latent Space Using Space-Filling Curves for Phonetic Analysis in Voice Conversion [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: acceptedVersion