Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Seshadri, Shreyas [en_US]
dc.contributor.author: Remes, Ulpu [en_US]
dc.contributor.author: Räsänen, Okko [en_US]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.date.accessioned: 2017-11-21T13:38:51Z
dc.date.available: 2017-11-21T13:38:51Z
dc.date.issued: 2017-08 [en_US]
dc.description.abstract: Zero-resource speech processing (ZS) systems aim to learn structural representations of speech without access to labeled data. A starting point for these systems is the extraction of syllable tokens utilizing the rhythmic structure of a speech signal. Several recent ZS systems have therefore focused on clustering such syllable tokens into linguistically meaningful units. These systems have so far used a heuristically set number of clusters, which can, however, be highly dataset dependent and cannot be optimized in actual unsupervised settings. This paper focuses on improving the flexibility of ZS systems using Bayesian non-parametric (BNP) mixture models that are capable of simultaneously learning the cluster models as well as their number based on the properties of the dataset. We also compare different model design choices, namely priors over the weights and the cluster component models, as the impact of these choices is rarely reported in previous studies. Experiments are conducted using conversational speech from several languages. The models are first evaluated in a separate syllable clustering task and then as a part of a full ZS system in order to examine the potential of BNP methods and illuminate the relative importance of different model design choices. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 5
dc.format.extent: 2744-2748
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Seshadri, S, Remes, U & Räsänen, O 2017, Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing. in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. vol. 2017-August, Interspeech: Annual Conference of the International Speech Communication Association, International Speech Communication Association (ISCA), pp. 2744-2748, Interspeech, Stockholm, Sweden, 20/08/2017. https://doi.org/10.21437/Interspeech.2017-339 [en]
dc.identifier.doi: 10.21437/Interspeech.2017-339 [en_US]
dc.identifier.issn: 1990-9772
dc.identifier.other: PURE UUID: c3540567-7d75-4635-bcf9-e0f43eec7782 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/c3540567-7d75-4635-bcf9-e0f43eec7782 [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/15742114/seshari_interspeech0339.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/28857
dc.identifier.urn: URN:NBN:fi:aalto-201711217678
dc.language.iso: en [en]
dc.relation.ispartof: Interspeech [en]
dc.relation.ispartofseries: Proceedings of Interspeech 2017 [en]
dc.relation.ispartofseries: Interspeech: Annual Conference of the International Speech Communication Association [en]
dc.rights: openAccess [en]
dc.rights.copyright: © 2017 ISCA. This article was originally published in the Proceedings of Interspeech 2017: Seshadri, S., Remes, U., Räsänen, O. (2017) Comparison of Non-Parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing. Proc. Interspeech 2017, 2744-2748, DOI: 10.21437/Interspeech.2017-339. [en_US]
dc.subject.keyword: Non-parametric clustering [en_US]
dc.subject.keyword: zero-resource processing [en_US]
dc.subject.keyword: variational inference [en_US]
dc.subject.keyword: Pitman-Yor process [en_US]
dc.subject.keyword: von Mises-Fisher mixtures [en_US]
dc.title: Comparison of Non-parametric Bayesian Mixture Models for Syllable Clustering and Zero-Resource Speech Processing [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) [fi]
dc.type.version: publishedVersion
