Blind phoneme segmentation with temporal prediction errors

Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from the Aalto University research portal.
Date
2017
Language
en
Pages
62-68 (7 pages)
Series
Proceedings of the Student Research Workshop at the Annual Meeting of the Association for Computational Linguistics
Abstract
Phonemic segmentation of speech is a critical step of speech recognition systems. We propose a novel unsupervised algorithm based on sequence prediction models such as Markov chains and recurrent neural networks. Our approach consists in analyzing the error profile of a model trained to predict speech features frame-by-frame. Specifically, we try to learn the dynamics of speech in the MFCC space and hypothesize boundaries from local maxima in the prediction error. We evaluate our system on the TIMIT dataset, with improvements over similar methods.
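The idea in the abstract (predict each frame from the previous one, then place boundaries at local maxima of the prediction error) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple affine least-squares predictor in place of the Markov chain or recurrent network, synthetic 3-dimensional features in place of MFCCs computed on TIMIT, and an arbitrary peak threshold. All function names are hypothetical.

```python
import numpy as np


def fit_linear_predictor(features):
    """Least-squares fit of an affine map from frame t to frame t+1.

    Assumption: an affine predictor stands in for the sequence models
    (Markov chains, RNNs) mentioned in the abstract.
    """
    inputs = np.hstack([features[:-1], np.ones((len(features) - 1, 1))])
    targets = features[1:]
    weights, *_ = np.linalg.lstsq(inputs, targets, rcond=None)
    return weights


def prediction_errors(features, weights):
    """Squared prediction error; errors[i] is the error on frame i + 1."""
    inputs = np.hstack([features[:-1], np.ones((len(features) - 1, 1))])
    residual = features[1:] - inputs @ weights
    return np.sum(residual ** 2, axis=1)


def boundaries_from_errors(errors, threshold):
    """Return frame indices where the error has a local maximum above threshold."""
    peaks = []
    for i in range(1, len(errors) - 1):
        is_peak = errors[i] > errors[i - 1] and errors[i] >= errors[i + 1]
        if is_peak and errors[i] > threshold:
            peaks.append(i + 1)  # shift: errors[i] belongs to frame i + 1
    return peaks


def make_toy_features(seed=0):
    """Three stationary 'segments' of 30 frames each (hypothetical toy data).

    The true boundaries are at frames 30 and 60.
    """
    rng = np.random.default_rng(seed)
    centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0], [-5.0, 5.0, 0.0]])
    segments = [c + 0.1 * rng.standard_normal((30, 3)) for c in centers]
    return np.vstack(segments)


features = make_toy_features()
weights = fit_linear_predictor(features)
errors = prediction_errors(features, weights)
# Assumed threshold: two standard deviations above the mean error.
threshold = errors.mean() + 2 * errors.std()
boundaries = boundaries_from_errors(errors, threshold)
print(boundaries)
```

On this toy data the error stays small inside each segment and spikes at the two transitions, so the detected peaks should coincide with the segment changes. Real speech features change far less abruptly, which is presumably why the abstract refers to stronger sequence models.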
Citation
Michel, P, Räsänen, O, Thiolliere, R & Dupoux, E 2017, Blind phoneme segmentation with temporal prediction errors. in Proceedings of the Student Research Workshop at the Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 62-68, Annual Meeting of the Association for Computational Linguistics, Vancouver, Canada, 30/07/2017. https://doi.org/10.18653/v1/P17-3011