Using stacked transformations for recognizing foreign accented speech
Post print
School of Science
A4 Article in conference proceedings
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for Your own personal use. Commercial use is prohibited.
Language
en
Pages
4
Abstract
A common problem in speech recognition for foreign-accented speech is that there is not enough training data for an accent-specific or a speaker-specific recognizer. Speaker adaptation can be used to improve the accuracy of a speaker-independent recognizer, but a lot of adaptation data is needed for speakers with a strong foreign accent. In this paper we propose a rather simple and successful technique of stacked transformations, where the baseline models trained for native speakers are first adapted using accent-specific data and then by another transformation using speaker-specific data. Because the accent-specific data can be collected offline, the first transformation can be more detailed and comprehensive, and the second one less detailed and fast. Experimental results are provided for speaker adaptation in English spoken by Finnish speakers. The evaluation results confirm that the stacked transformations are very helpful for fast speaker adaptation.
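The stacking idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, that each transformation is an affine map applied to the mean vector of a Gaussian in the acoustic model; the abstract does not state the transformation type, and every name and number below is hypothetical rather than taken from the paper.

```python
# Illustrative sketch only: the transform type (affine maps on Gaussian
# means) and all numbers below are assumptions, not taken from the paper.

def affine(matrix, bias, vector):
    """Return matrix @ vector + bias using plain Python lists."""
    return [
        sum(m * v for m, v in zip(row, vector)) + b
        for row, b in zip(matrix, bias)
    ]

def stack(first, second):
    """Compose two square affine maps, `first` then `second`, into one."""
    a1, b1 = first
    a2, b2 = second
    n = len(b1)
    a = [[sum(a2[i][k] * a1[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    b = affine(a2, b2, b1)
    return a, b

# Hypothetical native-speaker mean of one Gaussian (2-dimensional features).
native_mean = [1.0, 2.0]

# Accent transform: estimated offline from a larger pool of accent data,
# so it can afford more parameters (here a full matrix plus bias).
accent = ([[1.5, 0.5], [0.0, 2.0]], [0.5, -1.0])

# Speaker transform: estimated from very little data, so it is kept
# simple (here an identity matrix with a bias-only shift).
speaker = ([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.25])

# Apply the two transformations one after the other.
accent_mean = affine(*accent, native_mean)
adapted_mean = affine(*speaker, accent_mean)

# The same result from a single pre-composed transformation.
combined = stack(accent, speaker)
combined_mean = affine(*combined, native_mean)
```

In this sketch, `adapted_mean` and `combined_mean` agree, which shows that the speaker-level step only has to estimate a small correction on top of the accent-adapted model instead of a full transformation from the native baseline.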
Citation
Smit, Peter & Kurimo, Mikko. 2011. Using stacked transformations for recognizing foreign accented speech. 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). ISSN 1520-6149 (printed). ISBN 978-1-4577-0537-3 (electronic). ISBN 978-1-4577-0538-0 (printed). DOI: 10.1109/ICASSP.2011.5947481.