Automatic Speech Recognition for Northern Sámi with comparison to other Uralic Languages
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal (opens in new window)
View/Open full text file from the Research portal (opens in new window)
Other link related to publication (opens in new window)
Date
2016-01-20
Language
en
Pages
11
80-91
Series
Proceedings of the Second International Workshop on Computational Linguistics for Uralic Languages
Abstract
Speech technology applications for major languages are becoming widely available, but for many other languages there is no commercial interest in developing speech technology. As the lack of technology and applications will threaten the existence of these languages, it is important to study how to create speech recognizers with minimal effort and low resources. As a test case, we have developed a Large Vocabulary Continuous Speech Recognizer for Northern Sámi, a Finno-Ugric language that has few resources available for speech technology. Using only 2.5 hours of audio data and the Northern Sámi Wikipedia for the language model, we achieved a 7.6% Letter Error Rate (LER). With a language model based on a higher-quality language corpus, we achieved 4.2% LER. To put this in perspective, we also trained systems for other, better-resourced Finno-Ugric languages (Finnish and Estonian) with the same amount of data and compared those to state-of-the-art systems in those languages.
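For readers unfamiliar with the metric, the following is a minimal sketch of how a Letter Error Rate is commonly computed: the character-level edit distance between the recognizer output and the reference transcript, divided by the number of reference characters. The function names and the choice to count spaces as letters are assumptions made for illustration; the abstract does not state the exact scoring convention used in the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def letter_error_rate(references: list[str], hypotheses: list[str]) -> float:
    """Total character edits divided by total reference characters.

    Assumption: every character, including spaces, counts as a letter.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_letters = sum(len(r) for r in references)
    if total_letters == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_letters
```

Under this definition, a hypothesis that differs from a four-letter reference by one substituted letter scores an LER of 0.25. Because the rate is normalized by reference length only, it can exceed 1.0 when the hypothesis contains many inserted characters.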
Citation
Smit, P, Leinonen, J, Jokinen, K & Kurimo, M 2016, Automatic Speech Recognition for Northern Sámi with comparison to other Uralic Languages. in Proceedings of the Second International Workshop on Computational Linguistics for Uralic Languages, 9, University of Szeged, Szeged, Hungary, pp. 80-91, International Workshop on Computational Linguistics for the Uralic Languages, Szeged, Hungary, 20/01/2016. <http://rgai.inf.u-szeged.hu/project/iwclul/proceedings.pdf>