Automatic Speech Recognition for Northern Sámi with comparison to other Uralic Languages

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Smit, Peter
dc.contributor.author Leinonen, Juho
dc.contributor.author Jokinen, Kristiina
dc.contributor.author Kurimo, Mikko
dc.date.accessioned 2017-01-19T10:40:46Z
dc.date.issued 2016-01-20
dc.identifier.citation Smit, P, Leinonen, J, Jokinen, K & Kurimo, M 2016, Automatic Speech Recognition for Northern Sámi with comparison to other Uralic Languages. In T A Pirinen, E Simon, F M Tyers & V Vincze (eds), Proceedings of the Second International Workshop on Computational Linguistics for Uralic Languages, 9, University of Szeged, Szeged, Hungary, pp. 80-91, International Workshop on Computational Linguistics for the Uralic Languages, Szeged, Hungary, 20-21 January. en
dc.identifier.isbn 978-963-306-504-4
dc.identifier.other PURE UUID: 15232107-a79e-4fbd-ab26-e7824c2484af
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/automatic-speech-recognition-for-northern-smi-with-comparison-to-other-uralic-languages(15232107-a79e-4fbd-ab26-e7824c2484af).html
dc.identifier.other PURE LINK: http://rgai.inf.u-szeged.hu/project/iwclul/proceedings.pdf
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/10368745/iwclul2016smit.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/24164
dc.description.abstract Speech technology applications for major languages are becoming widely available, but for many other languages there is no commercial interest in developing speech technology. As the lack of technology and applications will threaten the existence of these languages, it is important to study how to create speech recognizers with minimal effort and low resources. As a test case, we have developed a Large Vocabulary Continuous Speech Recognizer for Northern Sámi, a Finno-Ugric language that has few resources available for speech technology. Using only limited audio data (2.5 hours) and the Northern Sámi Wikipedia for the language model, we achieved a 7.6% Letter Error Rate (LER). With a language model based on a higher-quality language corpus, we achieved 4.2% LER. To put this in perspective, we also trained systems for other, better-resourced Finno-Ugric languages (Finnish and Estonian) with the same amount of data and compared them to state-of-the-art systems in those languages. en
dc.format.extent 11
dc.format.extent 80-91
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartof International Workshop on Computational Linguistics for the Uralic Languages en
dc.relation.ispartofseries Proceedings of the Second International Workshop on Computational Linguistics for Uralic Languages en
dc.rights openAccess en
dc.subject.other 113 Computer and information sciences en
dc.title Automatic Speech Recognition for Northern Sámi with comparison to other Uralic Languages en
dc.type A4 Article in conference proceedings (Artikkeli konferenssijulkaisussa) fi
dc.description.version Peer reviewed en
dc.contributor.department Department of Signal Processing and Acoustics
dc.contributor.department University of Helsinki
dc.subject.keyword 113 Computer and information sciences
dc.identifier.urn URN:NBN:fi:aalto-201701191109
dc.type.version acceptedVersion

