Advances in unlimited-vocabulary speech recognition for morphologically rich language

dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Hirsimäki, Teemu
dc.date.accessioned 2012-08-23T05:28:32Z
dc.date.available 2012-08-23T05:28:32Z
dc.date.issued 2009
dc.identifier.isbn 978-951-22-9977-5 (electronic)
dc.identifier.isbn 978-951-22-9976-8 (printed)
dc.identifier.issn 1797-5069
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/4636
dc.description.abstract Automatic speech recognition systems are devices or computer programs that convert human speech into text or perform actions based on what is said to the system. Typical applications include dictation, automatic transcription of large audio or video databases, speech-controlled user interfaces, and automated telephone services. If the recognition system is not limited to a certain topic and vocabulary, covering the words of the target language as well as possible while maintaining high recognition accuracy becomes an issue. The conventional way to model the target language, especially in English recognition systems, is to limit the recognition to the most common words of the language. A vocabulary of 60 000 words is usually enough to cover the language adequately for arbitrary topics. In morphologically rich languages, such as Finnish, Estonian and Turkish, on the other hand, long words can be formed by inflection and compounding, which makes it difficult to cover the language adequately with vocabulary-based approaches. This thesis deals with methods that can be used to build efficient speech recognition systems for morphologically rich languages. Before the statistical n-gram language models are trained on a large text corpus, the words in the corpus are automatically segmented into smaller fragments, referred to as morphs. The morphs are then used as the modelling units of the n-gram models instead of whole words. This makes it possible to train the model on the whole text corpus without limiting the vocabulary, and it enables the model to create even unseen words by joining morphs together. Since the segmentation algorithm is unsupervised and data-driven, it can readily be used for many languages. Speech recognition experiments are conducted on various Finnish recognition tasks, and some of the experiments are also repeated on an Estonian task. 
It is shown that the morph-based language models reduce recognition errors compared to word-based models. It seems to be important, however, that the n-gram models are allowed to use long morph contexts, especially if the morphs used by the model are short. This can be achieved by using growing and pruning algorithms to train variable-length n-gram models. The thesis also presents data structures that can be used for representing the variable-length n-gram models efficiently in recognition systems. By analysing the recognition errors made by Finnish recognition systems, it is found that speaker adaptive training and discriminative training methods help to reduce errors in different situations. The errors are also analysed according to word frequencies and manually defined error classes. en
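The morph-based modelling idea in the abstract can be illustrated with a toy example. This is a minimal Python sketch under stated assumptions, not the thesis implementation: the `SEGMENTATION` table is a hypothetical, hand-written stand-in for the unsupervised, data-driven segmentation algorithm, the `<w>` word-boundary token is an assumption of the sketch, and the linearly interpolated bigram model is a deliberately simple substitute for the Kneser–Ney smoothed variable-length n-gram models studied in the thesis. It shows how a model trained on morphs can assign nonzero probability to a word that never occurred in training ("autossa", "in the car") by joining seen morphs, while a word vocabulary built from the same data does not contain that word.

```python
from collections import Counter, defaultdict

BOUNDARY = "<w>"  # assumed word-boundary token, so joined morphs can be split back into words

# Hypothetical segmentation table standing in for the unsupervised,
# data-driven segmentation algorithm (which is not reproduced here).
SEGMENTATION = {
    "talo": ["talo"],             # 'house'
    "talossa": ["talo", "ssa"],   # 'in the house'
    "auto": ["auto"],             # 'car'
}

def to_morphs(words):
    """Map a word sequence to a morph sequence with explicit word boundaries."""
    tokens = [BOUNDARY]
    for word in words:
        tokens.extend(SEGMENTATION[word])
        tokens.append(BOUNDARY)
    return tokens

def train_bigram(corpus):
    """Count morph unigrams and bigrams over a list of token sequences."""
    unigrams, bigrams = Counter(), defaultdict(Counter)
    for tokens in corpus:
        unigrams.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1
    return unigrams, bigrams

def sequence_prob(tokens, unigrams, bigrams, lam=0.7):
    """Probability of a morph sequence under a linearly interpolated bigram model.
    (Simple stand-in only; not Kneser-Ney smoothing or variable-length n-grams.)"""
    total = sum(unigrams.values())
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        ctx = bigrams[prev]
        n_ctx = sum(ctx.values())
        p_bi = ctx[cur] / n_ctx if n_ctx else 0.0
        p *= lam * p_bi + (1 - lam) * unigrams[cur] / total
    return p

sentences = [["talossa"], ["auto"], ["talo"]]
word_vocab = {w for s in sentences for w in s}
unigrams, bigrams = train_bigram([to_morphs(s) for s in sentences])

# 'autossa' never occurs in the training data, so it is outside the word
# vocabulary, but the morph model can build it by joining 'auto' + 'ssa'.
unseen = [BOUNDARY, "auto", "ssa", BOUNDARY]
print("autossa" in word_vocab)
print(sequence_prob(unseen, unigrams, bigrams) > 0)
```

The unigram term in the interpolation is what lets the unseen morph pair "auto ssa" receive nonzero probability; a real system would use better smoothing and much longer morph contexts, which the abstract identifies as important when the morphs are short.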
dc.format.extent Online publication (494 KB, 64 pp.)
dc.format.mimetype application/pdf
dc.language.iso en en
dc.publisher Teknillinen korkeakoulu en
dc.relation.ispartofseries TKK dissertations in information and computer science, 14 en
dc.relation.haspart [Publication 1]: Vesa Siivola, Teemu Hirsimäki, Mathias Creutz, and Mikko Kurimo. 2003. Unlimited vocabulary speech recognition based on morphs discovered in an unsupervised manner. In: Proceedings of the 8th European Conference on Speech Communication and Technology (Eurospeech 2003). Geneva, Switzerland. 1-4 September 2003, pages 2293-2296. © 2003 International Speech Communication Association (ISCA). By permission. en
dc.relation.haspart [Publication 2]: Teemu Hirsimäki, Mathias Creutz, Vesa Siivola, Mikko Kurimo, Sami Virpioja, and Janne Pylkkönen. 2006. Unlimited vocabulary speech recognition with morph language models applied to Finnish. Computer Speech and Language, volume 20, number 4, pages 515-541. © 2005 Elsevier Science. By permission. en
dc.relation.haspart [Publication 3]: Vesa Siivola, Teemu Hirsimäki, and Sami Virpioja. 2007. On growing and pruning Kneser–Ney smoothed N-gram models. IEEE Transactions on Audio, Speech, and Language Processing, volume 15, number 5, pages 1617-1624. © 2007 IEEE. By permission. en
dc.relation.haspart [Publication 4]: Teemu Hirsimäki. 2007. On compressing n-gram language models. In: Proceedings of the 32nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2007). Honolulu, Hawaii, USA. 15-20 April 2007, pages IV-949-952. © 2007 IEEE. By permission. en
dc.relation.haspart [Publication 5]: Mathias Creutz, Teemu Hirsimäki, Mikko Kurimo, Antti Puurula, Janne Pylkkönen, Vesa Siivola, Matti Varjokallio, Ebru Arısoy, Murat Saraçlar, and Andreas Stolcke. 2007. Morph-based speech recognition and modeling of out-of-vocabulary words across languages. ACM Transactions on Speech and Language Processing, volume 5, number 1, pages 3:1-3:29. en
dc.relation.haspart [Publication 6]: Teemu Hirsimäki, Janne Pylkkönen, and Mikko Kurimo. 2009. Importance of high-order n-gram models in morph-based speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, volume 17, number 4, pages 724-732. © 2009 IEEE. By permission. en
dc.relation.haspart [Publication 7]: Teemu Hirsimäki and Mikko Kurimo. 2009. Analysing recognition errors in unlimited-vocabulary speech recognition. In: Proceedings of the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies Conference (NAACL-HLT 2009). Boulder, Colorado, USA. 31 May - 5 June 2009, pages 193-196. en
dc.subject.other Computer science
dc.subject.other Electrical engineering
dc.title Advances in unlimited-vocabulary speech recognition for morphologically rich language en
dc.type G5 Article-based doctoral dissertation en
dc.contributor.department Department of Information and Computer Science en
dc.subject.keyword speech recognition en
dc.subject.keyword language modelling en
dc.subject.keyword n-gram models en
dc.subject.keyword morphology en
dc.subject.keyword error analysis en
dc.identifier.urn URN:ISBN:978-951-22-9977-5
dc.type.dcmitype text en
dc.type.ontasot Väitöskirja (artikkeli) fi
dc.type.ontasot Doctoral dissertation (article-based) en