Studies on Training Text Selection for Conversational Finnish Language Modeling

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Enarvi, Seppo [en_US]
dc.contributor.author: Kurimo, Mikko [en_US]
dc.contributor.department: Tietojenkäsittelytieteen laito [en]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.groupauthor: Speech Recognition [en]
dc.date.accessioned: 2017-08-03T12:08:40Z
dc.date.available: 2017-08-03T12:08:40Z
dc.date.issued: 2013 [en_US]
dc.description: VK: coin
dc.description.abstract: Current ASR and MT systems do not operate on conversational Finnish, because training data for colloquial Finnish has not been available. Although speech recognition performance on literary Finnish is already quite good, those systems have very poor baseline performance on conversational speech. Text data for relevant vocabulary and language models can be collected from the Internet, but web data is very noisy and most of it is not helpful for learning good models. The Finnish language is highly agglutinative and is written phonetically. Even phonetic reductions and sandhi are often written down in informal discussions. This increases vocabulary size dramatically and causes word-based selection methods to fail. Our selection method explicitly optimizes the perplexity of a subword language model on the development data, and requires only a very limited amount of speech transcripts as development data. The language models have been evaluated for speech recognition using a new data set consisting of generic colloquial Finnish. [en]
dc.description.version: Peer reviewed [en]
dc.format.extent: 8
dc.format.mimetype: application/pdf [en_US]
dc.identifier.citation: Enarvi, S & Kurimo, M 2013, Studies on Training Text Selection for Conversational Finnish Language Modeling. in 10th International Workshop on Spoken Language Translation, (IWSLT 2013), Heidelberg, 5 Dec 2013 - 6 Dec 2013. pp. 256-263. <http://workshop2013.iwslt.org/downloads/Studies_on_Training_Text_Selection_for_Conversational_Finnish_Language_Modeling.pdf> [en]
dc.identifier.other: PURE UUID: 50ee6ee5-0608-48b3-a658-219c719b3bb7 [en_US]
dc.identifier.other: PURE ITEMURL: https://research.aalto.fi/en/publications/50ee6ee5-0608-48b3-a658-219c719b3bb7 [en_US]
dc.identifier.other: PURE LINK: http://workshop2013.iwslt.org/downloads/Studies_on_Training_Text_Selection_for_Conversational_Finnish_Language_Modeling.pdf [en_US]
dc.identifier.other: PURE FILEURL: https://research.aalto.fi/files/14166819/Studies_on_Training_Text_Selection_for_Conversational_Finnish_Language_Modeling.pdf [en_US]
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/27374
dc.identifier.urn: URN:NBN:fi:aalto-201708036342
dc.language.iso: en [en]
dc.relation.ispartofseries: 10th International Workshop on Spoken Language Translation, (IWSLT 2013), Heidelberg, 5 Dec 2013 - 6 Dec 2013 [en]
dc.relation.ispartofseries: pp. 256-263 [en]
dc.rights: openAccess [en]
dc.title: Studies on Training Text Selection for Conversational Finnish Language Modeling [en]
dc.type: A4 Artikkeli konferenssijulkaisussa (A4 Article in a conference publication) [fi]
dc.type.version: publishedVersion
