Improved subword modeling for WFST-based speech recognition

dc.contributor Aalto-yliopisto fi
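dc.contributor Aalto University en
dc.contributor.author Smit, Peter
dc.contributor.author Virpioja, Sami
dc.contributor.author Kurimo, Mikko
dc.date.accessioned 2017-10-15T20:57:28Z
dc.date.available 2017-10-15T20:57:28Z
dc.date.issued 2017-08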
dc.identifier.citation Smit, P, Virpioja, S & Kurimo, M 2017, Improved subword modeling for WFST-based speech recognition. in Interspeech 2017. pp. 2551-2555. DOI: 10.21437/Interspeech.2017-103 en
dc.identifier.other PURE UUID: ed43f22c-f5bd-45ad-99a7-628f82f2283c
dc.identifier.other PURE ITEMURL:
dc.identifier.other PURE FILEURL:
dc.description.abstract Because the number of observed word forms in agglutinative languages is very high, subword units are often used in speech recognition. However, the proper use of subword units requires careful consideration of details such as silence modeling, position-dependent phones, and the combination of the units. In this paper, we implement subword modeling in the Kaldi toolkit by creating a modified lexicon using finite-state transducers to represent the subword units correctly. We experiment with multiple types of word boundary markers and achieve the best results by adding a marker to the left or right side of a subword unit whenever it is not preceded or followed by a word boundary, respectively. We also compare three different toolkits that provide data-driven subword segmentations. In our experiments on a variety of Finnish and Estonian datasets, the best subword models outperform both word-based models and naive subword implementations. The largest relative reduction in WER is 23% over word-based models for a Finnish read speech dataset. The results are also better than any previously published ones for the same datasets, and the improvement on all datasets is more than 5%. en
dc.format.extent 2551-2555
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation.ispartofseries Interspeech 2017 en
dc.rights openAccess en
dc.subject.other 213 Electronic, automation and communications engineering, electronics en
dc.title Improved subword modeling for WFST-based speech recognition en
dc.type A4 Artikkeli konferenssijulkaisussa fi
dc.description.version Peer reviewed en
dc.contributor.department Department of Signal Processing and Acoustics
dc.subject.keyword speech recognition
dc.subject.keyword Kaldi
dc.subject.keyword subword modeling
dc.subject.keyword Finnish
dc.subject.keyword Estonian
dc.subject.keyword 213 Electronic, automation and communications engineering, electronics
dc.identifier.urn URN:NBN:fi:aalto-201710157202
dc.identifier.doi 10.21437/Interspeech.2017-103
dc.type.version acceptedVersion
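
The boundary-marking scheme that the abstract reports as giving the best results (a marker on the left or right side of a subword unit whenever that side is not a word boundary) can be illustrated with a minimal Python sketch. This is not the authors' Kaldi lexicon-FST implementation; the marker symbol "+", the function names mark_subwords and join_subwords, and the example segmentations are assumptions made only for illustration.

```python
from typing import List

MARKER = "+"  # assumed marker symbol; the record does not say which symbol the paper uses


def mark_subwords(segmented_words: List[List[str]], marker: str = MARKER) -> List[str]:
    """Attach a marker to each side of a subword that is NOT at a word boundary."""
    tokens = []
    for subwords in segmented_words:
        last = len(subwords) - 1
        for i, unit in enumerate(subwords):
            left = marker if i > 0 else ""      # not preceded by a word boundary
            right = marker if i < last else ""  # not followed by a word boundary
            tokens.append(left + unit + right)
    return tokens


def join_subwords(tokens: List[str], marker: str = MARKER) -> List[str]:
    """Rebuild words from marked tokens (assumes the marker never occurs inside a unit)."""
    words, current = [], ""
    for token in tokens:
        continues = token.endswith(marker)  # a right-side marker means the word goes on
        start = len(marker) if token.startswith(marker) else 0
        end = len(token) - len(marker) if continues else len(token)
        current += token[start:end]
        if not continues:
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends in the middle of a word
        words.append(current)
    return words


# Hypothetical segmentations of three Finnish words, for illustration only.
segmented = [["auto", "lla"], ["on"], ["kirja", "sto", "ssa"]]
marked = mark_subwords(segmented)
restored = join_subwords(marked)
print(" ".join(marked))
print(" ".join(restored))
```

With markers on both sides, every token carries enough information to tell whether a word boundary precedes or follows it, so the word sequence can be restored from the subword sequence without a separate word-boundary token.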
