North Sámi morphological segmentation with low-resource semi-supervised sequence labeling


dc.contributor Aalto-yliopisto fi
dc.contributor Aalto University en
dc.contributor.author Grönroos, Stig-Arne
dc.contributor.author Virpioja, Sami
dc.contributor.author Kurimo, Mikko
dc.date.accessioned 2019-09-25T14:12:58Z
dc.date.available 2019-09-25T14:12:58Z
dc.date.issued 2019-01-07
dc.identifier.citation Grönroos, S-A, Virpioja, S & Kurimo, M 2019, North Sámi morphological segmentation with low-resource semi-supervised sequence labeling. In Fifth Workshop on Computational Linguistics for Uralic Languages: Proceedings of the Workshop. Association for Computational Linguistics, pp. 15-26, International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia, 07/01/2019. <https://www.aclweb.org/anthology/W19-0302/> en
dc.identifier.isbn 978-1-948087-92-6
dc.identifier.other PURE UUID: 832e50de-ac02-4e45-9a9e-af08e5049c1c
dc.identifier.other PURE ITEMURL: https://research.aalto.fi/en/publications/832e50de-ac02-4e45-9a9e-af08e5049c1c
dc.identifier.other PURE LINK: https://www.aclweb.org/anthology/W19-0302/
dc.identifier.other PURE FILEURL: https://research.aalto.fi/files/36793748/2019_iwclul.published.pdf
dc.identifier.uri https://aaltodoc.aalto.fi/handle/123456789/40463
dc.description | openaire: EC/H2020/780069/EU//MeMAD
dc.description.abstract Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline. en
dc.format.extent 12
dc.format.extent 15-26
dc.format.mimetype application/pdf
dc.language.iso en en
dc.relation info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD
dc.relation.ispartof International Workshop on Computational Linguistics for Uralic Languages en
dc.relation.ispartofseries Fifth Workshop on Computational Linguistics for Uralic Languages en
dc.rights openAccess en
dc.title North Sámi morphological segmentation with low-resource semi-supervised sequence labeling en
dc.type A4 Artikkeli konferenssijulkaisussa (A4 Article in conference proceedings) fi
dc.description.version Peer reviewed en
dc.contributor.department Centre of Excellence in Computational Inference, COIN
dc.contributor.department Department of Signal Processing and Acoustics
dc.subject.keyword morphology
dc.subject.keyword segmentation
dc.subject.keyword low-resource settings
dc.subject.keyword semi-supervised learning
dc.subject.keyword sequence labeling
dc.subject.keyword recurrent neural networks
dc.subject.keyword conditional random fields
dc.subject.keyword north sami
dc.identifier.urn URN:NBN:fi:aalto-201909255484
dc.type.version publishedVersion
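
The abstract reports results as relative improvements in morph boundary F1-score. As a rough illustration of what such a figure measures, the following is a minimal sketch, assuming a simple micro-averaged boundary comparison against a single gold segmentation per word. The function names and the toy English words are hypothetical and are not taken from the paper, whose actual evaluation protocol may differ (for example in how scores are averaged or how alternative gold analyses are handled).

```python
def boundaries(morphs):
    """Return the set of internal boundary offsets of a segmented word.

    ["un", "lock", "ed"] has boundaries after character 2 and character 6.
    """
    positions = set()
    offset = 0
    for morph in morphs[:-1]:  # the end of the word is not a boundary
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_prf(predicted, gold):
    """Micro-averaged boundary precision, recall and F1.

    `predicted` and `gold` are parallel lists of segmentations,
    each segmentation being a list of morph strings.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        p, g = boundaries(pred), boundaries(ref)
        tp += len(p & g)   # boundaries found in both
        fp += len(p - g)   # predicted but not in gold
        fn += len(g - p)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def relative_improvement(new_score, old_score):
    """Relative improvement of new_score over old_score, in percent."""
    return 100.0 * (new_score - old_score) / old_score


# Toy data (hypothetical, not from the paper's North Sámi test set).
gold = [["un", "lock", "ed"], ["cat", "s"]]
predicted = [["unlock", "ed"], ["ca", "ts"]]
precision, recall, f1 = boundary_prf(predicted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A relative improvement such as the 8.6% quoted in the abstract is a ratio between two scores, not a difference in percentage points; `relative_improvement` makes that distinction explicit.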


Files in this item

There are no open access files associated with this item.
