dc.contributor |
Aalto-yliopisto |
fi |
dc.contributor |
Aalto University |
en |
dc.contributor.author |
Grönroos, Stig-Arne |
|
dc.contributor.author |
Virpioja, Sami |
|
dc.contributor.author |
Kurimo, Mikko |
|
dc.date.accessioned |
2019-09-25T14:12:58Z |
|
dc.date.available |
2019-09-25T14:12:58Z |
|
dc.date.issued |
2019-01-07 |
|
dc.identifier.citation |
Grönroos, S-A, Virpioja, S & Kurimo, M 2019, North Sámi morphological segmentation with low-resource semi-supervised sequence labeling. in Fifth Workshop on Computational Linguistics for Uralic Languages: Proceedings of the Workshop. Association for Computational Linguistics, pp. 15-26, International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia, 07/01/2019. <https://www.aclweb.org/anthology/W19-0302/> |
en |
dc.identifier.isbn |
978-1-948087-92-6 |
|
dc.identifier.other |
PURE UUID: 832e50de-ac02-4e45-9a9e-af08e5049c1c |
|
dc.identifier.other |
PURE ITEMURL: https://research.aalto.fi/en/publications/832e50de-ac02-4e45-9a9e-af08e5049c1c |
|
dc.identifier.other |
PURE LINK: https://www.aclweb.org/anthology/W19-0302/ |
|
dc.identifier.other |
PURE FILEURL: https://research.aalto.fi/files/36793748/2019_iwclul.published.pdf |
|
dc.identifier.uri |
https://aaltodoc.aalto.fi/handle/123456789/40463 |
|
dc.description |
openaire: EC/H2020/780069/EU//MeMAD |
|
dc.description.abstract |
Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline. |
en |
dc.format.extent |
12 |
|
dc.format.extent |
15-26 |
|
dc.format.mimetype |
application/pdf |
|
dc.language.iso |
en |
en |
dc.relation |
info:eu-repo/grantAgreement/EC/H2020/780069/EU//MeMAD |
|
dc.relation.ispartof |
International Workshop on Computational Linguistics for Uralic Languages |
en |
dc.relation.ispartofseries |
Fifth Workshop on Computational Linguistics for Uralic Languages |
en |
dc.rights |
openAccess |
en |
dc.title |
North Sámi morphological segmentation with low-resource semi-supervised sequence labeling |
en |
dc.type |
A4 Article in a conference publication |
en |
dc.description.version |
Peer reviewed |
en |
dc.contributor.department |
Centre of Excellence in Computational Inference, COIN |
|
dc.contributor.department |
Dept Signal Process and Acoust |
|
dc.subject.keyword |
morphology |
|
dc.subject.keyword |
segmentation |
|
dc.subject.keyword |
low-resource settings |
|
dc.subject.keyword |
semi-supervised learning |
|
dc.subject.keyword |
sequence labeling |
|
dc.subject.keyword |
recurrent neural networks |
|
dc.subject.keyword |
conditional random fields |
|
dc.subject.keyword |
North Sámi |
|
dc.identifier.urn |
URN:NBN:fi:aalto-201909255484 |
|
dc.type.version |
publishedVersion |
|