North Sámi morphological segmentation with low-resource semi-supervised sequence labeling

Access rights
openAccess
A4 Article in conference proceedings
Date
2019-01-07
Language
en
Pages
12
15-26
Series
Fifth Workshop on Computational Linguistics for Uralic Languages
Abstract
Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline.
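As an illustration of the terms used in the abstract, the following minimal sketch shows one way to cast segmentation as per-character sequence labeling and to score morph boundaries. It is not taken from the paper: the B/M/E/S label set, the micro-averaged F1 over boundary positions, the function names, and the toy English words are all assumptions made for the example.

```python
def to_labels(morphs):
    """Label each character B (morph-initial), M (middle), E (morph-final)
    or S (single-character morph). The BMES scheme is an assumed convention."""
    labels = []
    for morph in morphs:
        if len(morph) == 1:
            labels.append("S")
        else:
            labels.extend(["B"] + ["M"] * (len(morph) - 2) + ["E"])
    return labels


def boundaries(morphs):
    """Return the character offsets of word-internal morph boundaries."""
    positions = set()
    offset = 0
    for morph in morphs[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_f1(gold, predicted):
    """Micro-averaged F1 over boundary positions (an assumed averaging choice)."""
    tp = fp = fn = 0
    for gold_morphs, pred_morphs in zip(gold, predicted):
        gold_b, pred_b = boundaries(gold_morphs), boundaries(pred_morphs)
        tp += len(gold_b & pred_b)
        fp += len(pred_b - gold_b)
        fn += len(gold_b - pred_b)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new_score, baseline_score):
    """Relative (not absolute) gain of new_score over baseline_score."""
    return (new_score - baseline_score) / baseline_score


# Toy data, purely hypothetical.
gold = [["un", "lock", "ed"], ["cats"]]
predicted = [["un", "locked"], ["cat", "s"]]
score = boundary_f1(gold, predicted)
```

Under these assumptions, a relative improvement such as the 8.6% reported above corresponds to relative_improvement(new, baseline) returning about 0.086, which is distinct from an absolute difference of 8.6 F1 points.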
Description
| openaire: EC/H2020/780069/EU//MeMAD
Keywords
morphology, segmentation, low-resource settings, semi-supervised learning, sequence labeling, recurrent neural networks, conditional random fields, North Sámi
Citation
Grönroos, S-A, Virpioja, S & Kurimo, M 2019, North Sámi morphological segmentation with low-resource semi-supervised sequence labeling. in Fifth Workshop on Computational Linguistics for Uralic Languages: Proceedings of the Workshop. Association for Computational Linguistics, pp. 15-26, International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia, 07/01/2019. <https://www.aclweb.org/anthology/W19-0302/>