North Sámi morphological segmentation with low-resource semi-supervised sequence labeling
Access rights
openAccess
A4 Article in conference proceedings
This publication is imported from Aalto University research portal.
View publication in the Research portal
View/Open full text file from the Research portal
Other link related to publication
Date
2019-01-07
Language
en
Pages
12
15-26
Series
Fifth Workshop on Computational Linguistics for Uralic Languages
Abstract
Semi-supervised sequence labeling is an effective way to train a low-resource morphological segmentation system. We show that a feature set augmentation approach, which combines the strengths of generative and discriminative models, is suitable both for graphical models like conditional random field (CRF) and sequence-to-sequence neural models. We perform a comparative evaluation between three existing and one novel semi-supervised segmentation methods. All four systems are language-independent and have open-source implementations. We improve on previous best results for North Sámi morphological segmentation. We see a relative improvement in morph boundary F1-score of 8.6% compared to using the generative Morfessor FlatCat model directly and 2.4% compared to a seq2seq baseline. Our neural sequence tagging system reaches almost the same performance as the CRF topline.
Description
| openaire: EC/H2020/780069/EU//MeMAD
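The abstract reports its results as relative improvements in morph boundary F1-score. As a rough illustration of what such a metric measures, the following is a minimal sketch, not the evaluation code used in the publication. The `+` separator, the function names, the micro-averaging over words, and the toy English word list are all assumptions made for this example; the paper's own evaluation protocol may differ.

```python
def boundaries(segmented, sep="+"):
    """Return the character offsets at which a morph boundary occurs."""
    positions = set()
    offset = 0
    # Every morph except the last one is followed by a boundary.
    for morph in segmented.split(sep)[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def boundary_f1(references, predictions, sep="+"):
    """Micro-averaged boundary precision, recall and F1 (an assumed variant)."""
    correct = predicted = expected = 0
    for ref, hyp in zip(references, predictions):
        ref_b, hyp_b = boundaries(ref, sep), boundaries(hyp, sep)
        correct += len(ref_b & hyp_b)
        predicted += len(hyp_b)
        expected += len(ref_b)
    precision = correct / predicted if predicted else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def relative_improvement(new_score, old_score):
    """Relative change of new_score over old_score, as a fraction."""
    return (new_score - old_score) / old_score


# Hypothetical toy data, not taken from the publication.
refs = ["un+lock+ed", "walk+ing", "cats"]
hyps = ["un+locked", "walk+ing", "cat+s"]
precision, recall, f1 = boundary_f1(refs, hyps)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A relative improvement is a ratio of scores rather than a difference in percentage points, which is what `relative_improvement` computes.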
Keywords
morphology, segmentation, low-resource settings, semi-supervised learning, sequence labeling, recurrent neural networks, conditional random fields, north sami
Citation
Grönroos, S-A, Virpioja, S & Kurimo, M 2019, North Sámi morphological segmentation with low-resource semi-supervised sequence labeling. in Fifth Workshop on Computational Linguistics for Uralic Languages: Proceedings of the Workshop. Association for Computational Linguistics, pp. 15-26, International Workshop on Computational Linguistics for Uralic Languages, Tartu, Estonia, 07/01/2019. <https://www.aclweb.org/anthology/W19-0302/>