Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline
School of Electrical Engineering
D4 Published development or research report or study
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Date
2013
Language
en
Pages
38
Series
Aalto University publication series SCIENCE + TECHNOLOGY, 25/2013
Abstract
Morfessor is a family of probabilistic machine learning methods that find morphological segmentations for words of a natural language, based solely on raw text data. Since the release of the public implementations of the Morfessor Baseline and Categories-MAP methods in 2005, they have become popular as automatic tools for processing morphologically complex languages in applications such as speech recognition and machine translation. This report describes a new implementation of the Morfessor Baseline method. The new version not only fixes the main restrictions of the previous software, but also includes recent methodological extensions such as semi-supervised learning, which can make use of small amounts of manually segmented words. Experimental results for the various features of the implementation are reported for English and Finnish segmentation tasks.
Keywords
morpheme segmentation, morphology induction, unsupervised learning, semi-supervised learning, morfessor, machine learning