Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.author: Virpioja, Sami
dc.contributor.author: Smit, Peter
dc.contributor.author: Grönroos, Stig-Arne
dc.contributor.author: Kurimo, Mikko
dc.contributor.department: Signaalinkäsittelyn ja akustiikan laitos [fi]
dc.contributor.department: Department of Signal Processing and Acoustics [en]
dc.contributor.school: Sähkötekniikan korkeakoulu [fi]
dc.contributor.school: School of Electrical Engineering [en]
dc.date.accessioned: 2013-12-12T10:00:59Z
dc.date.available: 2013-12-12T10:00:59Z
dc.date.issued: 2013
dc.description.abstract: Morfessor is a family of probabilistic machine learning methods that find morphological segmentations for the words of a natural language based solely on raw text data. Since the public implementations of the Morfessor Baseline and Categories-MAP methods were released in 2005, they have become popular tools for automatically processing morphologically complex languages in applications such as speech recognition and machine translation. This report describes a new implementation of the Morfessor Baseline method. The new version not only removes the main limitations of the previous software but also includes recent methodological extensions such as semi-supervised learning, which can make use of small amounts of manually segmented words. Experimental results for the various features of the implementation are reported for English and Finnish segmentation tasks. [en]
dc.format.extent: 38
dc.format.mimetype: application/pdf
dc.identifier.isbn: 978-952-60-5501-5 (electronic)
dc.identifier.issn: 1799-490X (electronic)
dc.identifier.issn: 1799-4896 (printed)
dc.identifier.issn: 1799-4896 (ISSN-L)
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/11836
dc.identifier.urn: URN:ISBN:978-952-60-5501-5
dc.language.iso: en [en]
dc.publisher: Aalto University [en]
dc.publisher: Aalto-yliopisto [fi]
dc.relation.ispartofseries: Aalto University publication series SCIENCE + TECHNOLOGY [en]
dc.relation.ispartofseries: 25/2013
dc.subject.keyword: morpheme segmentation [en]
dc.subject.keyword: morphology induction [en]
dc.subject.keyword: unsupervised learning [en]
dc.subject.keyword: semi-supervised learning [en]
dc.subject.keyword: morfessor [en]
dc.subject.keyword: machine learning [en]
dc.subject.other: Computer science [en]
dc.subject.other: Linguistics
dc.title: Morfessor 2.0: Python Implementation and Extensions for Morfessor Baseline [en]
dc.type: D4 Published development or research report or study
dc.type.dcmitype: text [en]
Files
Original bundle
Name: isbn9789526055015.pdf
Size: 244.49 KB
Format: Adobe Portable Document Format