Machine Translation Quality Estimation and the Impact of Data Volume on Performance
School of Science
Master's thesis
Unless otherwise stated, all rights belong to the author. You may download, display and print this publication for your own personal use. Commercial use is prohibited.
Date
2022-06-13
Major/Subject
Machine Learning, Data Science, Artificial Intelligence
Mcode
SCI3044
Degree programme
Master’s Programme in Computer, Communication and Information Sciences
Language
en
Pages
65
Abstract
Machine Translation Quality Estimation (MTQE) is a growing research topic that aims to predict human post-editing effort without relying on reference translations. This can save time and costs in the post-editing process in the translation industry. Most recent research has focused on building MTQE systems that improve model performance with very limited data volumes. This thesis investigates the impact of data volume on MTQE performance for four language pairs of interest: Finnish-English, English-Finnish, Finnish-Swedish, and English-Swedish. The goals are to: 1) inspect the impact of data volume, 2) inspect the impact of source segment length, and 3) investigate whether near-perfect machine translations can be detected reliably. The OpenKiwi and TransQuest MTQE frameworks were selected for the experiments. To investigate the impact of data volume, MTQE models were trained on datasets of different sizes to predict HTER (Human-targeted Translation Edit Rate) scores. The models were then evaluated on the corresponding held-out datasets using the Pearson correlation and Mean Absolute Error (MAE) metrics. The prediction results from the best model in each language pair were then used for the remaining analyses. The impact of source segment length was investigated by grouping samples by the number of word tokens in the source segment and comparing the Pearson scores of these groups. To determine whether near-perfect machine translations can be detected, different threshold values were applied to the predictions to turn them into classification results. The TransQuest results showed that MTQE models trained with larger data volumes yield better and more stable metrics. The models achieved higher Pearson scores on short source segments (1-3 word tokens) than on other source segment lengths. In addition, depending on the HTER threshold, the trained MTQE models could detect near-perfect machine translations with high precision and small to medium recall.
OpenKiwi was not robust to the chosen data and required additional data filtering. The framework seemed to be less sensitive to data volume changes and more sensitive to data quality than TransQuest.
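The evaluation steps summarized in the abstract (Pearson and MAE on held-out predictions, Pearson per source-length group, and thresholding HTER to detect near-perfect translations) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's code: it scores HTER predictions already produced by some MTQE model and does not call OpenKiwi or TransQuest. The function names, the whitespace tokenization used to count word tokens, the length buckets beyond 1-3 tokens, the 0.1 threshold, and the toy data are all hypothetical. A segment is assumed to count as near-perfect when its HTER is at or below the threshold.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)


def mae(gold, pred):
    """Mean absolute error between gold and predicted HTER scores."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def pearson_by_length(sources, gold, pred, buckets):
    """Pearson score per source-length bucket.

    Assumption (hypothetical): word tokens are counted by whitespace
    splitting, and `buckets` holds inclusive (low, high) token ranges.
    """
    scores = {}
    for low, high in buckets:
        idx = [i for i, s in enumerate(sources) if low <= len(s.split()) <= high]
        if len(idx) >= 2:  # a correlation needs at least two samples
            scores[(low, high)] = pearson([gold[i] for i in idx],
                                          [pred[i] for i in idx])
    return scores


def near_perfect_precision_recall(gold, pred, threshold):
    """Precision and recall for detecting near-perfect translations.

    Assumption: a translation is near-perfect when its HTER is at or
    below `threshold` (lower HTER means fewer post-edits).
    """
    true_pos = sum(g <= threshold and p <= threshold for g, p in zip(gold, pred))
    pred_pos = sum(p <= threshold for p in pred)
    gold_pos = sum(g <= threshold for g in gold)
    precision = true_pos / pred_pos if pred_pos else 0.0
    recall = true_pos / gold_pos if gold_pos else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy data, not results from the thesis.
    sources = ["Kiitos", "Hyvää huomenta", "Tämä on hieman pidempi lause",
               "Tämä lause on selvästi pidempi kuin muut lauseet", "Selvä"]
    gold = [0.0, 0.1, 0.4, 0.8, 0.02]
    pred = [0.05, 0.08, 0.5, 0.6, 0.3]
    print("Pearson:", pearson(gold, pred))
    print("MAE:", mae(gold, pred))
    print("By length:", pearson_by_length(sources, gold, pred, [(1, 3), (4, 10)]))
    print("P/R at HTER <= 0.1:", near_perfect_precision_recall(gold, pred, 0.1))
```

Sweeping `threshold` over several values traces the precision/recall trade-off that depends on the chosen HTER threshold; in practice a library such as SciPy or scikit-learn would typically replace the hand-written metrics.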
Supervisor
Kurimo, Mikko
Thesis advisor
Andersson, Sebastian
Keywords
machine translation quality estimation, post-editing, data volume impact, near-perfect machine translation