Machine Translation Quality Estimation and the Impact of Data Volume on Performance

School of Science (Perustieteiden korkeakoulu) | Master's thesis

Date

2022-06-13

Major/Subject

Machine Learning, Data Science, Artificial Intelligence

Mcode

SCI3044

Degree programme

Master’s Programme in Computer, Communication and Information Sciences

Language

en

Pages

65

Abstract

Machine Translation Quality Estimation (MTQE) is a growing research topic that aims to predict human post-editing effort without relying on reference translations. This can save time and costs in the post-editing process in the translation industry. Most recent research has focused on building MTQE systems that improve model performance with very limited data volumes. This thesis investigates the impact of data volume on MTQE performance for four language pairs of interest: Finnish-English, English-Finnish, Finnish-Swedish, and English-Swedish. The goals are to: 1) inspect the impact of data volume, 2) inspect the impact of source segment length, and 3) investigate whether near-perfect machine translations can be detected reliably. The OpenKiwi and TransQuest MTQE frameworks were selected for the experiments. To investigate the impact of data volume, MTQE models were trained on datasets of different sizes to predict HTER scores and were then evaluated on the corresponding held-out dataset using Pearson correlation and Mean Absolute Error. The predictions of the best model for each language pair were then used in the remaining analyses. The impact of source segment length was investigated by grouping samples by the number of word tokens in the source segment and analyzing the Pearson scores within these groups. To assess whether near-perfect machine translations can be detected, different threshold values were applied to the predictions to turn them into classification results. The results obtained with TransQuest demonstrated that MTQE models trained with large data volumes yield better and more stable metrics. The models seemed to achieve higher Pearson scores on short source segments (1-3 word tokens) than on segments of other lengths. In addition, depending on the HTER threshold value, the trained MTQE models could predict near-perfect machine translations with high precision and small to medium recall.
OpenKiwi was not robust to the chosen data and required additional data filtering. The framework seemed to be less sensitive to changes in data volume and more sensitive to data quality than TransQuest.
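
The evaluation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the whitespace tokenization, the single "short" versus "longer" split at 3 tokens, and the convention that a segment counts as near-perfect when its HTER is at or below the threshold are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    # Correlation is undefined when either list has zero variance.
    return cov / denom if denom > 0 else float("nan")


def mean_absolute_error(gold, pred):
    """Average absolute difference between gold and predicted HTER."""
    return sum(abs(g - p) for g, p in zip(gold, pred)) / len(gold)


def pearson_by_length(sources, gold, pred, short_max=3):
    """Pearson score per source-length group.

    Assumption: tokens are whitespace-separated, and only two groups
    ("short" up to short_max tokens, "longer" otherwise) are used.
    """
    groups = {"short": ([], []), "longer": ([], [])}
    for src, g, p in zip(sources, gold, pred):
        key = "short" if len(src.split()) <= short_max else "longer"
        groups[key][0].append(g)
        groups[key][1].append(p)
    # Skip groups with fewer than two samples.
    return {k: pearson(gs, ps) for k, (gs, ps) in groups.items() if len(gs) >= 2}


def near_perfect_precision_recall(gold, pred, threshold):
    """Turn HTER regression output into a near-perfect classifier.

    Assumption: a segment is "near-perfect" when HTER <= threshold.
    """
    true_pos = sum(1 for g, p in zip(gold, pred) if g <= threshold and p <= threshold)
    predicted_pos = sum(1 for p in pred if p <= threshold)
    actual_pos = sum(1 for g in gold if g <= threshold)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall
```

Sweeping `threshold` over several values and recording the resulting precision and recall would show the kind of trade-off the abstract reports, where a strict threshold favors precision at the cost of recall.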

Supervisor

Kurimo, Mikko

Thesis advisor

Andersson, Sebastian

Keywords

machine translation quality estimation, post-editing, data volume impact, near-perfect machine translation
