English-Chinese Machine Translation for Financial Statements

dc.contributor: Aalto-yliopisto [fi]
dc.contributor: Aalto University [en]
dc.contributor.advisor: Xu, Zhimin
dc.contributor.author: Zuo, Si
dc.contributor.school: Sähkötekniikan korkeakoulu [fi]
dc.contributor.supervisor: Sigg, Stephan
dc.date.accessioned: 2018-12-14T16:08:54Z
dc.date.available: 2018-12-14T16:08:54Z
dc.date.issued: 2018-12-10
dc.description.abstract: In recent years, sequence-to-sequence neural networks with attention mechanisms have made great progress. However, challenges remain, especially in Neural Machine Translation (NMT), such as lower translation quality on long sentences. In this thesis, we present a hierarchical deep neural network architecture to improve the translation quality of long sentences. The proposed network embeds sequence-to-sequence neural networks into a two-level category hierarchy following the coarse-to-fine paradigm. Long sentences are split into shorter sequences, which the coarse category network can process well, because a sequence-to-sequence network is able to handle the long-distance dependencies within short sentences. The outputs are then concatenated and corrected by the fine category network. We found that some professional documents, such as financial statements, contain a large number of long sentences, so sentences from financial statements were selected as our data. The experiments show that our method achieves superior results, with higher BLEU (Bilingual Evaluation Understudy) scores, lower perplexity, and better performance in imitating expression style and word usage than traditional networks. [en]
dc.format.extent: 55+4
dc.identifier.uri: https://aaltodoc.aalto.fi/handle/123456789/35521
dc.identifier.urn: URN:NBN:fi:aalto-201812146537
dc.language.iso: en [en]
dc.location: P1 [fi]
dc.programme: CCIS - Master’s Programme in Computer, Communication and Information Sciences (TS2013) [fi]
dc.programme.major: Communication Engineering [fi]
dc.programme.mcode: ELEC3029 [fi]
dc.subject.keyword: neural machine translation [en]
dc.subject.keyword: long sentences [en]
dc.subject.keyword: professional documents [en]
dc.subject.keyword: sequence-to-sequence learning [en]
dc.title: English-Chinese Machine Translation for Financial Statements [en]
dc.type: G2 Pro gradu, diplomityö [fi]
dc.type.ontasot: Master's thesis [en]
dc.type.ontasot: Diplomityö [fi]
local.aalto.electroniconly: yes
local.aalto.openaccess: no
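
The coarse-to-fine flow described in the abstract (split a long sentence, translate the pieces, concatenate, then correct) can be sketched roughly as follows. This is only an illustrative outline, not the thesis implementation: the function names, the fixed-length splitting rule, and the `max_len` parameter are all assumptions, and the two networks are represented by placeholder callables.

```python
from typing import Callable, List

# A "translator" here is any callable mapping source tokens to target tokens.
# In the thesis these would be trained sequence-to-sequence networks; here
# they are hypothetical stand-ins supplied by the caller.
Translator = Callable[[List[str]], List[str]]


def split_long_sentence(tokens: List[str], max_len: int) -> List[List[str]]:
    """Split a token list into consecutive chunks of at most max_len tokens.

    Fixed-length chunking is an assumption made for illustration; the
    abstract does not say how the split points are chosen.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


def coarse_to_fine_translate(
    tokens: List[str],
    coarse: Translator,
    fine: Translator,
    max_len: int = 20,
) -> List[str]:
    """Translate short chunks with `coarse`, then correct the joined draft with `fine`."""
    chunks = split_long_sentence(tokens, max_len)
    # Coarse stage: each short chunk is translated independently.
    draft = [tok for chunk in chunks for tok in coarse(chunk)]
    # Fine stage: the concatenated draft is passed through a correcting model.
    return fine(draft)
```

Any pair of callables with the `Translator` shape can be plugged in, so the two stages can be tested with simple stubs before real models are available.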
