Abstract

We investigate two problems in word alignment for machine translation. First, we compare methods for incremental word alignment to save time for large-scale machine translation systems. Various methods of using existing word alignment models trained on a larger, general corpus for incrementally aligning smaller new corpora are compared. In addition, by training separate translation tables, we eliminate the need for any re-processing of the baseline data. Experimental results are comparable or even superior to the baseline batch-mode training. Based on this success, we explore the possibility of sharpening alignment model via incremental training scheme. By first training a general word alignment model on the whole corpus and then dividing the same corpus into domain-specific partitions, followed by applying incremental training to each partition, we can improve machine translation quality as measured by BLEU.

‚Äč