In this talk, I will describe a new approach that we have been developing for synthetically combining the outputs of several different Machine Translation (MT) engines operating on the same input. The goal of this work is to produce a synthetic combination that surpasses all of the original systems in translation quality. Our approach treats the individual MT engines as “black boxes” and does not require any explicit cooperation from the original MT systems. An explicit word matcher is first used to identify and align the words that are common to the MT engine outputs. The matcher can match not only identical words but also morphological variants and synonyms. A decoding algorithm then uses this information, in conjunction with confidence estimates for the various engines and a trigram language model, to score and rank a collection of sentence hypotheses that are synthetic combinations of words from the various original engines. The highest-scoring sentence hypothesis is selected as the final output of our system.
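The pipeline described above (match and align words, then score combined hypotheses with engine confidences and a trigram language model) can be illustrated with a small sketch. This is not the system's actual matcher or decoder, whose details are left to the talk. It assumes a toy synonym table, a crude suffix stripper in place of real morphological analysis, an add-one-smoothed trigram model, and a simplified search that uses the first engine's output as a backbone and enumerates word substitutions at aligned positions. All names (SYNONYMS, stem, words_match, align, TrigramLM, combine) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Assumption: a toy synonym table; the real matcher's lexical resource is not specified.
SYNONYMS = {frozenset({"big", "large"})}

def stem(word):
    """Crude suffix stripper standing in for real morphological analysis."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def words_match(a, b):
    """True for identical words, shared stems, or listed synonyms."""
    a, b = a.lower(), b.lower()
    return a == b or stem(a) == stem(b) or frozenset({a, b}) in SYNONYMS

def align(hyp_a, hyp_b):
    """Greedy left-to-right one-to-one alignment as (i, j) index pairs."""
    used, pairs = set(), []
    for i, wa in enumerate(hyp_a):
        for j, wb in enumerate(hyp_b):
            if j not in used and words_match(wa, wb):
                pairs.append((i, j))
                used.add(j)
                break
    return pairs

class TrigramLM:
    """Trigram model with add-one smoothing (an assumption made for brevity)."""
    def __init__(self, sentences):
        self.tri, self.bi, self.vocab = Counter(), Counter(), set()
        for s in sentences:
            toks = ["<s>", "<s>"] + list(s) + ["</s>"]
            self.vocab.update(toks)
            for k in range(2, len(toks)):
                self.tri[tuple(toks[k - 2 : k + 1])] += 1
                self.bi[tuple(toks[k - 2 : k])] += 1

    def logprob(self, sentence):
        toks = ["<s>", "<s>"] + list(sentence) + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves mass for unseen words
        total = 0.0
        for k in range(2, len(toks)):
            num = self.tri[tuple(toks[k - 2 : k + 1])] + 1
            den = self.bi[tuple(toks[k - 2 : k])] + v
            total += math.log(num / den)
        return total

def combine(outputs, confidences, lm, lm_weight=1.0):
    """Score backbone-based word combinations; return (best_words, best_score)."""
    backbone = outputs[0]
    # slots[i] maps each candidate word to the summed confidence of engines proposing it
    slots = [{w: confidences[0]} for w in backbone]
    for e, hyp in enumerate(outputs[1:], start=1):
        for i, j in align(backbone, hyp):
            slots[i][hyp[j]] = slots[i].get(hyp[j], 0.0) + confidences[e]
    best, best_score = None, -math.inf
    # Exhaustive enumeration is exponential; acceptable only for a toy example.
    for choice in product(*[list(s.items()) for s in slots]):
        words = [w for w, _ in choice]
        support = sum(math.log(c) for _, c in choice)
        score = lm_weight * lm.logprob(words) + support
        if score > best_score:
            best, best_score = words, score
    return best, best_score

if __name__ == "__main__":
    outputs = [
        "the large house is red".split(),
        "the big houses are red".split(),
        "a big house is red".split(),
    ]
    lm = TrigramLM(["the big house is red".split()])
    best, score = combine(outputs, [0.5, 0.3, 0.2], lm)
    print(" ".join(best), round(score, 3))
```

In this toy run, "large" and "big" receive equal engine support, so the language model decides between them. A realistic decoder would also need to handle word reordering and unaligned words, which this sketch ignores.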
Experiments on combining several Arabic-to-English and several Chinese-to-English online translation systems demonstrate that our multi-engine combination system provides an improvement of about 6% over the best original system and is about equal in translation quality to an “oracle” capable of selecting the best of the original systems on a sentence-by-sentence basis. I will describe the details of the approach and several planned extensions for further improving its effectiveness.