Multi-Engine Machine Translation Guided by Explicit Word Matching

Date

November 18, 2005

Speaker

Alon Lavie

Affiliation

Carnegie Mellon University

Overview

In this talk, I will describe a new approach that we have been developing for synthetically combining the output of several different Machine Translation (MT) engines operating on the same input. The goal of this work is to produce a synthetic combination that surpasses all of the original systems in translation quality. Our approach treats the individual MT engines as “black boxes” and does not require any explicit cooperation from the original MT systems. An explicit word matcher is first used to identify and align the words that are common to the MT engine outputs. The matcher can match not only identical words, but also morphological variants and synonyms. A decoding algorithm then uses this information, together with confidence estimates for the various engines and a trigram language model, to score and rank a collection of sentence hypotheses that are synthetic combinations of words from the various original engines. The highest-scoring sentence hypothesis is selected as the final output of our system.
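
To make the word-matching step concrete, here is a minimal Python sketch of aligning two engine outputs by identical words, then morphological variants, then synonyms. It is an illustration under stated assumptions, not the system described in the talk: the `SYNONYMS` table, the `crude_stem` suffix stripper, and the greedy left-to-right `align` strategy are all hypothetical stand-ins for whatever lexical resources and alignment search the actual matcher uses.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy synonym table; a real matcher would consult a lexical resource.
SYNONYMS: Dict[str, Set[str]] = {
    "big": {"large"},
    "large": {"big"},
}


def crude_stem(word: str) -> str:
    """Rough suffix stripping, standing in for real morphological analysis."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def words_match(a: str, b: str) -> str:
    """Return the match type ('exact', 'stem', 'synonym'), or '' if none."""
    a, b = a.lower(), b.lower()
    if a == b:
        return "exact"
    if crude_stem(a) == crude_stem(b):
        return "stem"
    if b in SYNONYMS.get(a, set()):
        return "synonym"
    return ""


def align(hyp_a: List[str], hyp_b: List[str]) -> List[Tuple[int, int, str]]:
    """Greedily align words of hyp_a to unused words of hyp_b.

    Runs three passes (exact, then stem, then synonym) so that stronger
    match types claim their partners before weaker ones are considered.
    """
    used_b: Set[int] = set()
    alignment: List[Tuple[int, int, str]] = []
    for stage in ("exact", "stem", "synonym"):
        aligned_a = {i for i, _, _ in alignment}
        for i, word_a in enumerate(hyp_a):
            if i in aligned_a:
                continue
            for j, word_b in enumerate(hyp_b):
                if j not in used_b and words_match(word_a, word_b) == stage:
                    alignment.append((i, j, stage))
                    used_b.add(j)
                    break
    return sorted(alignment)


engine_a = "the large dogs barked".split()
engine_b = "the big dog barks".split()
links = align(engine_a, engine_b)
```

Each entry in `links` is a triple of (position in the first output, position in the second output, match type). A decoder along the lines described above could then use such links to decide which words from different engines are interchangeable when assembling and scoring combined hypotheses.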

Experiments conducted on combining several Arabic-to-English and several Chinese-to-English online translation systems demonstrate that our multi-engine combination system provides an improvement of about 6% over the best original system, and is about equal in translation quality to an “oracle” capable of selecting the best of the original systems on a sentence-by-sentence basis. I will describe the details of the approach, and several planned extensions for further improving its effectiveness.

Speakers

Alon Lavie

Alon Lavie is an Associate Research Professor at the Language Technologies Institute, a department in the School of Computer Science at Carnegie Mellon University, where he has been a member of the faculty since 1996. He has a Bachelor’s degree in Computer Science from the Technion, Israel, and received MS and PhD degrees in Computer Science from Carnegie Mellon. His main areas of research are Machine Translation of both text and speech, and Spoken Language Understanding. Dr. Lavie’s current research projects focus on multi-engine Machine Translation, on the design and development of new approaches to Machine Translation for languages with limited amounts of data resources, and on parsing approaches for databases of transcribed spoken language (such as CHILDES). He has also worked extensively on the design and development of Speech-to-Speech Machine Translation systems and on robust parsing algorithms for analysis of spoken language. Dr. Lavie teaches and advises graduate students at the Language Technologies Institute.