Learning Non-linear Features for Machine Translation Using Gradient Boosting Machines

Kristina Toutanova; Byung-Gyu Ahn

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics | August 2013

Published by Association for Computational Linguistics

In this paper we show how to automatically induce non-linear features for machine translation. The new features are selected to approximately maximize a BLEU-related objective, and they decompose at the level of local phrases, which guarantees that the asymptotic complexity of machine translation decoding does not increase. We achieve this by applying gradient boosting machines (Friedman, 2000) to learn new weak learners (features) in the form of regression trees, using a differentiable loss function related to BLEU. Our results indicate that small gains in performance can be achieved using this method, but we do not see the dramatic gains observed using feature induction for other important machine learning tasks.
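
To make the boosting idea concrete, here is a minimal sketch of a gradient boosting loop in which each weak learner is a regression tree fit to the negative gradient of a differentiable loss. It is an illustration under stated assumptions, not the paper's implementation: squared-error loss stands in for the BLEU-related loss, each tree is a depth-1 stump over a single scalar feature, and the names `fit_stump`, `boost`, and `predict` are hypothetical.

```python
# Gradient boosting sketch with depth-1 regression trees (stumps).
# ASSUMPTION: squared-error loss replaces the paper's BLEU-related loss,
# and the toy data below is synthetic, chosen only to be non-linear.

def fit_stump(xs, targets):
    """Fit a one-split regression tree on a scalar feature by least squares."""
    best = None
    # Skip the largest value so both sides of every split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((t - left_mean) ** 2 for t in left)
               + sum((t - right_mean) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_value, right_value)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, rounds=50, learning_rate=0.1):
    """Add one stump per round, each fit to the current negative gradient."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        # For the loss 0.5 * (y - f)^2, the negative gradient w.r.t. f is y - f.
        neg_gradient = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, neg_gradient)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return stumps


def predict(stumps, x, learning_rate=0.1):
    """The learned non-linear function: a scaled sum of stump outputs."""
    return sum(learning_rate * stump_predict(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy non-linear target that no single linear weight on x can fit.
xs = [i / 10 for i in range(20)]
ys = [(x - 1.0) ** 2 for x in xs]

stumps = boost(xs, ys)
initial_mse = mse(ys, [0.0] * len(xs))
final_mse = mse(ys, [predict(stumps, x) for x in xs])
```

In this sketch the sum of stump outputs plays the role of an induced non-linear feature of `x`. In the setting the abstract describes, the inputs would instead be properties of local phrases, so the learned function could be scored phrase by phrase during decoding.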