Semi-Supervised Universal Neural Machine Translation


Microsoft Research blog


Machine translation has become a crucial component for enabling global communication. Millions of people use online translation systems and mobile applications to communicate across language barriers. Machine translation has made rapid advances in recent years with the deep learning wave. We recently announced a historic milestone in Machine Translation: human parity in translating news from Chinese to English. Our state-of-the-art system is a Neural Machine Translation system that utilizes tens of millions of parallel sentences from the news domain as training data. Such a huge amount of training data is only available for a handful of language pairs and only in particular domains, such as news and official proceedings.

While there are about 7,000 spoken languages in the world, most of the world's languages do not have the large resources required to train a usable Machine Translation system. Moreover, even languages with a large amount of parallel data do not have any data in less formal styles such as spoken dialects or social media text, which usually differ considerably from the formal written style. It is quite difficult to acquire millions of parallel sentences for any language pair. On the other hand, we can easily find monolingual data for any language.

In this project, we tackle the problem of insufficient parallel data with a Semi-Supervised Universal Neural Machine Translation approach that requires only a few thousand parallel sentences for an extremely low-resource language to achieve a high-quality machine translation system. The proposed system uses a transfer learning approach to share lexical and sentence-level representations across multiple source languages into one target language. Our setup assumes multilingual translation from multiple source languages that include both high- and low-resource languages. Our main objective is to share the learned models so that they benefit the low-resource languages. Our system architecture adds two modifications to the Neural Machine Translation (NMT) Encoder-Decoder framework to enable Semi-Supervised Universal Neural Machine Translation.
This approach can provide a reliable translation system for any language using only 6,000 parallel sentences. The second scenario is adapting a model that was trained on a standard language for use on a related spoken dialect. For example, a system trained on Arabic-to-English is adapted to translate Arabic spoken dialects using no parallel data at all.
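
To make the idea of sharing lexical representations across source languages more concrete, here is a minimal Python sketch. It is an illustration under our own assumptions, not the system described in this post: each word from any source language is represented as a softmax-weighted mixture over one shared table of "universal" embeddings, so a low-resource language can reuse parameters learned mostly from high-resource data. The names shared_embedding, universal_keys, universal_values and the example words are hypothetical, and the vectors are random rather than trained.

```python
import math
import random


def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def shared_embedding(query, universal_keys, universal_values):
    """Represent a language-specific word in the shared space.

    Assumption for illustration: the word's query vector attends over a
    shared table, and the result is a convex mixture of shared values.
    """
    weights = softmax([dot(query, key) for key in universal_keys])
    dim = len(universal_values[0])
    return [
        sum(w * value[d] for w, value in zip(weights, universal_values))
        for d in range(dim)
    ]


# Toy shared table (random here; it would be learned in a real system).
rng = random.Random(0)
DIM, N_UNIVERSAL = 4, 8
universal_keys = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_UNIVERSAL)]
universal_values = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(N_UNIVERSAL)]

# Hypothetical query vectors for words from two different source languages.
queries = {
    ("high_resource_lang", "word_a"): [rng.gauss(0, 1) for _ in range(DIM)],
    ("low_resource_lang", "word_b"): [rng.gauss(0, 1) for _ in range(DIM)],
}

# Both words land in the same shared space and can feed the same encoder.
shared = {
    word: shared_embedding(q, universal_keys, universal_values)
    for word, q in queries.items()
}
for word, vector in shared.items():
    print(word, [round(x, 3) for x in vector])
```

In a sketch like this, only the small per-word query vectors are specific to a language, while the shared table is common to all of them. That is one plausible way a few thousand parallel sentences could be enough to place a new language's vocabulary into an already trained space.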