Microsoft Research’s Machine Translation (MSR-MT) group has been among the leading research organizations in the machine translation space for over 8 years, and some of the foundational work in natural language processing at MSR began over 16 years ago. The team’s approach to machine translation integrates linguistic features with state-of-the-art statistical machine translation algorithms. Its focus has always been on automatically acquiring translation knowledge from bilingual corpora, i.e., parallel data consisting of original source-language sentences and their corresponding translations by human translators. About 3 years ago, the team moved from a purely rule-based approach to this task to a hybrid approach that includes extensive statistical processing, allowing for greater scalability across domains and into new languages.
Microsoft’s Machine Translation technology was first developed for in-house localization purposes, to allow our Customer Support organization to publish technical support documents with a frequency and language breadth that would have been prohibitively expensive using human translators. With all of Microsoft’s previously human-translated documents and localized software at its disposal, the MT team was able to train its statistical MT engine automatically and achieve good quality in the technical domain. This technology was then extended to support the Windows localization team, the Developer Division, MSDN, and several other groups within Microsoft. It has also allowed Microsoft to reach many more customers than would ever have been possible using human translation alone.
After focusing on Microsoft’s own translation needs, the team began to build a scalable web service that would allow it to provide translation services to the general public, both as a standalone tool on the web and as a feature within other products. Because the Microsoft MT engine has been trained most heavily on technical data, it has not yet been tuned for translating text in other subject domains. However, we hope to continue improving the quality and breadth of the engine. We look forward to sharing our developments with you over the coming months on this blog.