Translating the Web for the Entire World
By Rob Knies, Managing Editor, Microsoft Research
People all over the world use the Internet every day to purchase goods or services, search for information, and find diversions.
But is the World Wide Web truly worldwide?
It’s difficult to make the case. By some estimates, approximately 70 percent of Web pages today are written in English, while the number of non-English speakers online is growing faster than the number of English speakers. So what if you don’t speak English? Or what if you do, and you find an interesting page written in German? Or Russian? Or Chinese?
Microsoft Research aims to please.
Windows Live Translator, a free translation portal and a Web service that powers many other translation scenarios, is the result of more than eight years of diligent machine-translation work within Microsoft Research. With it, Microsoft Research offers a simple, intuitive translation service while continuing to improve translation quality. In addition to the portal, the Bilingual Viewer offers a unique side-by-side display that translates entire Web pages with blinding speed across 25 language pairs.
For Stephen Richardson and Heather Thorne, who are leading an effort to evangelize Microsoft Research’s machine-translation work for incorporation into a bevy of other Microsoft products and services, Windows Live Translator points the way to a future in which the contents of the entire Web are free of language-based limitations and users can easily communicate with people everywhere, from within any Microsoft product or Web service.
“This,” says Richardson, principal researcher in the Natural Language Processing (NLP) group within Microsoft Research Redmond, “is a technology that will literally change the way the world works. We’re in a place, here at Microsoft, where that can happen.”
The group’s machine-translation technology was showcased at a couple of events in early March. MIX08, Microsoft’s ongoing conversation with next-generation Web and interactive-agency professionals, held in Las Vegas from March 5 to 7, featured the integration of Windows Live Translator with the upcoming version of Internet Explorer. And at TechFest 2008, an annual event held in Redmond on March 5-6 at which Microsoft employees and media representatives from around the world got a chance to observe and discuss the latest projects from Microsoft Research’s worldwide labs, the team displayed current features and services as well as future plans.
“Our vision,” Richardson says, “is to produce a machine-translation system and technology that can provide translation across all of the potential scenarios we can imagine, with Microsoft products and services around the world.”
It’s been a long journey for Richardson, who began working on machine translation while an undergraduate in the 1970s.
“I was a junior in college,” he recalls, “and I was on a project where we were trying to create a machine-translation system that we felt would change the world. Everybody’s dream, right?
“Of course, it took a lot longer than I ever dreamt. But to be here now, involved with this great group of people putting out something that just has killer-app potential …”
Thorne, director of business strategy for the Machine Translation product team, comes to the project from an entirely different direction. Having studied Russian and international studies as an undergraduate, she found herself doing translation and interpretation while working for NASA on its joint space program with the Russians, which led her to explore the state of the art in machine translation.
“Granted,” she says, “this was 15 years ago. I remember discovering that quality was quite low. It was not able to replace the need for human translators.”
For certain uses, though, this is slowly changing.
Four years ago, Thorne found her way to Microsoft, working for the Windows organization. Then she heard about Microsoft Research’s machine-translation work.
“When I discovered this team and what they were looking to do, that was a perfect fit for my background and my area of interest,” she says. “I realized that this would be a great opportunity to bring the experiences I’d had in much bigger businesses into this small team, which felt much more like a startup.”
She joined the NLP group in March 2007 and has played an integral role in guiding the team’s strategy for integrating machine-translation technology into Microsoft offerings. For example, the team’s scalable Web service is being applied to specific user scenarios, such as integration into Live Search, Internet Explorer, Windows Live Messenger, Office, and many other products and services. Users can download a widget to add Translator to their own Web sites, and individuals can install a Windows Live Translator toolbar button to translate with a single click. With twice as many downloads from non-English-speaking markets as from English-speaking markets, the service clearly meets a need among international audiences.
Still, it’s been a formidable challenge to reach this point. Machine translation is a tough nut to crack. For a long time, machine translation was seen as largely unhelpful; users became frustrated with technology that often turned text in one language into gobbledygook in another.
“Machine translation had this bad reputation,” Richardson recalls, “of being unreadable sometimes.”
Perfection was proving stubbornly elusive. As it turns out, perfection itself was part of the problem.
“There was an acronym from the 1960s: FAHQT—fully automatic high-quality translation of general text,” Richardson says. “That was the holy grail of machine translation. That’s what everybody was trying for.”
FAHQT, though, turned out to be unrealistic. A couple of years ago, Jaap van der Meer, a pioneer in the translation industry, coined a new, more achievable acronym: FAUT, for fully automatic useful translation. Instead of trying to devise a system robust enough to fool your school’s infallible French teacher, why not develop one accurate enough to provide real value to real users in real time?
“What we’re trying to do is say, ‘You know, machine translation as a science is not perfect,’” Richardson says. “It’s far from perfect, just as search is far from perfect today. But there are a lot of things you can do to mitigate the imperfections and help customers get to the results they’re looking for.”
On one hand, there are user-interface improvements, such as the Bilingual Viewer, showing side-by-side Web-page translations that enable a user to compare a translation to the original. On the other hand, there are ways to improve the research process itself to deliver the right degree of accuracy to the right user in the right situation.
Enter MSR-MT, Microsoft Research’s machine-translation project.
MSR-MT, the data-driven machine-translation system behind Windows Live Translator, automatically acquires translation knowledge from previously human-translated text, combining linguistic knowledge and statistical processing in a hybrid approach. Using as input millions of sentences from Microsoft technical materials that humans have translated, MSR-MT can produce output in a single night that is qualitatively on a par with that of systems requiring months of human customization.
The system already has proven its value within Microsoft, having been used in 2003 to translate nearly 140,000 customer-support Knowledge Base articles into Spanish. The effort was extended to Japanese the next year and to French and German in 2005. Now, Microsoft’s Knowledge Base materials have been translated into nine languages by MSR-MT.
This success has lowered the cost barrier to customized, higher-quality machine translation, and the system can deliver weekly updates and additions, a goal previously impossible to achieve. Bill Gates, Microsoft chairman, gave the mature technology the green light in 2005, and things took off from there.
“What we focused on the past year or two was to take the work we’ve used internally here at Microsoft and make it available outside the company in the most compelling initial scenario we could identify, which turned out to be Search,” Richardson says, “and then build a backbone system, a Web service that could not only supply translations to Search, but also would be the basis for anything else that we did in the future.”
The data-driven approach, Thorne adds, also enables Microsoft Research’s machine-translation efforts to focus on customer needs.
“Given that we probably can’t translate everything well,” she says, “we need to do a good job of understanding which Web sites people are looking at and what they are asking us to translate. What are the areas of the Web that people are really interested in?
“If we have limited resources and limited amounts of data we can get, where do we need to focus our efforts? It’s a combination of the technology getting better and us doing a better job of understanding the customer need.”
Such work, of course, requires the efforts of many people, as Richardson and Thorne are quick to note. Andreas Bode, the team’s development lead, has been instrumental in creating the Web-service infrastructure and leading all development. Chris Wendt, lead program manager, has worked closely with the other product teams to ensure successful integration of the Windows Live Translator Web service into their products. David Darnell has overseen the testing of the technology, and Arul Menezes and Chris Quirk were key contributors to the MSR-MT technology itself.
In addition, collaboration with the Live Search team has proved essential, and the Windows International organization has provided avid support.
“The reason why we have so many languages and gotten all the data we’ve gotten across Microsoft,” Richardson says, “is because of the effort by the internal localization community, which was spearheaded by the Windows International group.”
That team also devised the side-by-side interface that makes Windows Live Translator so easy to use. Initially, the user interface was called the Flipper Flopper. That whimsical contribution has evolved into one of the technology’s most popular features, the Bilingual Viewer.
It’s no surprise that much remains to be accomplished. New subject domains are being investigated, and product integration remains central to ongoing efforts.
“We’re always looking at improving the quality,” Richardson says, “and the more of the right kind of data that you have, and the more you do with it, the better quality you can get.”
For Thorne, it’s been an invigorating experience.
“It’s really, really exciting to be so close to a product where the people I sit next to are literally the guys who wrote the code,” she says. “It’s a very complicated space, and yet it’s still something that you can see and touch in this very tangible way. Everybody takes a lot of pride in what they do, and it’s really exciting to see the progress and to see everybody’s commitment to it.”
Richardson agrees wholeheartedly.
“We’ve always been a tight-knit group at NLP,” he says, “but our machine-translation incubation group has worked their tails off to produce something that has jumped to the forefront of what people have said is cool about machine translation. That makes me incredibly proud and grateful.”