Translator Fast-Tracks Haitian Creole
By Janie Chang, Writer, Microsoft Research
In disaster relief, every hour makes a difference, and communication is essential. When aid efforts began after the recent Haiti earthquake, Microsoft volunteers working with the Haiti relief community sent a request to the Machine Translation team within Microsoft Research's Natural Language Processing (NLP) group: Was there a quick way to deliver an online English/Haitian Creole translator?
The request to the team came on Tuesday, Jan. 19. Vikram Dendi, senior product manager for the Machine Translation team, sent out an SOS.
Five days later, Dendi sent this e-mail:
“Hi folks – some news that might be of interest to you. After 5 days of work (including about 20 hours nonstop towards the end) our team shipped a scalable Haitian Creole (Kreyòl) system this morning. We would certainly appreciate your help in spreading the word about this, as well as reaching out to humanitarian agencies that might find it of use. The service and the APIs are all available at no cost.”
Figuring Out How to Fast-Track
Because Microsoft Translator is the translation engine behind other applications, adding Haitian Creole to its list of supported languages immediately made user applications such as Bing Translator and TBot Messenger Translation useful to crisis-response volunteers and companies needing to bridge the language gap in Haiti.
Normally, adding a new language to the machine-translation engine can take weeks, if not months. Driven by the urgency of the situation, Dendi’s product team and NLP researchers put aside other priorities and brainstormed ways to get an experimental but functional Haitian Creole machine-translation system online quickly.
Chris Quirk, a researcher with the Machine Translation team, recalls those initial meetings.
“When Vikram first told us the aid community had asked for a Haitian Creole machine-translation system, I was intrigued but skeptical. Statistical machine translation has the incredible ability to turn parallel translated data into translation systems in a matter of hours or days—once you have enough training data.”
The NLP team knew that its biggest challenge would be finding enough parallel data, meaning text in one language paired with its translation in the other, between English and Haitian Creole for training the engine. Haitian Creole, or Kreyòl, is one of Haiti's two official languages; French is the other. Approximately 8 million people in Haiti speak Creole. Compared with more widely spoken languages, the amount of parallel data available for Creole is fairly limited.
But team members quickly replaced skepticism with dogged determination and reached out for help. That was when they discovered that other groups had already made language resources available.
“For instance,” Quirk says, “Carnegie Mellon University had a repository of parallel Haitian Creole and English speech and text data. Government agencies released parallel documents and glossaries, and Web sites such as CrisisCommons and haitisurf.com were happy to share glossaries and translation resources.”
Such assistance was invaluable.
“If not for the efforts of the community, who made data and dictionaries available with minimal license restrictions,” Dendi says, “this Haitian Creole machine translator would not be available.”
Quirk and others immediately turned to integrating these language resources, training systems, and optimizing translation quality. Within a few days, the researchers had constructed a system that produced reasonable results, and an engineering team worked nights and weekends to make the translation system go live as soon as possible.
Availability, Then Improvement
The team decided to make the system available to the aid community as early as possible and then improve it by adding data. Fortunately, the statistical machine-translation system behind Microsoft Translator allows translation quality to improve continuously as more training data is added. A typical new language release involves significantly more training data and quality testing, but the team judged an early release justified because it could keep improving translation quality after launch.
Delivering Haitian Creole through a proven Web service ensures scale and performance, and Microsoft Translator's extensive API set lets developers who are building solutions for the relief effort add Haitian Creole support to other software and Web sites.
“Releasing it now means developers can start now,” Dendi says, “and as we add more training data, the translated results will improve. One of the volunteers who contacted us has already built a mobile application using Microsoft Translator APIs.”
An Ongoing Effort
The goals now for Dendi and his team are twofold: improving the training data and making sure the aid community knows the resource is available. Various groups within Microsoft are using social media and blogs to reach out to individual users, as well as to technology projects that could use a scalable translation system in their relief efforts.
“We want everyone who is helping with these relief efforts to know that the services and usage of the Microsoft Translator API are completely free,” Dendi emphasizes. “It can be built into any application or Web site for immediate use. We hope this will help with many of the applications being developed, such as those at crisiscommons.org, to aid in humanitarian efforts. Developers can choose from SOAP, HTTP, and AJAX APIs.”
Since Jan. 25, the team has added more training data, including manually translated data specifically relevant to humanitarian-aid scenarios, which it hopes will provide much better results in the field. Team members say they are “nowhere near done yet” and continue to work on the project.
One of Microsoft’s partners in this effort, the Butler Hill Group, provided human translations and evaluation services at no cost, saying, “We are proud to be able to help you with this important work.”
That’s the sort of collaborative spirit the project engendered.
“It was truly inspiring,” Quirk says, “to see people across the whole natural-language-technologies community work together. This is something we will encourage by releasing more data back to the community. We hope these technologies can help the people at the center of disaster relief efforts communicate a little better.”
How You Can Help
The best way people can help improve the system, Dendi says, is by contributing more training data—typically sentences or words translated between English and Haitian Creole.
If you know of dictionaries, translated sentences, or Web sites that have such translations, please contribute them via the TAUS Data Association (TDA) data-sharing initiative. TDA is a non-profit organization providing a neutral, secure platform for sharing language data. Microsoft Research intends to make the Haitian Creole data it collects available to the larger community, via the TDA, for training purposes, as license restrictions permit. Please e-mail your concerns or questions.
Many initiatives are under way to build applications and Web sites for the relief effort, including several mobile apps that use the SOAP or HTTP API and Web sites that use the AJAX API. If you have a project in the works, please provide a link to your application or Web site in the comments of the Microsoft Translator Official Team Blog, and the team will make sure to include it where others can find the information.
If you encounter problems with translations or using APIs, e-mail your feedback.