No Language Left Behind
Imagine the informational and cultural isolation that can result if you don’t speak one of the world’s major languages. Think about how limited your Internet experience would be. This is a reality for billions of people worldwide, who find themselves cut off linguistically from this great knowledge resource.
A related problem affects millions of people whose primary fluency is in a major language but whose ancestral traditions arise from a different linguistic heritage. These people find themselves increasingly separated from their ancestral culture, which can only be fully appreciated through an understanding of its native tongue.
Seeking to bring the power of computing to bear on these problems, Microsoft Research is pleased to announce the launch of Microsoft Translator Hub. We’re extremely excited by the potential of this tool to provide meaningful machine translation of lower-resourced languages and to help researchers and others build more targeted language models. The value of the Hub was very apparent to me during two recent events I hosted on opposite sides of the world: the first in California and the second in Nepal.
California Dreamin’—in Hmong
In late November 2011, Microsoft Research Connections hosted a two-day workshop on Hmong Language Preservation at California State University, Fresno, during which the local Hmong community provided input on the White Hmong-English machine translator. (White Hmong, or Hmong Dao, is one of several Hmong dialects.) Hmong is one of the indigenous languages of the mountain people of Southeast Asia, thousands of whom now live in the United States, Australia, and France. As a result, many Hmong families have raised their children and grandchildren without immersion in their traditional culture and language, focusing instead on integrating into the dominant language and culture of the societies in which they now live.
In general, the second generation grows up somewhat bilingual, speaking Hmong with their parents and other elders but using English at school and work. When they have children, they speak to them in English. As a result, the third generation acquires only limited fluency in their ancestral tongue, mostly by listening to their grandparents speak with their parents. And because Hmong became a written language only within the last 60 years, many fluent speakers may not be literate in it.
These factors have led to a critical and progressive decline in the language’s usage in Hmong communities in the United States, making language preservation a major concern for the Hmong. During the California workshop, Microsoft Research Connections, in collaboration with Professor Phong Yang, a linguist at Cal State Fresno, explored machine translation as a method to preserve the Hmong language and culture.
The participation of the Hmong community was outstanding. Community members of all ages, from children to grandparents, worked with the Microsoft Translator Hub’s Reviewer UI, offering suggestions and words of encouragement. Hopes were realistic: no one expected the computer to provide a perfect translation between Hmong and English. One amused Hmong parent observed that “it speaks ‘Hmonglish,’ just like my children.” The overall reaction was extremely positive, reflecting the community’s strong desire to preserve their language and culture.
One tangible result of the event, together with hard work by the Microsoft Translator team and the continued efforts of the Fresno Hmong community, is that on February 21, in honor of International Mother Language Day, Microsoft released a public version of Hmong on Bing Translator.
Teaching Students to Scale Language Technology Peaks in Nepal
In Nepal, Microsoft Research Connections co-hosted a two-day “Nepali Language Preservation Workshop” in conjunction with Kathmandu University and the nonprofit organization Language Technology Kendra. The goal was to begin strengthening Nepali’s position in today’s digital world, bringing it up to the level of major world languages and giving monolingual Nepali speakers greater access to Internet content in other languages. Such efforts expand the presence of Nepali while also keeping it vibrant. As a lower-resourced language with a large speaker population (more than 30 million), Nepali is an ideal candidate for the Microsoft Translator Hub.
David Harrison, a professor of linguistics at Swarthmore College and one of the world’s foremost experts on endangered languages, and I led a session for linguists and translators that focused on reviewing translation quality and gathering their feedback on the reviewer interface. Approximately 1,200 sentences were translated and edited on the first day, and more on the second. Participants reported a number of bugs and suggested improvements.
Meanwhile, in a parallel track, computer science students and educators met under the guidance of Microsoft researchers Christophe Poulain and Sundar Poudel. This session taught tomorrow’s computer scientists and computer science educators how to access the nascent Nepali translation model (the same model being refined in the other session) through the Microsoft Translator APIs, using a private workspace for automatic translation between Nepali and other languages. By training educators, we give them the tools to return to their institutions and teach others how to develop web service translation applications, thereby cultivating young experts in natural language processing.
The enthusiasm and productive work of the workshop participants affirmed that Nepali was an apt choice for the workshop. As one participant observed, “If we can translate Nepali, we can communicate with the outsider world easier.” Another noted that “the rural people don’t understand English, so if we give them a translator, they will feel good and [find it] easy to read information on foreign-language websites.”
I firmly believe that translation systems that encourage community participation, such as Microsoft Translator Hub, can help slow the decline of lower-resourced languages. But making this a reality takes a strong commitment from the community. Machine translation mimics how a human learns a new language: like a person, the translation software needs materials in both languages that it can read side by side. It has to be taught, and it makes mistakes, but it keeps improving as it gets more exposure to the new language (that is, more data). Building up that language data is one of the chief practical values of events such as these workshops, where the participants actually teach the computer how to speak their native language.
Whether helping to preserve the links to an ancestral culture or working to bring a language into the digital world, Microsoft Translator Hub demonstrates Microsoft’s ongoing engagement and commitment to creating positive social change through technology.
Take a look at the Microsoft Translator Hub website and ask for an invitation to participate.
—Kristin Tolle, Director, Natural Interactions, Microsoft Research Connections