{"id":306017,"date":"2010-10-18T06:00:40","date_gmt":"2010-10-18T13:00:40","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=306017"},"modified":"2016-10-15T15:42:08","modified_gmt":"2016-10-15T22:42:08","slug":"enhancing-multilingual-content-wikipedia","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/enhancing-multilingual-content-wikipedia\/","title":{"rendered":"Enhancing Multilingual Content in Wikipedia"},"content":{"rendered":"<p><em>By Douglas Gantenbein, Senior Writer, Microsoft News Center<\/em><\/p>\n<p>Wikipedia has become one of the world\u2019s largest and perhaps most powerful information repositories. But it is heavily English-centric.<\/p>\n<p>Making Wikipedia more multilingual inspired a <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-india\/\" target=\"_blank\">Microsoft Research India<\/a> team to develop a tool called <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/en.wikipedia.org\/wiki\/WikiBhasha\" target=\"_blank\">WikiBhasha<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, which was launched Oct. 18. WikiBhasha\u2014\u201cWiki,\u201d signifying its community-oriented approach; \u201cBhasha,\u201d a Sanskrit word meaning \u201clanguage\u201d\u2014features a content-creation platform that combines linguistic services, such as machine translation, with a Wikipedia-friendly content editor. Everyday users in countries around the world, as well as language enthusiasts, can use WikiBhasha to adapt English-language Wikipedia articles for local languages. Along the way, they can create new local content to expand the article they have translated.<\/p>\n<p>WikiBhasha users also can create new articles from scratch. 
And in time, the tool could help convert articles in languages other than English into local languages.<\/p>\n<div id=\"attachment_306020\" style=\"width: 410px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-306020\" class=\"size-full wp-image-306020\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/A-Kumaran.jpg\" alt=\"A Kumaran\" width=\"400\" height=\"283\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/A-Kumaran.jpg 400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/A-Kumaran-300x212.jpg 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><p id=\"caption-attachment-306020\" class=\"wp-caption-text\">A Kumaran<\/p><\/div>\n<p>The team behind WikiBhasha is led by A Kumaran, a multilingual-technologies and -systems researcher whose research interests include multilingual and cross-language information processing, machine translation and transliteration, and methods for creating data for computational linguistic research. He and his team started work on WikiBhasha four and a half years ago.<\/p>\n<p>WikiBhasha is designed to solve several problems. Foremost, of course, is broadening the reach and language adoption of Wikipedia.<\/p>\n<p>\u201cWhile English, the most prevalent Wikipedia, has 3.4 million articles, even the second most-popular language, German, has only one-third as many articles,\u201d Kumaran says. \u201cAnd there is a huge tail of more than 200 languages that have fewer than 100,000 articles each. It would be great to help people expand that number\u2014in lots of languages.\u201d<\/p>\n<p>WikiBhasha also offers a way to sharpen the abilities of current machine translators. That, in fact, was one of the driving forces behind WikiBhasha. 
It takes about 4 million sentence pairs, matched between two languages, to develop a machine translator robust enough to create effective translations. In many languages, collecting that much data is a nearly insurmountable task. But if a machine translator can at least start a translation using a smaller data set, then it\u2019s possible for a wiki-style community to build on that and correct the machine translator\u2014literally \u201cteaching\u201d the translator to be more effective.<\/p>\n<p>Kumaran also says that WikiBhasha could be used to take a machine translator that is effective with one type of content\u2014news articles, for instance\u2014and, through community participation, train the translator to handle other content, such as medical articles or other documents that use more specialized language, more effectively.<\/p>\n<div id=\"attachment_306023\" style=\"width: 410px\" class=\"wp-caption alignright\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-306023\" class=\"size-full wp-image-306023\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/WikiBhasha.jpg\" alt=\"creating a new Wikipedia article using WikiBhasha\" width=\"400\" height=\"278\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/WikiBhasha.jpg 400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/WikiBhasha-300x209.jpg 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><p id=\"caption-attachment-306023\" class=\"wp-caption-text\">This screenshot shows how a WikiBhasha user would take a Wikipedia article translated from English (left) and edit it before copying it to the right pane before the final step in creating a new Wikipedia article in a target language.<\/p><\/div>\n<p>WikiBhasha is a browser-based tool with an easy-to-use interface that can be invoked atop a Wikipedia page. 
In a three-step process, a user identifies an appropriate set of English-language articles to use as a source of information for contributing to a Wikipedia in their own language. The user is guided through composing and editing a translation and adding content as desired and can then submit the completed document to Wikipedia. New articles also can be created using WikiBhasha.<\/p>\n<p>But WikiBhasha is much more than a translator.<\/p>\n<p>\u201cYou can do whatever you want with the content,\u201d Kumaran says. \u201cCreating the translation is only the first step. On the other hand, you can choose to create an article from scratch and not use the translation service at all.\u201d<\/p>\n<div id=\"attachment_306026\" style=\"width: 410px\" class=\"wp-caption alignleft\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-306026\" class=\"size-full wp-image-306026\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/final-edits.jpg\" alt=\"creating a new Wikipedia article using WikiBhasha\" width=\"400\" height=\"278\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/final-edits.jpg 400w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2016\/10\/final-edits-300x209.jpg 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><p id=\"caption-attachment-306026\" class=\"wp-caption-text\">This screen is used to make final edits before a translated and edited article is submitted to Wikipedia. Once the \u201csubmit\u201d button is pressed, the article immediately appears on Wikipedia and is viewable and editable by other users.<\/p><\/div>\n<p>Once an article has been translated and edited, it becomes part of the Wikipedia corpus and can be amended by other users. 
Any changes a user makes to the translations also are made available to other WikiBhasha users through a cloud-based service, the Collaborative Translation Framework, developed by the Microsoft Machine Translation Incubation Team.<\/p>\n<p>\u201cIt\u2019s very collaborative that way,\u201d Kumaran says. \u201cIf a user takes a sentence and corrects a mistake that the translator has made, it gets recorded in a cloud-based repository. The next time someone uses the service, the machine-translated version\u2014and all the changes people have made to it\u2014also will be available.\u201d<\/p>\n<p>Kumaran says WikiBhasha has been developed to suit both everyday users and experienced Wikipedians. He notes that although Wikipedia has a small percentage of highly active contributors, a large portion of the content is created by a huge number of casual users who make only occasional contributions. WikiBhasha is designed to appeal to both groups. It enables sophisticated editing and content sourcing, but also is easy to use when a user wants to make small changes.<\/p>\n<p>\u201cSome people will want to create a great deal of content, while most others may just want to work on a sentence or two,\u201d Kumaran says. \u201cThe key is to make the user experience simple and intuitive, to attract the casual users repeatedly, and, at the same time, not hamper the productivity of active contributors.\u201d<\/p>\n<p>WikiBhasha has gone through several iterations. It started as a research prototype with a text-based interface. The first externally published version appeared in 2008 as a hosted solution, with editing enabled only on a sentence-by-sentence basis. 
Though Wikipedia was an early object of the work, the tool could have been used to translate content from news sources such as <em>The New York Times.<\/em><\/p>\n<p>When this early version was shown to the Wikipedia organization in Germany in 2008, the reaction was underwhelming.<\/p>\n<p>\u201cThe response was not good,\u201d Kumaran says candidly. \u201cThe philosophy behind the tool was not right for them. They wanted more free-form content creation in a local language and didn\u2019t just want a mirror image of an English-language article.\u201d<\/p>\n<p>Also, the Wikipedians preferred a solution in which they stay on Wikipedia rather than work on Wikipedia content in a different domain.<\/p>\n<p>A second version, which stayed on-site on Wikipedia, took care of many of the hosting and technical issues flagged by the Wikipedians. But, in the process, it became too complex for casual users.<\/p>\n<h2>Collaborative Experience<\/h2>\n<p>For the version now being released, Kumaran and his team worked to create an intuitive Wikipedia experience, focus users more on the final content creation and less on the original translated document, and ensure that any user\u2019s work in WikiBhasha becomes part of a collaborative experience. The last element was key to winning over the Wikimedia Foundation, the parent organization to the global Wikipedias.<\/p>\n<p>Kumaran\u2019s team was able to do so, in part, by committing the project entirely to a Wikipedia-centric approach. The tool was redesigned to integrate tightly with Wikipedia and to be \u201cpart of\u201d the Wikipedia experience during the time a user is translating, adding, and creating content. When WikiBhasha was shown again to Wikipedia representatives, \u201cthe response was very positive,\u201d Kumaran says, \u201ca near 180-degree turn from what we had encountered before.\u201d<\/p>\n<p>WikiBhasha will be made available to the Wikipedia community as a MediaWiki extension. 
Soon, it will be available as a user gadget on Wikipedia, as well as an installable bookmark at the WikiBhasha web site, which is hosted on the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" href=\"https:\/\/azure.microsoft.com\/en-us\/\" target=\"_blank\">Windows Azure<span class=\"sr-only\"> (opens in new tab)<\/span><\/a> platform.<\/p>\n<p>The current WikiBhasha release supports the 31 languages offered by Microsoft Translator and will be able to handle any new language added to Microsoft Translator. And although the first WikiBhasha release is based on the notion that English-language articles will be the focus, further iterations could enable the use of articles in other languages as source material\u2014German, for instance, or Spanish or Japanese.<\/p>\n<p>Over the next few months, the Wikimedia Foundation and Microsoft Research will be conducting joint workshops and community-interaction sessions in four countries\u2014Brazil, Egypt, India, and Mexico\u2014and the WikiBhasha team will work closely with users in those communities to study and encourage adoption and use of WikiBhasha for enhancing content in the respective Wikipedias. The goal is to enhance the Wikipedia content, as well as to get direct user feedback and refine the WikiBhasha tool while also increasing availability of multilingual content.<\/p>\n<p>Kumaran hopes the release of WikiBhasha will give him several ways to expand his study of language. In one case, he is eager simply to study how people use WikiBhasha.<\/p>\n<h2>\u2018Real-World Research\u2019<\/h2>\n<p>\u201cI am really interested in seeing how to use crowd sourcing as a method for gathering linguistic data,\u201d he says. \u201cWe\u2019d also like to understand which features help or hinder adoption, what are the specific needs of individual demographics, and so on. If the adoption is not up to our expectation, then we would like to know why. 
This is a fantastic opportunity to do real-world research on what works and what doesn\u2019t work with crowds.\u201d<\/p>\n<p>In addition, WikiBhasha represents a significant open-source contribution from Microsoft Research, as well as its initial engagement with the Wikimedia Foundation and Wikipedia communities.<\/p>\n<p>WikiBhasha is a collaborative project in which multiple individuals from Microsoft Research India have participated. In addition to Kumaran, contributors include K Saravanan from the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/multilingual-systems\/\" target=\"_blank\">Multilingual Systems Group<\/a>; Naren Datha, Anil Ande, and B. Ashok from the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/group\/advanced-development-group\/\" target=\"_blank\">Advanced Development Group<\/a>; and Ashwani Sharma, Sridhar Vedantham, and Vidya Natampally from the External Research team.<\/p>\n<p>In addition, a significant contribution to WikiBhasha in terms of design, development, and liaison with the Wikimedia Foundation was made by members of the <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/machine-translation\/\" target=\"_blank\">Machine Translation<\/a> team from <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/lab\/microsoft-research-redmond\/\" target=\"_blank\">Microsoft Research Redmond<\/a>, particularly <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/vikramde\/\" target=\"_blank\">Vikram Dendi<\/a> and Sandor Maurice. 
WikiBhasha also relies critically on several pieces built and deployed by the machine-translation service and the Collaborative Translation Framework.<\/p>\n<p>\u201cThey are not just service providers for WikiBhasha,\u201d Kumaran says, \u201cbut a part of the WikiBhasha team.\u201d<\/p>\n<p>He says WikiBhasha might open the doors to a whole new world of content translation into languages that machine translators now ignore because it simply takes too much data\u2014and, consequently, too much time\u2014to create a useful translator.<\/p>\n<p>\u201cTake my mother tongue, Tamil, where no translators are available now,\u201d Kumaran says. \u201cMaybe we could use WikiBhasha to bootstrap a machine translator from the ground up. We could start with a rudimentary machine translator based on a small amount of parallel data, deploy WikiBhasha based on it to produce Tamil Wikipedia content and parallel data, which in turn may produce a bit better translator, and so on. This may be the only practical way through which translators in many languages of the world will get created\u2014by community participation.\u201d<\/p>\n<p>For now, though, Kumaran is delighted to see his work come to fruition.<\/p>\n<p>\u201cWe\u2019re very excited about WikiBhasha,\u201d he says. \u201cWikiBhasha is really a very nice idea, and we are hoping it will prove to be useful and successful with the Wikipedians, too.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"<p>By Douglas Gantenbein, Senior Writer, Microsoft News Center Wikipedia has become one of the world\u2019s largest and perhaps most powerful information repositories. But it is heavily English-centric. Making Wikipedia more multilingual inspired a Microsoft Research India team to develop a tool called WikiBhasha, which was launched Oct. 18. 
WikiBhasha\u2014\u201cWiki,\u201d signifying its community-oriented approach; \u201cBhasha,\u201d a [&hellip;]<\/p>\n","protected":false},"author":39507,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[205399,194456],"tags":[193659,193530,193523,193527,214454],"research-area":[13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-306017","post","type-post","status-publish","format-standard","hentry","category-azure","category-natural-language-processing","tag-microsoft-azure","tag-multilingual-content","tag-wikibhasha","tag-wikipedia","tag-wikipedians","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199562,199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[144733],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"October 18, 2010","formattedExcerpt":"By Douglas Gantenbein, Senior Writer, Microsoft News Center Wikipedia has become one of the world\u2019s largest and perhaps most powerful information repositories. But it is heavily English-centric. 
Making Wikipedia more multilingual inspired a Microsoft Research India team to develop a tool called WikiBhasha, which was&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306017","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=306017"}],"version-history":[{"count":3,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306017\/revisions"}],"predecessor-version":[{"id":306038,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306017\/revisions\/306038"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=306017"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=306017"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=306017"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=306017"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=306017"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=306017"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale
?post=306017"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=306017"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=306017"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=306017"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=306017"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}