{"id":722737,"date":"2021-02-08T09:16:09","date_gmt":"2021-02-08T17:16:09","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=722737"},"modified":"2022-04-28T07:22:47","modified_gmt":"2022-04-28T14:22:47","slug":"speller100-zero-shot-spelling-correction-at-scale-for-100-plus-languages","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/speller100-zero-shot-spelling-correction-at-scale-for-100-plus-languages\/","title":{"rendered":"Speller100: Zero-shot spelling correction at scale for 100-plus languages"},"content":{"rendered":"\n<figure class=\"wp-block-image alignwide size-large\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller_100_no_logo.gif\" alt=\"Speller 100: Our spelling correction technology powers several product experiences across Microsoft\"\/><\/figure>\n\n\n\n<p>At Microsoft Bing, our mission is to delight users everywhere with the best search experience. We serve a diverse set of customers all over the planet who issue queries in over 100 languages. In search we\u2019ve found about 15% of queries submitted by customers have misspellings. When queries are misspelled, we match the wrong set of documents and trigger incorrect answers, which can produce a suboptimal results page for our customers. Therefore, spelling correction is the very first component in the Bing search stack because searching for the correct spelling of what users mean improves all downstream search components. Our spelling correction technology powers several product experiences across Microsoft. 
Since it is important to us to provide all customers with access to accurate, state-of-the-art spelling correction, we are improving search so that it is inclusive of more languages from around the world with the help of <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/project\/ai-at-scale\/\">AI at Scale<\/a>.<\/p>\n\n\n\n<p>We have had high-quality spelling correction for about two dozen languages for quite some time. However, that left users who issued queries in many more languages dealing with inferior results or manually correcting queries themselves. In order to make Bing more inclusive, we set out to expand our current spelling correction service to 100-plus languages, setting the same high bar for quality that we set for the original two dozen languages. We\u2019ve found we need a very large number of data points to train a high-quality spelling correction model for each language, and sourcing data in over 100 languages would be incredibly difficult logistically\u2014not to mention costly in both time and money.<\/p>\n\n\n\n<h2 id=\"a-speller-for-100-plus-languages-in-microsoft\">A speller for 100-plus languages in Microsoft  <\/h2>\n\n\n\n<p>Despite these challenges, we have recently launched our large-scale multilingual spelling correction models worldwide with high precision and high recall in 100-plus languages! These models, technology we collectively call Speller100, are currently helping to improve search results for these languages in Bing. This is a huge step forward, especially when considering that spelling correction was available for just a few dozen languages a short time ago. 
This was made possible by leveraging recent advances in AI, particularly zero-shot learning combined with carefully designed large-scale pretraining tasks, and we also draw on historical linguistics theories.<\/p>\n\n\n\n<figure class=\"wp-block-gallery alignwide has-nested-images columns-2 is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"graphical user interface, text, application, email\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Updated-Figure.png\"><img loading=\"lazy\" decoding=\"async\" width=\"688\" height=\"719\" data-id=\"723982\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Updated-Figure.png\" alt=\"graphical user interface, text, application, email\" class=\"wp-image-723982\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Updated-Figure.png 688w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Updated-Figure-287x300.png 287w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Updated-Figure-11x12.png 11w\" sizes=\"auto, (max-width: 688px) 100vw, 688px\" \/><\/a><figcaption>Query: {\u0437\u043d\u0430\u0447\u0435\u045a\u0435 \u043e\u0431\u0435\u0434\u0438\u043d\u0435\u0442\u0438 \u043d\u0430\u0446\u0438\u0438\u0450} in Macedonian, translation {meaning united nations} <br>Should be spelled as {\u0437\u043d\u0430\u0447\u0435\u045a\u0435 \u043e\u0431\u0435\u0434\u0438\u043d\u0435\u0442\u0438 \u043d\u0430\u0446\u0438\u0438}<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"graphical user interface, text, application, email\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery2.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"965\" height=\"852\" 
data-id=\"722890\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery2.jpg\" alt=\"graphical user interface, text, application, email\" class=\"wp-image-722890\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery2.jpg 965w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery2-300x265.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery2-768x678.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery2-14x12.jpg 14w\" sizes=\"auto, (max-width: 965px) 100vw, 965px\" \/><\/a><figcaption>Query: {\u043b\u0430\u0446\u0456\u043d\u0441\u043a\u0456 \u0430\u043b\u043b\u044c\u0444\u0430\u0431\u044d\u0442} in Belarusian (be), translation {latin alphabet} <br>Should be spelled as {\u043b\u0430\u0446\u0456\u043d\u0441\u043a\u0456 \u0430\u043b\u044c\u0444\u0430\u0431\u044d\u0442}<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"graphical user interface, text, application, email\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_updated_Gallery3.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"958\" height=\"879\" data-id=\"722893\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_updated_Gallery3.jpg\" alt=\"graphical user interface, text, application, email\" class=\"wp-image-722893\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_updated_Gallery3.jpg 958w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_updated_Gallery3-300x275.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_updated_Gallery3-768x705.jpg 768w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_updated_Gallery3-13x12.jpg 13w\" sizes=\"auto, (max-width: 958px) 100vw, 958px\" \/><\/a><figcaption>Query: {usaqlaarda seker} in Azerbaijani (az), translation {sugar in children} <br>Should be spelled as {usaqlarda seker}<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"graphical user interface, text, application, email\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery4.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"895\" height=\"888\" data-id=\"722896\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery4.jpg\" alt=\"graphical user interface, text, application, email\" class=\"wp-image-722896\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery4.jpg 895w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery4-300x298.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery4-150x150.jpg 150w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery4-768x762.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery4-12x12.jpg 12w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery4-180x180.jpg 180w\" sizes=\"auto, (max-width: 895px) 100vw, 895px\" \/><\/a><figcaption>Query: {\u0637\u0628\u06cc \u06a9\u062a\u0628\u0648\u0646\u0647} in Pashto (ps), translation {Medical books}<br>Should be spelled as {\u0637\u0628\u06cc \u06a9\u062a\u0627\u0628\u0648\u0646\u0647}<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"graphical user interface, 
text, application, email\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery5.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"906\" height=\"621\" data-id=\"722899\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery5.jpg\" alt=\"graphical user interface, text, application, email\" class=\"wp-image-722899\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery5.jpg 906w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery5-300x206.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery5-768x526.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery5-16x12.jpg 16w\" sizes=\"auto, (max-width: 906px) 100vw, 906px\" \/><\/a><figcaption>Query: {fh\u00e1zy mesiac} in Slovak (sk),<br>translation: {phases of the moon}<br>Should be spelled as {f\u00e1zy mesiaca}<\/figcaption><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><a data-bi-bhvr=\"14\"  data-bi-cn=\"graphical user interface, text, application, email\" href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery6.jpg\"><img loading=\"lazy\" decoding=\"async\" width=\"912\" height=\"821\" data-id=\"722902\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery6.jpg\" alt=\"graphical user interface, text, application, email\" class=\"wp-image-722902\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery6.jpg 912w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery6-300x270.jpg 300w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery6-768x691.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller_Updated_Gallery6-13x12.jpg 13w\" sizes=\"auto, (max-width: 912px) 100vw, 912px\" \/><\/a><figcaption>Query: {istoria rosilor} in Romanian (ro),<br>translation: {History of Russia}<br>Should be spelled as {istoria rusilor}<\/figcaption><\/figure>\n<figcaption class=\"blocks-gallery-caption\">Above are some examples of Bing search results after Speller100 implementation. Speller100 has improved quality in a great many low- and no-resource languages, such as Macedonian, Belarusian, Azerbaijani, Pashto, Slovak, and Romanian, bringing a much better experience to our users. Click on the images above to enlarge.<\/figcaption><\/figure>\n\n\n\n<div style=\"height:15px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>Traditionally, spelling correction solutions have leveraged noisy channel theory and made great improvements in building better statistical error models and language models. Search engines have long used web documents for robust language models. For precise and high-performing error models, search engines have largely leveraged user feedback on autocorrection recourse links. This practice has been very effective, especially for languages where user feedback data has been gathered on a large scale. For a language with very little web presence and user feedback, it\u2019s challenging to gather an adequate amount of training data.<\/p>\n\n\n\n<p>In order to create spelling correction solutions for these latter types of languages, models cannot rely solely on training data to learn the spelling of a language. The foundation of Speller100 is the concept of language families\u2014for our purposes, groups of languages that share significant similarities. 
Another concept, zero-shot learning, allows a model to accurately learn and correct spelling without any additional language-specific labeled training data. Imagine someone had taught you how to spell in English and you automatically learned to also spell in German, Dutch, Afrikaans, Scots, and Luxembourgish. <em>That <\/em>is what zero-shot learning enables, and it is a key component in Speller100 that allows us to expand to languages with very little to no data.<\/p>\n\n\n\n<h2 id=\"unlocking-the-power-of-task-driven-pretraining\">Unlocking the power of task-driven pretraining<\/h2>\n\n\n\n<p>We\u2019ve seen significant advancements in natural language processing (NLP) in the last year through large Transformer networks like BERT, UniLM, and DeBERTa. These models are trained with tasks like Masked Language Model (MLM), next-sentence prediction, and translation. Even though commonly used WordPiece or SentencePiece subword segmentation algorithms break down words into smaller constituents, existing pretraining tasks all operate at the word, phrase, or even sentence level for semantic understanding. Spelling, however, is a different task altogether.<\/p>\n\n\n\n<p>Broadly speaking, there are two types of spelling errors: non-word errors and real-word errors. A non-word error occurs when a word is not in the vocabulary for a given language at all; a real-word error occurs when the word itself is valid but doesn\u2019t fit in the larger context. Both errors are character-level mutations within a reasonable edit distance of the desired words.<\/p>\n\n\n\n<p>At the core, spelling correction is about building an error model and a language model. The MLM task makes very good language models, even for those languages with very little web presence. However, we haven\u2019t seen much innovation on the error model for pretraining tasks. 
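<\/p>

<p>As noted above, both error types are character-level mutations within a small edit distance of the intended word. For intuition, here is a minimal Levenshtein edit distance sketch; it is an illustration only, not part of Speller100 itself:<\/p>

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance: the minimum number of character
    insertions, deletions, and replacements needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # replace ca with cb
        prev = cur
    return prev[-1]

# Misspellings from the gallery above sit one edit from the intended word:
print(edit_distance("usaqlaarda", "usaqlarda"))  # 1 (one extra "a")
print(edit_distance("rosilor", "rusilor"))       # 1 ("o" typed for "u")
```

<p>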
For large-scale language family\u2013based multilingual spelling correction, we designed a spelling correction pretraining task to enrich standard Transformer-based models.<\/p>\n\n\n\n<p>Spelling correction is a sequence-to-sequence (s2s) problem that converts text with typos into its correct form. If typos are treated as noise in text, spelling correction can be viewed as a denoising process that recovers the original text from its corrupted form. Deep learning is the state-of-the-art technology used for s2s applications.<\/p>\n\n\n\n<p>Our deep learning approach is inspired by Facebook AI Research\u2019s <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/arxiv.org\/pdf\/1910.13461.pdf\"><strong>BART<\/strong><span class=\"sr-only\"> (opens in new tab)<\/span><\/a>, a word-level denoising s2s autoencoder pretraining approach for natural language generation (NLG), translation, and comprehension. BART is trained by corrupting text with an arbitrary noise function and learning a model to reconstruct the original text. Our model differs from BART in that we frame spelling correction as a character-level s2s denoising autoencoder problem and build out pretraining data with character-level mutations in order to mimic spelling errors. We have designed noise functions to generate common errors of rotation, insertion, deletion, and replacement. 
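<\/p>

<p>As a rough sketch, such noise functions take clean text and return a corrupted variant, yielding (noisy, clean) pairs for pretraining. The helpers below are a minimal, hypothetical illustration of the four mutation types, not the production implementation:<\/p>

```python
import random

def corrupt(word: str, rng: random.Random) -> str:
    """Apply one character-level mutation: rotation (adjacent-character
    swap), insertion, deletion, or replacement."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["rotation", "insertion", "deletion", "replacement"])
    if op == "rotation":    # swap adjacent characters: "the" -> "hte"
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    if op == "insertion":   # duplicate a character: "the" -> "tthe"
        return word[:i] + word[i] + word[i:]
    if op == "deletion":    # drop a character: "the" -> "he"
        return word[:i] + word[i + 1:]
    # replacement: substitute a random lowercase letter
    return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i + 1:]

def make_pairs(words, seed=0):
    """Build (noisy, clean) pretraining pairs from clean web text."""
    rng = random.Random(seed)
    return [(corrupt(w, rng), w) for w in words]
```

<p>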
See the figure below for examples of these common errors.<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"624\" height=\"79\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller100-Figure1.png\" alt=\"Example corrections such as rotation, insertion, deletion and replacement\" class=\"wp-image-722752\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller100-Figure1.png 624w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller100-Figure1-300x38.png 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/Speller100-Figure1-16x2.png 16w\" sizes=\"auto, (max-width: 624px) 100vw, 624px\" \/><\/figure><\/div>\n\n\n\n<p>The use of a noise function significantly reduces our reliance on human-labeled annotations, which are often required in machine learning. This is quite useful for languages for which we have little or no training data. With a noise function, we can obtain a pretrained model (see figure below), and then fine-tuning the model becomes a zero-shot or few-shot learning scenario for those languages. <\/p>\n\n\n\n<p>Thanks to noise functions, we no longer need a large corpus of misspelled queries and can make do with regular text extracted from web pages. This text can easily be extracted through web crawling, and there is sufficient text to train models for hundreds of languages. 
It then becomes practical to build a speller using a deep learning\u2013based s2s model for these languages.<\/p>\n\n\n\n<p>Here is the model architecture:<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-large\"><a href=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo_light_background-scaled.jpg\"><img decoding=\"async\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo_light_background-scaled.jpg\" alt=\"Pretraining process. The user query in the bottom row shows a query with a misspelling error. \"\/><\/a><figcaption><strong>Pretraining process.<\/strong> The user query in the bottom row shows a query with a misspelling error. The model takes the input and does subword segmentation, and then it sends the segmented input through a sequence-to-sequence Transformer encoder and decoder. We convert the segmented (tokenized) subword back to word form in the final correction stage.<\/figcaption><\/figure><\/div>\n\n\n\n<p>This pretraining task proves to be a solid first step toward solving multilingual spelling correction for 100-plus languages. It achieves 50% correction recall on top candidates in languages for which we have zero training data.<\/p>\n\n\n\n<h2 id=\"utilizing-a-languages-family-for-efficient-and-effective-zero-shot-learning\">Utilizing a language\u2019s family for efficient and effective zero-shot learning<\/h2>\n\n\n\n<p>Fifty percent recall is obviously not good enough for a production system. In the case of Bing, where roughly 15% of queries are misspelled, that would mean that 7.5% of all queries would not have proper spelling correction. For languages with zero training data, our next design choice also proves to be crucial. 
We tapped into the zero-shot learning property of deep models effectively and efficiently by producing models to target language families.<\/p>\n\n\n\n<p>It\u2019s well known in the historical linguistics world that languages are rarely isolated. Most of the world\u2019s languages are known to be related to others. A group of languages descended from the same ancestor form a <em>language family<\/em>. They share a lot in <em>orthography<\/em>\u2014the spelling and other written conventions of a language\u2014which stems from morphological and phonetical similarities.<\/p>\n\n\n\n<p>Below is an illustration of orthographic similarities between languages in the <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/en.wikipedia.org\/wiki\/Germanic_languages\">Germanic languages<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><thead><tr><th><strong>English<\/strong><\/th><th><strong>Dutch<\/strong><\/th><th><strong>Afrikaans<\/strong><\/th><th><strong>German<\/strong><\/th><th><strong>Luxembourgish<\/strong><\/th><\/tr><\/thead><tbody><tr><td>two <\/td><td>twee<\/td><td>twee<\/td><td>zwei<\/td><td>zwee<\/td><\/tr><tr><td>blood<\/td><td>bloed<\/td><td>bloed<\/td><td>Blut<\/td><td>Blutt<\/td><\/tr><tr><td>finger<\/td><td>vinger<\/td><td>vinger<\/td><td>Finger<\/td><td>Fanger<\/td><\/tr><tr><td>download<\/td><td>downloaden<\/td><td>aflaai<\/td><td>herunterladen<\/td><td>eroflueden<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>This orthographic, morphological, and semantic similarity between languages in the same group makes a zero-shot learning error model very efficient and effective. 
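<\/p>

<p>To make the grouping concrete, one can picture a routing table from a query\u2019s language code to a family-level model. The sketch below is purely illustrative\u2014the family assignments are real, but the model names are hypothetical and not Speller100\u2019s actual identifiers:<\/p>

```python
# Illustrative routing table: language code -> language family.
FAMILY = {
    "en": "germanic", "de": "germanic", "nl": "germanic",
    "af": "germanic", "lb": "germanic",   # low-/no-resource members
    "ro": "romance",
    "be": "slavic", "mk": "slavic", "sk": "slavic",
}

def speller_model_for(lang: str) -> str:
    """Route a query's language to its family-level speller model."""
    family = FAMILY.get(lang)
    return f"speller-{family}" if family else "speller-fallback"
```

A query in a no-resource language such as Luxembourgish ("lb") would be served by the same Germanic-family model that was trained mostly on English, German, and Dutch data.

<p>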
High-quality error model training data is abundant in high-resource languages like English and German in the Germanic language family, and we have a reasonable amount of data in Dutch. Within the same language family, however, we have a severe shortage of training data in Afrikaans and Luxembourgish. Zero-shot learning makes spelling correction for these low- and no-resource languages possible. We simply build a dozen or so language family\u2013based models to maximize the zero-shot benefit and keep each model compact enough for runtime. This proves to be optimal for both relevance and engineering.<\/p>\n\n\n\n<h2 id=\"the-user-experience-impact-of-speller100\">The user experience impact of Speller100<\/h2>\n\n\n\n<p>We believe Speller100 is the most comprehensive spelling correction system ever made in terms of language coverage and accuracy. With this technology, we have improved the search results for all Bing users by expanding accurate spelling correction to over 100 languages. We have observed a double-digit improvement in both spelling correction precision and recall. After conducting Bing online A\/B testing, here are the results:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>The number of pages with no results was reduced by up to 30%.<\/li><li>The number of times users had to manually reformulate their query was reduced by 5%.<\/li><li>The number of times users clicked on our spelling suggestion increased from single digits to 67%.<\/li><li>The number of times users clicked on any item on the page went from single digits to 70%.<\/li><\/ul>\n\n\n\n<p>These are great indications that we have made our users\u2019 experience better! Shipping Speller100 to Bing is obviously just the first step. 
We hope to implement this technology in many more Microsoft products soon.<\/p>\n\n\n\n<p>If you are interested in applying the latest deep learning techniques to innovate in search, <a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"https:\/\/careers.microsoft.com\/us\/en\/search-results?keywords=%23semanticsearch%23\">our Search and AI team is hiring globally<span class=\"sr-only\"> (opens in new tab)<\/span><\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>At Microsoft Bing, our mission is to delight users everywhere with the best search experience. We serve a diverse set of customers all over the planet who issue queries in over 100 languages. In search we\u2019ve found about 15% of queries submitted by customers have misspellings. When queries are misspelled, we match the wrong set [&hellip;]<\/p>\n","protected":false},"author":38838,"featured_media":724036,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"user_nicename","value":"Jingwen Lu","user_id":"40021"},{"type":"user_nicename","value":"Jidong Long (\u9f99\u7ee7\u4e1c)","user_id":"40027"},{"type":"user_nicename","value":"Rangan 
Majumder","user_id":"38931"}],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[],"research-area":[13556,13545],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-722737","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-research-blog","msr-research-area-artificial-intelligence","msr-research-area-human-language-technologies","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[649749,715045],"related-events":[],"related-researchers":[{"type":"user_nicename","value":"Jingwen Lu","user_id":40021,"display_name":"Jingwen Lu","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinlu\/\" aria-label=\"Visit the profile page for Jingwen Lu\">Jingwen Lu<\/a>","is_active":false,"last_first":"Lu, Jingwen","people_section":0,"alias":"jinlu"},{"type":"user_nicename","value":"Jidong Long (\u9f99\u7ee7\u4e1c)","user_id":40027,"display_name":"Jidong Long (\u9f99\u7ee7\u4e1c)","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jilong\/\" aria-label=\"Visit the profile page for Jidong Long (\u9f99\u7ee7\u4e1c)\">Jidong Long (\u9f99\u7ee7\u4e1c)<\/a>","is_active":false,"last_first":"Long (\u9f99\u7ee7\u4e1c), Jidong","people_section":0,"alias":"jilong"},{"type":"user_nicename","value":"Rangan Majumder","user_id":38931,"display_name":"Rangan Majumder","author_link":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ranganm\/\" aria-label=\"Visit the profile page for Rangan Majumder\">Rangan Majumder<\/a>","is_active":false,"last_first":"Majumder, 
Rangan","people_section":0,"alias":"ranganm"}],"msr_type":"Post","featured_image_thumbnail":"<img width=\"960\" height=\"540\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-960x540.jpg\" class=\"img-object-cover\" alt=\"Diagram shows Model Architecture of Microsoft Vision Model ResNet-50\" decoding=\"async\" loading=\"lazy\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-960x540.jpg 960w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-300x169.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-1024x577.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-768x432.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-1536x865.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-2048x1153.jpg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-16x9.jpg 16w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-1066x600.jpg 1066w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-655x368.jpg 655w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-343x193.jpg 343w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-640x360.jpg 640w, https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-1280x720.jpg 1280w, 
https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2021\/02\/1400x788_Speller100_Still_noLogo-1920x1080.jpg 1920w\" sizes=\"auto, (max-width: 960px) 100vw, 960px\" \/>","byline":"<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jinlu\/\" title=\"Go to researcher profile for Jingwen Lu\" aria-label=\"Go to researcher profile for Jingwen Lu\" data-bi-type=\"byline author\" data-bi-cN=\"Jingwen Lu\">Jingwen Lu<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/jilong\/\" title=\"Go to researcher profile for Jidong Long (\u9f99\u7ee7\u4e1c)\" aria-label=\"Go to researcher profile for Jidong Long (\u9f99\u7ee7\u4e1c)\" data-bi-type=\"byline author\" data-bi-cN=\"Jidong Long (\u9f99\u7ee7\u4e1c)\">Jidong Long (\u9f99\u7ee7\u4e1c)<\/a>, and <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/people\/ranganm\/\" title=\"Go to researcher profile for Rangan Majumder\" aria-label=\"Go to researcher profile for Rangan Majumder\" data-bi-type=\"byline author\" data-bi-cN=\"Rangan Majumder\">Rangan Majumder<\/a>","formattedDate":"February 8, 2021","formattedExcerpt":"At Microsoft Bing, our mission is to delight users everywhere with the best search experience. We serve a diverse set of customers all over the planet who issue queries in over 100 languages. 
In search we\u2019ve found about 15% of queries submitted by customers have&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/722737","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/38838"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=722737"}],"version-history":[{"count":31,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/722737\/revisions"}],"predecessor-version":[{"id":840235,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/722737\/revisions\/840235"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media\/724036"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=722737"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=722737"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=722737"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=722737"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=722737"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=722737"},{"taxonomy":"msr-locale","embeddable":true,
"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=722737"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=722737"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=722737"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=722737"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=722737"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}