{"id":2963,"date":"2010-12-15T14:30:00","date_gmt":"2010-12-15T14:30:00","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/msr_er\/2010\/12\/15\/building-a-better-speller-bing-and-microsoft-research-offer-prizes-for-best-search-engine-spelling-alteration-services\/"},"modified":"2016-07-20T07:34:15","modified_gmt":"2016-07-20T14:34:15","slug":"building-a-better-speller-bing-and-microsoft-research-offer-prizes-for-best-search-engine-spelling-alteration-services","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/building-a-better-speller-bing-and-microsoft-research-offer-prizes-for-best-search-engine-spelling-alteration-services\/","title":{"rendered":"Building a Better Speller: Bing and Microsoft Research Offer Prizes for Best Search Engine Spelling Alteration Services"},"content":{"rendered":"<p style=\"text-align: center;\"><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\"><img decoding=\"async\" src=\"https:\/\/msdnshared.blob.core.windows.net\/media\/MSDNBlogsFS\/prod.evol.blogs.msdn.com\/CommunityServer.Blogs.Components.WeblogFiles\/00\/00\/01\/32\/81\/3324.SpellerChallengeBanner_blog.jpg\" original-url=\"http:\/\/blogs.msdn.com\/resized-image.ashx\/__size\/496x115\/__key\/CommunityServer-Blogs-Components-WeblogFiles\/00-00-01-32-81\/3324.SpellerChallengeBanner_5F00_blog.jpg\" alt=\"Speller Challenge, presented by Microsoft Research in partnership with Bing\" title=\"Speller Challenge, presented by Microsoft Research in partnership with Bing\" style=\"border: 0px;\" \/><\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">When you type a word or phrase into a search engine, there are a number of things that could go wrong. You might not know how a term is spelled or, in your rush to jump to the results, you could transpose or otherwise mistype some characters. 
<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">Spelling alteration is a popular search technique used to translate apparent typographical errors, alternative spellings, and synonyms into an improved query that returns the best possible results on the first try. <\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">But this approach is not without its pitfalls. You might enter a word correctly that&#8217;s not widely used but has a neighbor in the dictionary that&#8217;s much more popular on the Internet. One person&#8217;s spelling error could be another&#8217;s perfect query. Which results should the search engine provide, and how should any useful alternative searches be represented?&nbsp; <\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">That&#8217;s the task being offered to researchers and students around the world in the <\/span><\/span><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/spellerchallenge.com\/\"><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">Speller Challenge<\/span><\/span><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">, presented by Microsoft Research in partnership with Bing. The goal is to develop a spelling alteration system suitable for large-scale statistical data mining-based web search.<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">A common approach to spelling alteration is the noisy channel model, in which the received query (<i>q<\/i>) is treated as a noise-corrupted version of the target query (<i>c<\/i>). 
In this model, the spelling alteration system alters <i>q<\/i> into <i>c<\/i> and returns the latter&#8217;s results. How best to identify query\/target pairs and estimate the statistics of this model is the active research problem that underlies this challenge.<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">But that&#8217;s just the foundation. Place the spelling alteration task in the context of web search, and you have another dimension to consider. In many spelling applications, target queries are assumed to be composed of tokens (i.e., words and phrases) that are drawn from a predetermined vocabulary. Relying on a fixed lexicon is a known problem because it can lead the speller not only to miss &#8220;real word&#8221; errors but also to misrecognize out-of-vocabulary tokens as errors. <\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">In the context of search, the scale of the web magnifies this problem considerably. The challenge is therefore not necessarily to alter queries to conform to a specific dictionary of words and phrases, but rather to provide relevant documents that have high matching scores in ranking. <\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">If this sounds like the type of problem you (or the search developer in your life) would enjoy solving, the task is to build the best speller web service that proposes the most plausible spelling alternatives for a wide range of search queries. Entrants are encouraged to take advantage of cloud computing, and spellers must be submitted to the challenge in the form of REST (Representational State Transfer) web services. 
<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">For the purpose of the Speller Challenge, a development dataset (derived from the publicly available TREC queries that are based on the 2008 Million Query Track) will be made available to the public through the Microsoft Web N-gram service. This TREC Evaluation Dataset is annotated by using the same guidelines and processes as in the creation of the Bing Test Dataset, which is the dataset used to select the winners.<\/span><\/span><\/p>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">The top five competitors will receive the following prizes:<\/span><\/span><\/p>\n<table cellpadding=\"3\" cellspacing=\"0\" border=\"1\">\n<tbody>\n<tr>\n<td width=\"110\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">First place<\/span> <\/span><\/td>\n<td width=\"79\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">US$10,000<\/span> <\/span><\/td>\n<\/tr>\n<tr>\n<td width=\"110\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">Second place<\/span> <\/span><\/td>\n<td width=\"79\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">US$8,000<\/span> <\/span><\/td>\n<\/tr>\n<tr>\n<td width=\"110\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">Third place<\/span> <\/span><\/td>\n<td width=\"79\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">US$6,000<\/span> <\/span><\/td>\n<\/tr>\n<tr>\n<td width=\"110\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">Fourth place<\/span> <\/span><\/td>\n<td width=\"79\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">US$4,000<\/span> 
<\/span><\/td>\n<\/tr>\n<tr>\n<td width=\"110\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">Fifth place<\/span> <\/span><\/td>\n<td width=\"79\" valign=\"top\"><span style=\"font-size: small;\"><span style=\"font-family: verdana,geneva;\">US$2,000<\/span> <\/span><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">&nbsp;<\/span><\/span><i><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">&mdash;Evelyne Viegas, Director of Semantic Computing for the External Research division of Microsoft Research<\/span><\/span><\/i><\/p>\n<p><b><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: medium;\">Learn More<\/span><\/span><\/b><\/p>\n<ul>\n<li><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/spellerchallenge.com\">Visit the Speller Challenge website<span class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/span><\/li>\n<li><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/www.facebook.com\/pages\/MS-Web-Ngram\/168487043192270\"><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: small;\">Join the Speller Challenge on Facebook<\/span><\/span><span class=\"sr-only\"> (opens in new tab)<\/span><\/a><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: small;\"> <\/span><\/span><\/li>\n<li><span style=\"font-family: verdana,geneva;\"><span style=\"font-size: small;\"><a class=\"msr-external-link glyph-append glyph-append-open-in-new-tab glyph-append-xsmall\" rel=\"noopener noreferrer\" target=\"_blank\" href=\"http:\/\/x.com\/webngram\">Follow the Speller Challenge on Twitter<span 
class=\"sr-only\"> (opens in new tab)<\/span><\/a><\/span><\/span><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>When you type a word or phrase into a search engine, there are a number of things that could go wrong. You might not know how a term is spelled or, in your rush to jump to the results, you could transpose or otherwise mistype some characters. Spelling alteration is a popular search technique used [&hellip;]<\/p>\n","protected":false},"author":32627,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[],"msr_hide_image_in_river":0,"footnotes":""},"categories":[1],"tags":[194498,194800,186604,194819,186889,195525,193497,193504,196503,197013,197136,197145,197283,197286,197307,197503,197504,197730],"research-area":[],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-2963","post","type-post","status-publish","format-standard","hentry","category-research-blog","tag-2008-million-query-track","tag-best-speller-web-service","tag-bing","tag-bing-test-dataset","tag-cloud-computing","tag-evelyne-viegas","tag-external-research","tag-microsoft-research","tag-microsoft-web-n-gram","tag-rest-representational-state-transfer-web-services","tag-search-technique","tag-semantic-computing","tag-speller-challenge","tag-spelling-alteration-system","tag-statistical-data-mining-based-web-search","tag-trec-evaluation-dataset","tag-trec-queries","tag-vocabulary","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[]
,"related-groups":[],"related-projects":[],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"December 15, 2010","formattedExcerpt":"When you type a word or phrase into a search engine, there are a number of things that could go wrong. You might not know how a term is spelled or, in your rush to jump to the results, you could transpose or otherwise mistype&hellip;","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/2963","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/32627"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=2963"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/2963\/revisions"}],"predecessor-version":[{"id":262473,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/2963\/revisions\/262473"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=2963"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=2963"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=2963"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=2963"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=2963"},{"taxonomy":
"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=2963"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=2963"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=2963"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=2963"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=2963"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=2963"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}