In Search of Britney…or Brittany…or Brittannie
Ever mistype your query in a search engine? Or just flat out misspell it? Of course you have—we all do, especially when our search involves “spelling demons” like minuscule, millennium, or embarrassment. Or personal names: believe it or not, there are more than 500 ways that Britney Spears has been misspelled on the web. Misspellings and typos make it difficult for search engines to give users the best results.
Better spelling algorithms can get users to the information they seek, without their having to carry around a dictionary or scroll through several pages of results. Quality spelling algorithms become even more relevant when the searcher is using a smartphone, where browsing through page after page of results on a small screen is difficult.
With this in mind, Microsoft Research and Microsoft Bing launched the Speller Challenge, encouraging participants worldwide to compete in creating a spelling algorithm that generates the most plausible alternatives for web search queries. Participants were able to access real-world data at web scale by using the Microsoft Research Web N-gram Services. Moreover, participants were able to improve their algorithm and see how it compared to other spelling correction systems by using an evaluation service that we made available to them.
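To make the task concrete, here is a minimal sketch of what "generating plausible alternatives" can look like. It is not any participant's system and not the Web N-gram Services API. It is a simple edit-distance corrector in which a small, invented frequency table (`TOY_COUNTS`) stands in for web-scale n-gram data, and the function names `edits1`, `alternatives`, and `correct_query` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy counts standing in for web-scale n-gram data.
# The numbers are invented purely for illustration.
TOY_COUNTS = Counter({
    "britney": 900,
    "brittany": 400,
    "spears": 800,
    "speaks": 300,
    "spares": 50,
})

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """Return every string one edit away: delete, transpose, replace, insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in ALPHABET]
    inserts = [a + c + b for a, b in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def alternatives(word, counts=TOY_COUNTS, limit=3):
    """Rank known words within one edit of `word` by their toy frequency."""
    word = word.lower()
    candidates = {w for w in edits1(word) | {word} if w in counts}
    # Most frequent first; ties are broken alphabetically for stable output.
    ranked = sorted(candidates, key=lambda w: (-counts[w], w))
    return ranked[:limit]


def correct_query(query, counts=TOY_COUNTS):
    """Correct each token on its own; tokens with no candidate are kept as typed."""
    fixed = []
    for token in query.split():
        best = alternatives(token, counts, limit=1)
        fixed.append(best[0] if best else token.lower())
    return " ".join(fixed)


if __name__ == "__main__":
    print(correct_query("briteny spears"))
    print(alternatives("spears"))
```

The sketch treats each word in isolation and always prefers the most frequent neighbor, so it can "correct" a word that was typed properly. A realistic system would presumably score whole queries, for example with multi-word n-gram probabilities, which is where web-scale data becomes valuable.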
More than 300 participants registered for the Speller Challenge, representing every continent (well, almost; no one actually registered from Antarctica) and including researchers from academia, research laboratories, and industry. Winners were selected automatically, based on how well their systems identified the best spelling alternatives (for example, “Britney Spears” for “briteny spears”). On Tuesday, July 19, we hosted a workshop at Bing headquarters, where Harry Shum, corporate vice president of Bing, presented the winners with their prizes. Congratulations to everyone who took part in the program, and especially to the winners:
- First place (US$10,000): Gord Lueck – Canada
- Second place (US$8,000): Yanen Li, Huizhong Duan, and ChengXiang Zhai – United States
- Third place (US$6,000): Yasser Ganjisaffar, Andrea Zilio, Sara Javanmardi, Inci Cetindil, Manik Sikka, Sandeep P. Katumalla, Narges Khatib, and Chen Li – United States
- Fourth place (US$4,000): Dan Ştefănescu, Radu Ion, and Tiberiu Boroş – Romania
- Fifth place (US$2,000): Yoh Okuno – Japan
Finally, here are a few remarks from first-place winner Gord Lueck:
“Microsoft has been a leader in offering visibility into search data for research purposes. Big data is the driver of many of the tools that make the Internet useful. Through Microsoft, some of that data is now available to the community at large to build up and design algorithms with. It’s this generosity and openness that has allowed many independent researchers, such as myself, to design a high quality software product that leverages these valuable data.
“A very good quality dataset for training was given to the researchers, providing a benchmark against which to compare their work in near real-time against other researchers in the same field. This quick feedback cycle undoubtedly helps to accelerate the pace of research beyond that which might have occurred in an environment where data and methods are hoarded and protected.”
Gord also noted that the competition focused on U.S. English spellings, pointing out that “it would have been nice to see some more variety in input languages and grammars.” Sounds like an idea for another contest!
—Evelyne Viegas, Director of Semantic Computing, Microsoft Research Connections
- Building a Better Speller Blog
- Building a Better Speller Video
- Speller Challenge Winners
- Spelling Alteration for Web Search Workshop
- Microsoft Web N-gram Services
- Microsoft Research Connections