The Microsoft Research ESL Assistant is a web service that suggests corrections for typical ESL (English as a Second Language) errors, such as the choice of determiners (the/a) and the choice of prepositions. The web service also provides word choice suggestions from a thesaurus. To help the user decide whether to accept a suggestion, the service displays “before and after” web search results, so that the user can see real-life examples of both their original input and the suggested correction.
The web service is no longer live, but the error detection and correction components are fully functional and can be revived at any time.
News and updates will be provided on our team blog site on MSDN.
The Web UI
The ESL Assistant has a user interface that uses the Microsoft Silverlight™ browser plug-in. Text to be checked is entered in the box at the top. When the user clicks the check button, potential errors are identified and highlighted, and the user can proceed from one highlighted segment to the next, reviewing the possible errors and suggested corrections presented in the box below. A pie chart allows the user to compare the approximate frequency distributions of their own input and the suggestions, as indexed by the Microsoft Bing™ decision engine. When the user hovers their mouse over any of the options available, a dropdown panel shows selected usage examples found on the Web. Users can explore additional examples by clicking the link at the bottom of the dropdown panel.
The basic architecture of our system consists of three parts: a set of modules that identify possible corrections, a large language model that evaluates the suggested corrections, and a module that produces search results using Live Search. Each error module targets a specific type of error; some modules are based on heuristics, while others use machine-learned classifiers. The information that the modules take into account includes the presence of specific words and the sequence of automatically assigned part-of-speech tags. The language model is trained on the Gigaword corpus, a very large collection of text, and serves as a filter on the suggested corrections: only suggestions that produce a significantly higher language model score than the original user input are shown to the user.
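The filtering step described above can be illustrated with a minimal sketch. Everything in it is a hypothetical stand-in rather than the ESL Assistant's actual code: the function names, the toy bigram table, the heuristic preposition module, and the margin that stands for "significantly higher" are all assumptions made for illustration. The real system uses a language model trained on the Gigaword corpus and a richer set of error modules.

```python
from typing import Callable, List

# Hypothetical toy bigram log-probabilities standing in for a real language model.
TOY_BIGRAM_LOGPROB = {
    ("depends", "on"): -1.0,
    ("depends", "of"): -6.0,
    ("depends", "in"): -5.5,
}
UNSEEN_LOGPROB = -8.0  # assumed score for any bigram missing from the toy table

PREPOSITIONS = ("of", "on", "in")  # deliberately tiny, illustrative inventory


def toy_score(sentence: str) -> float:
    """Sum the toy bigram log-probabilities over adjacent word pairs."""
    words = sentence.lower().split()
    return sum(
        TOY_BIGRAM_LOGPROB.get(pair, UNSEEN_LOGPROB)
        for pair in zip(words, words[1:])
    )


def propose_preposition_swaps(sentence: str) -> List[str]:
    """Heuristic error module: propose every alternative for each preposition."""
    words = sentence.split()
    candidates = []
    for i, word in enumerate(words):
        if word.lower() in PREPOSITIONS:
            for alt in PREPOSITIONS:
                if alt != word.lower():
                    candidates.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    return candidates


def filter_suggestions(
    original: str,
    candidates: List[str],
    score: Callable[[str], float],
    margin: float,
) -> List[str]:
    """Keep only candidates that outscore the original by more than `margin`."""
    baseline = score(original)
    return [c for c in candidates if score(c) - baseline > margin]


user_input = "it depends of you"
candidates = propose_preposition_swaps(user_input)
shown = filter_suggestions(user_input, candidates, toy_score, margin=2.0)
print(shown)
```

In this sketch the margin plays the role of the significance threshold: a candidate that scores only slightly better than the user's input is suppressed, which trades some recall for fewer spurious suggestions.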