Spelling correction in pinyin input

  • Kai-Fu Lee
  • Zheng Chen

Chinese input is one of the difficult problems in Chinese language processing. Building on the sentence-based pinyin input method, we propose a new typing model to address it. After analyzing the most common errors made by typists, we build a typing model that is trained on real data and learns the probabilities of typing errors. We also train a powerful Chinese language model on a large corpus. During Pinyin-to-Hanzi conversion, the typing model probabilities are combined with the language model probabilities to find the most probable interpretation of the typed sequence of Roman letters. Furthermore, the spelling correction can automatically adapt to the typist, and the approach is applicable to any language.
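
The combination of typing model and language model described above can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate list, the probabilities, the constant `ERROR_PROB`, and the function names `edit_distance` and `decode` are all hypothetical, and a simple edit-distance penalty stands in for the typing model that the paper trains on real data. The sketch scores each candidate by log P(Hanzi) plus an unnormalized log P(typed | pinyin) and returns the highest-scoring one.

```python
import math

# Hypothetical candidates: (Hanzi, intended pinyin, language model probability).
# All probabilities are invented for illustration only.
CANDIDATES = [
    ("你好", "nihao", 0.7),
    ("你还", "nihai", 0.2),
    ("内耗", "neihao", 0.1),
]

# Assumed penalty per typing error; a trained typing model would instead
# learn separate probabilities for specific error types.
ERROR_PROB = 0.05


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def decode(typed: str) -> str:
    """Return the Hanzi candidate with the best combined log score."""
    def score(candidate):
        _hanzi, pinyin, lm_prob = candidate
        # Unnormalized typing model: ERROR_PROB raised to the number of edits.
        typing_log = edit_distance(typed, pinyin) * math.log(ERROR_PROB)
        return math.log(lm_prob) + typing_log
    return max(CANDIDATES, key=score)[0]


if __name__ == "__main__":
    for typed in ("nihao", "nihai", "nihal"):
        print(typed, "->", decode(typed))
```

In this toy setup, an exact match such as `nihai` keeps its own reading even though the language model prefers another candidate, while an ambiguous typo such as `nihal`, which is one edit away from two candidates, is resolved by the language model.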