Statistical language models have been successfully applied to many problems, including speech recognition, handwriting recognition, and Chinese pinyin input. In recognition, a statistical language model, such as a trigram model, is used to provide the information needed to predict the probabilities of hypothesized word sequences. Traditional methods that rely on distribution estimation are sub-optimal when the assumed form of the distribution is not the true one, and “optimality” in distribution estimation does not automatically translate into “optimality” in classifier design. This paper proposes a discriminative training method that minimizes the error rate of the recognizer rather than estimating the distribution of the training data. Furthermore, the lexicon is also optimized through discriminative training to minimize the error rate of the decoder. Compared with the traditional LM building method, our system achieves approximately 5%-25% recognition error reduction when discriminative training is used to build the language model.