This paper addresses the problem of discriminative training of language models that does not require any transcribed acoustic data. We propose to minimize the conditional entropy of word sequences given phone sequences, and present two settings in which this criterion can be applied. In an inductive learning setting, the phonetic/acoustic confusability information is given by a general phone error model. A transductive approach, in contrast, obtains that information by running a speech recognizer on test-set acoustics, with the goal of optimizing the test-set performance. Experiments show significant recognition accuracy improvements in both rescoring and first-pass decoding experiments using the transductive approach, and mixed results using the inductive approach.