We present a method for conditional maximum likelihood estimation
of N-gram models used for text or speech utterance classification.
The method employs a well-known technique relying
on a generalization of the Baum-Eagon inequality from polynomials
to rational functions. The best performance is achieved
for the 1-gram classifier, where conditional maximum likelihood
training reduces the classification error rate by 45% relative to a
maximum likelihood classifier.
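
As a rough illustration of the two training criteria contrasted above, the
sketch below scores a 1-gram (unigram) classifier under both the joint
likelihood, which maximum likelihood training maximizes, and the conditional
likelihood of the class given the text, which conditional maximum likelihood
training maximizes. It does not implement the rational function growth
transform itself. All function names, the add-alpha smoothing, and the
data layout are assumptions made for this example and are not taken from
the paper.

```python
import math
from collections import Counter


def train_ml_unigram(data, vocab, alpha=1.0):
    """Maximum likelihood (relative frequency) estimate of a unigram classifier.

    data: list of (label, tokens) pairs; vocab: iterable of known words.
    alpha is an add-alpha smoothing constant (an assumption of this sketch).
    """
    vocab = list(vocab)
    classes = sorted({label for label, _ in data})
    prior = {c: sum(1 for label, _ in data if label == c) / len(data)
             for c in classes}
    word_prob = {}
    for c in classes:
        counts = Counter(w for label, toks in data if label == c for w in toks)
        total = sum(counts.values()) + alpha * len(vocab)
        word_prob[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return prior, word_prob


def joint_log_prob(c, tokens, prior, word_prob):
    """log P(c, tokens) under the unigram model."""
    return math.log(prior[c]) + sum(math.log(word_prob[c][w]) for w in tokens)


def posterior(tokens, prior, word_prob):
    """P(c | tokens) for every class, normalized with log-sum-exp."""
    scores = {c: joint_log_prob(c, tokens, prior, word_prob) for c in prior}
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {c: math.exp(s - log_z) for c, s in scores.items()}


def joint_log_likelihood(data, prior, word_prob):
    """Criterion maximized by maximum likelihood training."""
    return sum(joint_log_prob(c, toks, prior, word_prob) for c, toks in data)


def conditional_log_likelihood(data, prior, word_prob):
    """Criterion maximized by conditional maximum likelihood training."""
    return sum(math.log(posterior(toks, prior, word_prob)[c])
               for c, toks in data)


def relative_reduction(baseline_error, new_error):
    """Relative error-rate reduction of new_error with respect to baseline_error."""
    return (baseline_error - new_error) / baseline_error
```

The conditional criterion only rewards probability mass that separates the
correct class from its competitors, which is why it can lower the error rate
even when the joint likelihood of the training data does not improve.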