Assessing the Calibration of Naive Bayes’ Posterior Estimates

Paul N. Bennett

CMU-CS-00-155

Computer Science Department, School of Computer Science, Carnegie Mellon University

In this paper, we give evidence that the posterior probability estimates of Naive Bayes go to zero or one exponentially with document length. While exponential change may be expected as new bits of information are added, adding new words does not always correspond to adding new information. Largely as a result of its independence assumption, Naive Bayes' estimates approach these extremes too quickly. We investigate one parametric family that attempts to downweight this growth rate. The parameters of this family are estimated using a maximum likelihood scheme, and the results are evaluated.
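The mechanism behind the exponential growth can be sketched briefly. Under the independence assumption, the Naive Bayes log-odds for a two-class problem is the prior log-odds plus a sum of per-word log-likelihood ratios. Repeating a single word therefore adds a constant to the log-odds each time, and the distance of the posterior from 1 shrinks by a roughly constant factor per repeat, even though the repeats carry no new information. The abstract does not specify the paper's parametric family, so the code below assumes a hypothetical stand-in: divide the log-odds by `length ** alpha`, where `alpha = 0` recovers plain Naive Bayes, and estimate `alpha` by maximum likelihood using a simple grid search. The names `damped_posterior` and `fit_alpha`, along with the data triples, are illustrative assumptions and do not come from the paper.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_sigmoid(z):
    """Stable log(sigmoid(z)); note log(1 - sigmoid(z)) == log_sigmoid(-z)."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def nb_log_odds(word_llrs, prior_log_odds=0.0):
    """Naive Bayes log-odds of class 1 vs class 0: the prior log-odds plus a
    sum of per-word log-likelihood ratios log P(w|c1) - log P(w|c0)."""
    return prior_log_odds + sum(word_llrs)

def damped_posterior(log_odds, length, alpha):
    """Hypothetical damped family (an assumption, not the paper's family):
    divide the log-odds by length**alpha. alpha = 0 is plain Naive Bayes.
    Assumes length >= 1."""
    return sigmoid(log_odds / length ** alpha)

def log_likelihood(data, alpha):
    """Log-likelihood of labels y in {0, 1} under the damped family.
    data is a list of (log_odds, length, y) triples."""
    total = 0.0
    for log_odds, length, y in data:
        z = log_odds / length ** alpha
        total += log_sigmoid(z) if y == 1 else log_sigmoid(-z)
    return total

def fit_alpha(data, grid=None):
    """Maximum likelihood estimate of alpha by grid search over [0, 1]."""
    if grid is None:
        grid = [i / 20 for i in range(21)]
    return max(grid, key=lambda a: log_likelihood(data, a))

# Repeating one word with log-likelihood ratio 0.5 adds no new information,
# yet the Naive Bayes posterior approaches 1 exponentially in the repeat count.
for n in (1, 5, 10, 20):
    print(n, sigmoid(nb_log_odds([0.5] * n)))

# Hypothetical held-out (log_odds, length, label) triples, for illustration only.
data = [(12.0, 40, 1), (-10.0, 35, 0), (15.0, 50, 0),
        (-8.0, 30, 1), (3.0, 5, 1), (-2.0, 4, 0)]
alpha_hat = fit_alpha(data)
print(alpha_hat, damped_posterior(15.0, 50, alpha_hat))
```

On these toy triples, the confidently wrong long documents make `alpha = 0` very costly in log-likelihood, so the grid search selects a positive `alpha` that pulls long-document estimates away from 0 and 1. A grid search is used only for transparency; a gradient-based optimizer would serve the same maximum likelihood purpose.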