Using Asymmetric Distributions to Improve Classifier Probabilities: A Comparison of New and Standard Parametric Methods
- Paul N. Bennett
CMU-CS-02-126
Computer Science Department, School of Computer Science, Carnegie Mellon University (See errata at http://www.cs.cmu.edu/~pbennett/papers/errata-for-asymmetric.html. A revised version of this work appears in SIGIR 2003.)
For many discriminative classifiers, it is desirable to convert an unnormalized confidence score output by the classifier into a normalized probability estimate. Such a method can also be used to obtain better estimates from a probabilistic classifier that outputs poor ones. Typical parametric methods assume that the score distribution for a class is symmetric; we motivate why this assumption is undesirable, especially when the scores are produced by a classifier. Two asymmetric families, asymmetric generalizations of the Gaussian and the Laplace distributions, are presented, along with a method for fitting them in expected linear time. Finally, an experimental analysis of parametric fits to the outputs of two text classifiers, naive Bayes (which is known to emit poor probability estimates) and a linear SVM, is conducted. The analysis shows that one of these asymmetric families is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
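To make the recipe in the abstract concrete, the sketch below fits an asymmetric Laplace density by maximum likelihood and applies Bayes' rule to turn raw scores into posterior probabilities. It is a simplified illustration, not the report's algorithm: it assumes the standard asymmetric-Laplace parameterization shown in the comments, scans the sorted sample for the mode in O(n log n) rather than using the report's expected-linear-time procedure, and all function names are illustrative.

```python
# Minimal sketch, assuming the common asymmetric-Laplace parameterization
# with mode theta and inverse scales beta (left) and gamma (right):
#
#   p(x) = beta*gamma/(beta+gamma) * exp(-beta*(theta - x)),  x <= theta
#   p(x) = beta*gamma/(beta+gamma) * exp(-gamma*(x - theta)), x >  theta
import numpy as np


def asymmetric_laplace_pdf(x, theta, beta, gamma):
    """Density of an asymmetric Laplace with mode theta, inverse scales beta/gamma."""
    norm = beta * gamma / (beta + gamma)
    return np.where(x <= theta,
                    norm * np.exp(-beta * (theta - x)),
                    norm * np.exp(-gamma * (x - theta)))


def fit_asymmetric_laplace(scores):
    """Maximum-likelihood fit: scan the sorted sample for the mode theta.
    For a fixed theta, the MLE of (beta, gamma) is available in closed form
    from the summed absolute deviations left (s_l) and right (s_r) of theta."""
    x = np.sort(np.asarray(scores, dtype=float))
    n = len(x)
    prefix = np.concatenate(([0.0], np.cumsum(x)))  # prefix[i] = sum of x[:i]
    best_ll, best_params = -np.inf, None
    for k in range(n):
        theta = x[k]
        s_l = (k + 1) * theta - prefix[k + 1]                     # sum of theta - x_i, x_i <= theta
        s_r = (prefix[n] - prefix[k + 1]) - (n - k - 1) * theta   # sum of x_i - theta, x_i > theta
        if s_l <= 0 or s_r <= 0:                                  # degenerate split; skip
            continue
        root = np.sqrt(s_l * s_r)
        beta, gamma = n / (s_l + root), n / (s_r + root)
        ll = n * np.log(beta * gamma / (beta + gamma)) - beta * s_l - gamma * s_r
        if ll > best_ll:
            best_ll, best_params = ll, (theta, beta, gamma)
    return best_params


def posterior_positive(score, pos_params, neg_params, p_pos=0.5):
    """Convert a raw score to P(class = + | score) via Bayes' rule, given the
    two fitted class-conditional densities and a class prior p_pos."""
    p_s_pos = asymmetric_laplace_pdf(score, *pos_params)
    p_s_neg = asymmetric_laplace_pdf(score, *neg_params)
    return p_s_pos * p_pos / (p_s_pos * p_pos + p_s_neg * (1 - p_pos))
```

In use, one density would be fitted to the held-out scores of positive examples and another to those of negative examples, with p_pos set to the empirical class prior; posterior_positive then maps any new classifier score to a calibrated probability estimate.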