Optimal Classification with Multivariate Losses

  • Nagarajan Natarajan ,
  • Oluwasanmi Koyejo ,
  • Pradeep Ravikumar ,
  • Inderjit Dhillon

International Conference in Machine Learning |

Multivariate loss functions are extensively employed in several prediction tasks
arising in Information Retrieval. Often, the goal in the tasks is to minimize expected loss
when retrieving relevant items from a presented set of items, where the expectation is with
respect to the joint distribution over item sets. Our key result is that for most multivariate
losses, the expected loss is provably optimized by sorting the items by the conditional
probability of label being positive and then selecting top k items.  Such a result was previously known only for the F-measure. Leveraging on the optimality characterization, we give an algorithm for estimating optimal predictions in practice with runtime quadratic in size of item sets for many losses. We provide empirical results on benchmark datasets, comparing the proposed algorithm to state-of-the-art methods for optimizing multivariate losses.