Empirical Likelihood for Contextual Bandits

Nikos Karampatziakis; John Langford; Paul Mineiro

September 2020

arXiv:1906.03323v3

We apply empirical likelihood techniques to contextual bandit policy value estimation, confidence intervals, and learning. We propose a tighter estimator for off-policy evaluation with improved statistical performance over previous proposals. Coupled with this estimator is a confidence interval that also improves on previous proposals. We then use both to improve learning from contextual bandit data. Each contribution is evaluated empirically and performs well against strong baselines in finite-sample regimes.
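
To make the idea of empirical likelihood for off-policy evaluation concrete, here is a minimal sketch under simplifying assumptions. It is a textbook empirical likelihood reweighting, not necessarily the paper's estimator, which may differ in its details. Logged samples are reweighted by probabilities `Q_i` that maximize `sum(log Q_i)` subject to `sum(Q_i) = 1` and `sum(Q_i * w_i) = 1`, where `w_i` is the importance weight (target policy probability divided by logging policy probability). The function names, the bisection solver, and the toy data are all illustrative assumptions.

```python
def el_weights(w, tol=1e-12, max_iter=200):
    """Empirical-likelihood probabilities Q_i with sum(Q)=1 and sum(Q*w)=1.

    Hypothetical helper: solves the one-dimensional dual by bisection.
    """
    n = len(w)
    d = [wi - 1.0 for wi in w]  # deviation of each weight from its known mean 1
    if min(d) >= 0.0 or max(d) <= 0.0:
        raise ValueError("need importance weights on both sides of 1")

    # beta must keep every 1 + beta * d_i strictly positive.
    lo = -1.0 / max(d)
    hi = -1.0 / min(d)

    def g(beta):
        # Stationarity condition of the dual; strictly decreasing in beta.
        return sum(di / (1.0 + beta * di) for di in d)

    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break

    beta = 0.5 * (lo + hi)
    return [1.0 / (n * (1.0 + beta * di)) for di in d]


def ips_value(w, r):
    """Plain importance-weighted (IPS) estimate, shown for comparison."""
    return sum(wi * ri for wi, ri in zip(w, r)) / len(w)


def el_value(w, r):
    """Value estimate using the empirical-likelihood reweighting above."""
    q = el_weights(w)
    return sum(qi * wi * ri for qi, wi, ri in zip(q, w, r))


# Toy data, made up purely for illustration.
weights = [0.0, 0.5, 2.0, 3.0]
rewards = [1.0, 0.0, 1.0, 1.0]
print("IPS estimate:", ips_value(weights, rewards))
print("EL estimate: ", el_value(weights, rewards))
```

The reweighting enforces the known fact that importance weights average to one under the logging policy, which is the kind of constraint empirical likelihood can exploit. With rewards in [0, 1], the reweighted estimate in this sketch also stays in [0, 1], unlike plain IPS.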