Battle of Bandits

  • Aadirupa Saha ,
  • Aditya Gopalan

Uncertainty in Artificial Intelligence |

Publication | Publication

We introduce Battling-bandits – an online learning framework where the learner chooses a fixed k size subset from n arms to be compared in each trial and observes a single stochastic feedback indicating the winner of the chosen set of arms. This framework generalizes the standard dueling bandit framework and applies to several practical situations such as medical treatment preference, customer preference of products etc., where it is easier and more efficient to provide feedback for multiple arms at once. We develop a novel class of stochastic subset choice model, called pairwise-subset choice model, for the purpose and propose three algorithms – Battling-Doubler, Battling-MultiSBM and Battling-Duel for the problem. While the first two algorithms are based on the corresponding dueling bandits algorithms designed for linear-link based subset choice models; the third algorithm is much general and works for any it pairwise-subset choice model that has a Condorcet winner. We provide regret guarantees for all the three algorithms and also prove a matching lower bound of $\Omega(n \log T)$ for the Battling problem, showing optimality of our proposed algorithms. The efficacy of the algorithms are also demonstrated through extensive experimental evaluations on a variety of synthetic and real world datasets.