Be Greedy: How Chromatic Number meets Regret Minimization in Graph Bandits.

  • Aadirupa Saha,
  • Shreyas Sheshadri,
  • Chiranjib Bhattacharyya

Uncertainty in Artificial Intelligence


We study the classical linear bandit problem on graphs, modeling arm rewards through an underlying graph structure $G(V,E)$ such that rewards of neighboring nodes are similar. Previous attempts along this line have primarily assumed the arm rewards to be a smooth function over the graph Laplacian, which fails to characterize the inherent problem complexity in terms of the graph structure. We bridge this gap by showing a regret guarantee of $\tilde{O}(\chi(\overline{G})\sqrt{T})$ (where $\tilde{O}(\cdot)$ hides dependencies on $\log T$) that scales only with the chromatic number of the complement graph, $\chi(\overline{G})$, assuming the rewards to be a smooth function over a general class of graph embeddings, called Orthonormal Representations. Our proposed algorithms yield a regret guarantee of $\tilde{O}(r\sqrt{T})$ for any general embedding of rank $r$; moreover, if the rewards correspond to a minimum-rank embedding, the regret reduces to $\tilde{O}(\chi(\overline{G})\sqrt{T})$. None of the existing works bring out such influences of the graph structure on arm rewards. Finally, since computing the above minimum-rank embedding is NP-hard, we also propose an alternative embedding scheme, Greedy Embeddings, computable in $O(|V| + |E|)$ time and based on greedy graph coloring. With these embeddings our algorithms perform optimally on a large family of graphs, e.g., unions of cliques, complements of $k$-colorable graphs, regular graphs, and trees, and are also shown to outperform state-of-the-art methods on real datasets. Our findings open up new avenues for exploiting graph structure to improve regret performance.
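As a point of reference for the $O(|V| + |E|)$ claim above, the greedy coloring subroutine underlying such embeddings can be sketched as follows. This is a minimal illustration of standard first-fit greedy coloring, not the paper's full Greedy Embedding construction; the function name and edge-list input format are choices made here for clarity.

```python
from collections import defaultdict

def greedy_coloring(num_vertices, edges):
    """First-fit greedy coloring: scan vertices in order and give each
    the smallest color not used by an already-colored neighbor.
    Each vertex v inspects at most deg(v) + 1 candidate colors,
    so the total running time is O(|V| + |E|)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    color = {}
    for v in range(num_vertices):
        taken = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in taken:  # smallest non-negative color absent from neighbors
            c += 1
        color[v] = c
    return color
```

On a union of cliques (one of the graph families mentioned above), first-fit coloring in any vertex order uses exactly as many colors as the largest clique, which is why greedy-coloring-based embeddings can match the chromatic-number-dependent regret bound on such graphs.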