Contextual Combinatorial Bandits with Probabilistically Triggered Arms

Proceedings of the 40th International Conference on Machine Learning (ICML) | Preprint

We study contextual combinatorial bandits with probabilistically triggered arms (C\(^2\)MAB-T) under a variety of smoothness conditions that capture a wide range of applications, such as contextual cascading bandits and contextual influence maximization bandits. Under the triggering probability modulated (TPM) condition, we devise the C\(^2\)-UCB-T algorithm and propose a novel analysis that achieves an \(\tilde{O}(d\sqrt{KT})\) regret bound, removing a potentially exponentially large factor \(O(1/p_{\min})\), where \(d\) is the dimension of the contexts, \(p_{\min}\) is the minimum positive probability that any arm can be triggered, and the batch size \(K\) is the maximum number of arms that can be triggered per round. Under the variance modulated (VM) or triggering probability and variance modulated (TPVM) conditions, we propose a new variance-adaptive algorithm, VAC\(^2\)-UCB, and derive a regret bound of \(\tilde{O}(d\sqrt{T})\), which is independent of the batch size \(K\). As a valuable by-product, we find that our analysis technique and variance-adaptive algorithm can be applied to the CMAB-T and C\(^2\)MAB settings, improving existing results there as well. We also include experiments demonstrating the improved performance of our algorithms over benchmark algorithms on synthetic and real-world datasets.
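
The abstract does not detail the algorithms themselves, but C\(^2\)-UCB-T and VAC\(^2\)-UCB belong to the family of optimistic (UCB-style) methods for contextual combinatorial bandits. Below is a minimal sketch of the standard LinUCB-style optimistic score this family builds on; the function name, the confidence radius \(\rho\), and the clipping to \([0,1]\) are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

# Illustrative sketch only: a LinUCB-style optimistic mean estimate, the
# standard building block of C^2-UCB-type algorithms. The names (ucb_scores,
# rho) and the ridge setup are assumptions, not the paper's exact method.

def ucb_scores(features: np.ndarray, V: np.ndarray, b: np.ndarray, rho: float) -> np.ndarray:
    """features: (n_arms, d) arm contexts; V: (d, d) regularized Gram matrix
    sum_t x_t x_t^T + lambda * I; b: (d,) vector sum_t r_t x_t; rho: confidence radius."""
    theta_hat = np.linalg.solve(V, b)          # ridge estimate of the unknown theta
    V_inv = np.linalg.inv(V)
    means = features @ theta_hat               # point estimates x^T theta_hat
    # Confidence widths ||x||_{V^{-1}} = sqrt(x^T V^{-1} x), one per arm
    widths = np.sqrt(np.einsum("ij,jk,ik->i", features, V_inv, features))
    return np.minimum(1.0, means + rho * widths)  # optimistic scores, clipped to [0, 1]

# Example round: score 5 base arms in d = 3 dimensions with a fresh
# (lambda = 1) design matrix before any data has been observed.
rng = np.random.default_rng(0)
X = rng.random((5, 3))
print(ucb_scores(X, np.eye(3), np.zeros(3), rho=1.0))
```

In the full setting, such optimistic per-arm scores would be passed to an offline oracle that selects a feasible super arm, after which only the arms that are actually triggered reveal feedback; a variance-adaptive variant like VAC\(^2\)-UCB typically replaces the plain ridge regression with a variance-weighted one to obtain the tighter, batch-size-independent bound.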