The pure exploration problem with general reward functions depending on full distributions

Machine Learning

In this paper, we study the pure exploration model with general reward functions, in which the reward function of each arm depends on the arm's entire distribution rather than only its mean. We adapt the elimination framework and the LUCB framework to this setting, and design algorithms for estimating the values of the reward functions under different types of distributions. We then show that our estimation methods are correct with properly chosen parameters, and derive sample complexity upper bounds for them. The order of these upper bounds matches the complexity lower bound as the error constraint δ converges to 0, which implies that our algorithms are asymptotically optimal. Finally, we discuss several important applications and their solutions under our learning framework, and present experiments demonstrating the effectiveness of our algorithms. The experimental results show that our algorithms outperform existing solutions.
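To make the setting concrete, the following is a minimal illustrative sketch (not the paper's exact algorithm) of a LUCB-style pure-exploration loop in which each arm's value is a functional of its full reward distribution. Here the functional is the median, a quantile that cannot be recovered from the mean alone, and confidence bounds on it are obtained from a DKW-type bound on the empirical CDF; all function names, parameters, and the particular confidence-bound construction are illustrative assumptions, not the paper's.

```python
import numpy as np

# Illustrative sketch, NOT the paper's algorithm: LUCB-style pure
# exploration where an arm's value is the q-quantile of its reward
# distribution, estimated from the empirical CDF.

def dkw_radius(n, delta):
    # DKW inequality: with prob. >= 1 - delta, the empirical CDF is
    # within this sup-norm radius of the true CDF.
    return np.sqrt(np.log(2.0 / delta) / (2.0 * n))

def quantile_bounds(samples, q, delta):
    # Lower/upper confidence bounds on the q-quantile via shifted
    # empirical quantiles (a standard consequence of the DKW bound).
    eps = dkw_radius(len(samples), delta)
    lo = np.quantile(samples, max(q - eps, 0.0))
    hi = np.quantile(samples, min(q + eps, 1.0))
    return lo, hi

def lucb_best_quantile(pull, n_arms, q=0.5, delta=0.05, max_rounds=20000):
    # pull(i) draws one sample from arm i; recommend the arm with the
    # largest q-quantile once confidence intervals separate.
    samples = [[pull(i)] for i in range(n_arms)]  # one initial pull each
    per_arm_delta = delta / (n_arms * max_rounds)  # crude union bound
    for _ in range(max_rounds):
        bounds = [quantile_bounds(np.array(s), q, per_arm_delta)
                  for s in samples]
        est = [np.quantile(s, q) for s in samples]
        leader = int(np.argmax(est))
        # Challenger: highest upper confidence bound among non-leaders.
        challenger = max((i for i in range(n_arms) if i != leader),
                         key=lambda i: bounds[i][1])
        if bounds[leader][0] >= bounds[challenger][1]:
            return leader  # intervals separated: stop and recommend
        samples[leader].append(pull(leader))      # LUCB pulls both the
        samples[challenger].append(pull(challenger))  # leader and challenger
    return int(np.argmax(est))

rng = np.random.default_rng(0)
# Arm 1 has the larger median (0.5 vs 0.0) at equal variance.
arms = [lambda: rng.normal(0.0, 1.0), lambda: rng.normal(0.5, 1.0)]
best = lucb_best_quantile(lambda i: arms[i](), n_arms=2)
print(best)
```

The same loop structure applies to other distribution-dependent functionals (e.g., other quantiles or risk measures) by swapping in an estimator with valid confidence bounds for that functional.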