A Non-asymptotic Approach to Best-Arm Identification for Gaussian Bandits
We propose a new strategy for fixed-confidence best-arm identification over Gaussian variables with bounded means and unit variance. This strategy, called Exploration-Biased Sampling, is not only asymptotically optimal: we also prove non-asymptotic bounds that hold with high probability. To the best of our knowledge, this is the first strategy …
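To make the problem setting concrete, here is a minimal sketch of fixed-confidence best-arm identification on a unit-variance Gaussian bandit. It uses a generic successive-elimination baseline with anytime confidence radii; it is an illustration of the setting only, not the paper's Exploration-Biased Sampling strategy, and the function name and constants are hypothetical.

```python
import numpy as np

def best_arm_fixed_confidence(means, delta=0.05, rng=None):
    """Generic successive elimination for fixed-confidence best-arm
    identification on a Gaussian bandit with unit variance.
    NOT the paper's Exploration-Biased Sampling strategy."""
    rng = np.random.default_rng(rng)
    K = len(means)
    active = list(range(K))
    sums = np.zeros(K)
    t = 0
    while len(active) > 1:
        t += 1
        for a in active:
            # Each arm pull is a unit-variance Gaussian reward.
            sums[a] += rng.normal(means[a], 1.0)
        # Anytime confidence radius (union bound over arms and rounds).
        rad = np.sqrt(2.0 * np.log(4.0 * K * t * t / delta) / t)
        emp = sums / t  # every active arm has been pulled t times
        best = max(active, key=lambda a: emp[a])
        # Eliminate arms whose upper bound falls below the leader's lower bound.
        active = [a for a in active if emp[best] - emp[a] < 2.0 * rad]
    return active[0], t * K  # identified arm, rough sample-count proxy

# Usage: three arms with means in [0, 1]; returns the index of the best arm
# with probability at least 1 - delta.
arm, rounds = best_arm_fixed_confidence([0.2, 0.5, 0.8], delta=0.05, rng=0)
```

The guarantee here is only of the fixed-confidence type (correct with probability at least `1 - delta`); the sample complexity of this baseline is not asymptotically optimal, which is the gap the paper's strategy addresses.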