Almost Minimax Optimal Best Arm Identification in Piecewise Stationary
Linear Bandits
We propose a novel piecewise stationary linear bandit (PSLB) model, in which the environment randomly samples a context from an unknown probability distribution at each changepoint, and the quality of an arm is measured by its return averaged over all contexts. The contexts and their distribution, as well as the …