Almost Minimax Optimal Best Arm Identification in Piecewise Stationary Linear Bandits

We propose a novel piecewise stationary linear bandit (PSLB) model, where the environment randomly samples a context from an unknown probability distribution at each changepoint, and the quality of an arm is measured by its return averaged over all contexts. The contexts and their distribution, as well as the …