Optimal Multi-Objective Best Arm Identification with Fixed Confidence
We consider a multi-armed bandit setting with finitely many arms, in which each arm yields an $M$-dimensional vector reward upon selection. We assume that the reward of each dimension (a.k.a. {\em objective}) is generated independently of the others. The best arm of any given objective is the arm with the …