On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits
We consider the problem of learning in single-player and multiplayer multiarmed bandit models. Bandit problems are a class of online learning problems that capture the tradeoff between exploration and exploitation. In a multiarmed bandit model, players can choose among many arms, and each play of an arm generates an i.i.d. reward from an …