On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits

We consider the problem of learning in single-player and multiplayer multiarmed bandit models. Bandit problems are a class of online learning problems that capture the tradeoff between exploration and exploitation. In a multiarmed bandit model, players can pick among many arms, and each play of an arm generates an i.i.d. reward from an …
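The abstract does not say which learning algorithm the authors use. The sketch below is therefore only an illustration of the single-player model it describes. It assumes Bernoulli rewards with hypothetical arm means and uses the standard UCB1 index policy, which is a swapped-in technique and not the paper's method. The names `ucb1`, `arm_means`, and `horizon` are illustrative. The sketch shows the exploration-versus-exploitation tradeoff: each arm's index combines its empirical mean (exploitation) with a bonus that shrinks as the arm is played more often (exploration). It does not model the decentralized multiplayer setting.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Hypothetical single-player Bernoulli bandit run with UCB1.

    Returns (counts, regret): plays per arm and the expected (pseudo-)regret
    relative to always playing the arm with the highest mean.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k          # number of times each arm has been played
    totals = [0.0] * k        # cumulative reward collected from each arm
    best_mean = max(arm_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1       # play every arm once to initialize its estimate
        else:
            # Index = empirical mean (exploitation) + confidence bonus (exploration).
            arm = max(
                range(k),
                key=lambda i: totals[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Each play draws an i.i.d. Bernoulli reward from the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best_mean - arm_means[arm]

    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.5, 0.7], horizon=5000)
    print("plays per arm:", counts)
    print("pseudo-regret:", round(regret, 1))
```

After many rounds, most plays should concentrate on the arm with mean 0.7. The regret then grows roughly logarithmically in the horizon instead of linearly, which is the sense in which such policies are called regret-optimal in the single-player case.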