Game of Thrones: Fully Distributed Learning for Multiplayer Bandits
We consider an N-player multi-armed bandit game where each player chooses one out of M arms for T turns. Each player has different expected rewards for the arms, and the instantaneous rewards are independent and identically distributed or Markovian. When two or more players choose the same arm, they all …