Distributed Learning in Multi-Armed Bandit With Multiple Players

Type: Article

Publication Date: 2010-08-05

Citations: 407

DOI: https://doi.org/10.1109/tsp.2010.2062509

Abstract

We formulate and study a decentralized multi-armed bandit (MAB) problem. There are M distributed players competing for N independent arms. Each arm, when played, offers i.i.d. reward according to a distribution with an unknown parameter. At each time, each player chooses one arm to play without exchanging observations or any information with other players. Players choosing the same arm collide, and, depending on the collision model, either no one receives reward or the colliding players share the reward in an arbitrary way. We show that the minimum system regret of the decentralized MAB grows with time at the same logarithmic order as in the centralized counterpart, where players act collectively as a single entity by exchanging observations and making decisions jointly. A decentralized policy is constructed to achieve this optimal order while ensuring fairness among players and without assuming any pre-agreement or information exchange among players. The proposed policy is based on a Time Division Fair Sharing (TDFS) of the M best arms, and its order optimality is proven under a general reward model. Furthermore, the basic structure of the TDFS policy can be used with any order-optimal single-player policy to achieve order optimality in the decentralized setting. We also establish a lower bound on the system regret growth rate for a general class of decentralized policies, to which the proposed policy belongs. This problem finds potential applications in cognitive radio networks, multi-channel communication systems, multi-agent systems, web search and advertising, and social networks.
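The structure described in the abstract can be illustrated with a small simulation sketch. This is not the paper's exact TDFS policy; it is a minimal stand-in, assuming Bernoulli arms, a zero-reward collision model, and per-player UCB1 indices for ranking arms, with the time-division idea captured by each player cycling through its own estimate of the M best arms at a different slot offset. All function and variable names here are illustrative.

```python
import math
import random

def tdfs_simulation(n_arms=5, n_players=2, horizon=3000, seed=0):
    """Sketch of a TDFS-style decentralized bandit policy (illustrative,
    not the paper's algorithm). Each player ranks arms by its own UCB1
    index computed from private observations only, then time-shares its
    estimated top-M arms via a round-robin slot offset, so that players
    rarely collide once their rankings agree. Returns the average
    per-round system reward."""
    rng = random.Random(seed)
    # Unknown Bernoulli means (assumed for the sketch).
    means = [0.2, 0.3, 0.5, 0.7, 0.9][:n_arms]
    counts = [[0] * n_arms for _ in range(n_players)]   # private pull counts
    sums = [[0.0] * n_arms for _ in range(n_players)]   # private reward sums
    total_reward = 0.0
    for t in range(1, horizon + 1):
        choices = []
        for p in range(n_players):
            # UCB1 index per arm, from this player's observations only.
            idx = []
            for a in range(n_arms):
                if counts[p][a] == 0:
                    idx.append((float("inf"), a))
                else:
                    mean = sums[p][a] / counts[p][a]
                    bonus = math.sqrt(2.0 * math.log(t) / counts[p][a])
                    idx.append((mean + bonus, a))
            ranked = [a for _, a in sorted(idx, reverse=True)]
            top_m = ranked[:n_players]
            # Time-division sharing: player p is shifted by p slots.
            choices.append(top_m[(t + p) % n_players])
        # Zero-reward collision model: a contested arm yields nothing.
        for p, a in enumerate(choices):
            reward = 1.0 if rng.random() < means[a] else 0.0
            if choices.count(a) > 1:
                reward = 0.0
            counts[p][a] += 1
            sums[p][a] += reward
            total_reward += reward
    return total_reward / horizon
```

Once both players' UCB rankings converge to the true top-M arms, the slot offsets make their choices orthogonal in every round, which is the intuition behind the TDFS policy's logarithmic system regret: collisions and suboptimal pulls are confined to the (logarithmically rare) exploration steps.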

Locations

  • IEEE Transactions on Signal Processing
  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Decentralized learning for multi-player multi-armed bandits (2012), Dileep Kalathil, Naumaan Nayyar, Rahul Jain
  • Decentralized Heterogeneous Multi-Player Multi-Armed Bandits with Non-Zero Rewards on Collisions (2019), Akshayaa Magesh, Venugopal V. Veeravalli
  • Decentralized Learning for Multiplayer Multiarmed Bandits (2014), Dileep Kalathil, Naumaan Nayyar, Rahul Jain
  • Decentralized Restless Bandit with Multiple Players and Unknown Dynamics (2011), Haoyang Liu, Keqin Liu, Qing Zhao
  • Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits (2023), Guojun Xiong, Jian Li
  • Decentralized Heterogeneous Multi-Player Multi-Armed Bandits with Non-Zero Rewards on Collisions (2021), Akshayaa Magesh, Venugopal V. Veeravalli
  • Decentralized Online Learning Algorithms for Opportunistic Spectrum Access (2011), Yi Gai, B. Krishnamachari
  • Decentralized Online Learning Algorithms for Opportunistic Spectrum Access (2011), Yi Gai, Bhaskar Krishnamachari
  • Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits (2022), Guojun Xiong, Jian Li
  • Distributed Bandits with Heterogeneous Agents (2022), Lin Yang, Yu-Zhen Janice Chen, Mohammad Hajiesmaili, John CS Lui, Don Towsley
  • Learning to coordinate without communication in multi-user multi-armed bandit problems (2015), Orly Avner, Shie Mannor
  • Multi-Player Bandits Models Revisited (2017), Lilian Besson, Emilie Kaufmann
  • Distributed Bandits with Heterogeneous Agents (2022), Lin Yang, Yu-Zhen Janice Chen, Mohammad H. Hajiemaili, John C. S. Lui, Don Towsley
  • Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications (2022), Xuchuang Wang, Hong Xie, John C. S. Lui
  • Optimal Fair Multi-Agent Bandits (2023), Amir Leshem
  • Distributed Multi-Player Bandits - a Game of Thrones Approach (2018), Ilai Bistritz, Amir Leshem
  • QuACK: A Multipurpose Queuing Algorithm for Cooperative $k$-Armed Bandits (2024), Benjamin Howson, Sarah Filippi, Ciara Pike-Burke
  • Multi-Player Bandits Revisited (2017), Lilian Besson, Emilie Kaufmann
  • Distributed Multi-Player Bandits - a Game of Thrones Approach (2018), Ilai Bistritz, Amir Leshem
  • An Optimal Algorithm for Multiplayer Multi-Armed Bandits (2019), Alexandre Proutière, Po-An Wang