MDPGT: Momentum-Based Decentralized Policy Gradient Tracking
MDPGT: Momentum-Based Decentralized Policy Gradient Tracking
We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose a momentum-based decentralized policy gradient tracking (MDPGT) where a new momentum-based variance reduction technique is used to approximate the local policy gradient …