Ask a Question

Prefer a chat interface with context about you and your work?

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

Model-based Multi-agent Policy Optimization with Adaptive Opponent-wise Rollouts

This paper investigates the model-based methods in multi-agent reinforcement learning (MARL). We specify the dynamics sample complexity and the opponent sample complexity in MARL, and conduct a theoretic analysis of return discrepancy upper bound. To reduce the upper bound with the intention of low sample complexity during the whole learning …