Empirical Policy Optimization for <i>n</i>-Player Markov Games
In single-agent Markov decision processes, an agent can optimize its policy through interaction with the environment. In multiplayer Markov games (MGs), however, the interaction is nonstationary because of the other players' behaviors, so the agent has no fixed optimization objective. The challenge becomes finding equilibrium policies for …