TD-regularized actor-critic methods

Actor-critic methods can achieve impressive performance on difficult reinforcement learning problems, but they are also prone to instability. This is partly due to the interaction between the actor and the critic during learning: an inaccurate step taken by one can adversely affect the other and destabilize the learning. …
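The core idea behind TD-regularization is to penalize actor updates in regions of the state space where the critic is still inaccurate, using the critic's own TD error as the penalty. The sketch below illustrates this for a single transition; it is a minimal illustration of the general idea, not the authors' exact formulation, and the function name, the coefficient `eta`, and the toy numbers are all assumptions made here.

```python
import numpy as np

def td_regularized_actor_loss(log_prob, advantage, td_error, eta=0.1):
    """Sketch of a TD-regularized actor objective (eta is an assumed
    regularization coefficient): the standard policy-gradient loss is
    augmented with the critic's squared TD error, so the actor is
    discouraged from taking large steps where the critic is inaccurate."""
    pg_loss = -log_prob * advantage      # vanilla policy-gradient loss term
    td_penalty = eta * td_error ** 2     # TD-regularization term
    return pg_loss + td_penalty

# Toy transition: reward r, critic value estimates V(s) and V(s')
gamma = 0.99
r, v_s, v_next = 1.0, 0.5, 0.6
td_error = r + gamma * v_next - v_s      # critic's TD error for this step
loss = td_regularized_actor_loss(log_prob=-0.7,
                                 advantage=td_error,
                                 td_error=td_error,
                                 eta=0.1)
```

When the critic fits the data well, `td_error` shrinks and the penalty vanishes, recovering the unregularized actor update; when the critic is poor, the penalty dominates and damps the actor's step.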