Ask a Question

Prefer a chat interface with context about you and your work?

Increasing the Action Gap: New Operators for Reinforcement Learning

Increasing the Action Gap: New Operators for Reinforcement Learning

This paper introduces new optimality-preserving operators on Q-functions. We first describe an operator for tabular representations, the consistent Bellman operator, which incorporates a notion of local policy consistency. We show that this local consistency leads to an increase in the action gap at each state; increasing this gap, we argue, …