Ask a Question

Prefer a chat interface with context about you and your work?

Learning With Options That Terminate Off-Policy

Learning With Options That Terminate Off-Policy

A temporally abstract action, or an option, is specified by a policy and a termination condition: the policy guides the option behavior, and the termination condition roughly determines its length. Generally, learning with longer options (like learning with multi-step returns) is known to be more efficient. However, if the option …