AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization

Implicit Q-learning (IQL) is a strong baseline for offline RL that learns the value function from dataset actions only, using expectile regression. However, it is unclear how to recover the implicit policy from the learned implicit Q-function, and why IQL can use weighted regression for policy extraction. IDQL reinterprets …