AlignIQL: Policy Alignment in Implicit Q-Learning through Constrained Optimization
Implicit Q-learning (IQL) serves as a strong baseline for offline RL; it learns the value function using only dataset actions through expectile regression. However, it is unclear how to recover the implicit policy from the learned implicit Q-function, and why IQL can use weighted regression for policy extraction. IDQL reinterprets …