Qualitative Measurements of Policy Discrepancy for Return-Based Deep Q-Network
The deep Q-network (DQN) and return-based reinforcement learning are two promising approaches proposed in recent years. DQN has brought advances to complex sequential decision problems, while return-based algorithms have advantages in making use of sample trajectories. In this paper, we propose a general framework to combine DQN and most of the …