Dynamic Learning Rate for Deep Reinforcement Learning: A Bandit Approach
In Deep Reinforcement Learning models trained using gradient-based techniques, the choice of optimizer and its learning rate is crucial to achieving good performance: learning rates that are too high can prevent the model from learning effectively, while rates that are too low can slow convergence. Additionally, due to the non-stationarity of the objective function, the best-performing …