Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis
We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm, which encompasses the original ETD(λ), as well as several other off-policy evaluation algorithms as special cases. We call this framework ETD(λ, β), where our …
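The abstract is cut off before it says exactly what β controls, so the following is only a minimal sketch of a single linear update under a stated assumption. It assumes β takes the place of γ as the decay rate of the importance-weighted follow-on trace F, so that β = γ gives the ETD(λ) follow-on trace. Under that assumption, β = 0 with unit interest makes the emphasis M identically 1. The function name `etd_lambda_beta_step`, the helper `dot`, and the argument names (`interest`, `rho_prev`, and so on) are hypothetical illustrations. They are not taken from the paper.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def etd_lambda_beta_step(
    theta: List[float],
    e_prev: List[float],
    F_prev: float,
    phi: List[float],
    phi_next: List[float],
    reward: float,
    rho: float,
    rho_prev: float,
    gamma: float,
    lam: float,
    beta: float,
    alpha: float,
    interest: float = 1.0,
) -> Tuple[List[float], List[float], float]:
    """One linear ETD(lambda, beta) update (hypothetical sketch).

    ASSUMPTION: beta replaces gamma as the decay of the follow-on trace F;
    with beta == gamma this reduces to the ETD(lambda) follow-on trace.
    Continuing task only: terminal states are not handled.
    """
    # Follow-on trace: previous value decayed by beta and the previous
    # importance ratio, plus the interest in the current state.
    F = beta * rho_prev * F_prev + interest
    # Emphasis: lambda-weighted mix of interest and follow-on trace.
    M = lam * interest + (1.0 - lam) * F
    # Emphatic eligibility trace, scaled by the current importance ratio.
    e = [rho * (gamma * lam * ei + M * pi) for ei, pi in zip(e_prev, phi)]
    # TD error of the linear value estimate v(s) = theta . phi(s).
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Semi-gradient parameter update along the emphatic trace.
    new_theta = [t + alpha * delta * ei for t, ei in zip(theta, e)]
    return new_theta, e, F
```

For example, start from `theta = [0.0, 0.0]`, a zero trace, and `F_prev = 0.0`, and take one on-policy step (`rho = 1`) from features `[1, 0]` to `[0, 1]` with reward 1, `gamma = 0.9`, `lam = 0.5`, `beta = 0.9`, and `alpha = 0.1`. This yields `F = 1.0`, `e = [1.0, 0.0]`, and `theta = [0.1, 0.0]`. In this sketch, β only enters through F. Choosing β < γ therefore shortens how far past importance ratios propagate into the emphasis.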