Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy
While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the …
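To make the contrast between expected return and CVaR concrete, the following is a minimal sketch of an empirical CVaR estimate. It assumes the common lower-tail convention for returns, in which CVaR at level alpha is the mean of the worst alpha-fraction of outcomes. The function name `empirical_cvar` and the sample returns are hypothetical illustrations, not taken from the paper.

```python
import math

def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of sampled returns (lower tail).

    Assumes 0 < alpha <= 1 and that higher returns are better.
    """
    ordered = sorted(returns)                      # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))    # size of the tail
    return sum(ordered[:k]) / k

# Hypothetical returns: mostly good episodes plus one rare, severe failure.
returns = [10.0, 12.0, -50.0, 11.0, 9.0, 13.0, 8.0, 12.0, 10.0, 11.0]

mean_return = sum(returns) / len(returns)
cvar_20 = empirical_cvar(returns, alpha=0.2)

print(f"expected return: {mean_return:.1f}")
print(f"CVaR at alpha=0.2: {cvar_20:.1f}")
```

On this sample the mean looks acceptable while the tail average is strongly negative, which is the kind of gap a risk-sensitive objective is meant to expose. Setting alpha to 1 recovers the ordinary mean.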