Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the …