Mostly Exploration-Free Algorithms for Contextual Bandits
The contextual bandit literature has traditionally focused on algorithms that address the exploration–exploitation tradeoff. In particular, greedy algorithms that exploit current estimates without any exploration may be suboptimal in general. However, exploration-free greedy algorithms are desirable in practical settings where exploration may be costly or unethical (e.g., clinical trials). Surprisingly, …
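To make the notion of an exploration-free greedy algorithm concrete, here is a minimal sketch (not from the paper) of a greedy linear contextual bandit: at each round it forms ridge-regression estimates of each arm's reward parameter from the data observed so far and pulls the arm with the highest estimated reward for the current context, with no exploration bonus or forced sampling. The function names, parameters, and ridge penalty `lam` are illustrative assumptions.

```python
import numpy as np

def greedy_linear_bandit(contexts, reward_fn, num_arms, dim, lam=1.0):
    """Exploration-free greedy contextual bandit (illustrative sketch).

    Each round: estimate every arm's parameter by ridge regression on
    past observations, then pull the arm whose estimated reward for the
    current context is highest -- no exploration term of any kind.
    """
    # Per-arm ridge statistics: A_k = lam*I + sum x x^T, b_k = sum r x
    A = [lam * np.eye(dim) for _ in range(num_arms)]
    b = [np.zeros(dim) for _ in range(num_arms)]
    rewards = []
    for x in contexts:
        # Greedy step: plug-in estimates theta_hat_k = A_k^{-1} b_k,
        # pick the arm maximizing the estimated reward x^T theta_hat_k.
        theta_hat = [np.linalg.solve(A[k], b[k]) for k in range(num_arms)]
        arm = int(np.argmax([x @ th for th in theta_hat]))
        r = reward_fn(arm, x)  # reward is observed for the chosen arm only
        # Update only the chosen arm's sufficient statistics.
        A[arm] += np.outer(x, x)
        b[arm] += r * x
        rewards.append(r)
    return rewards
```

Note that the argmax uses the current point estimates directly; an exploration-based method such as LinUCB would instead add a confidence-width bonus to each arm's score before taking the argmax, which is exactly the term the greedy algorithm omits.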