Learning to Optimize via Posterior Sampling

This paper considers the use of a simple posterior sampling algorithm to balance exploration and exploitation when learning to optimize actions, as in multi-armed bandit problems. The algorithm, also known as Thompson Sampling and as probability matching, offers significant advantages over the popular upper confidence bound (UCB) approach, …
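
To make the posterior sampling idea concrete, here is a minimal sketch for the simplest case: a multi-armed bandit with Bernoulli rewards. It is an illustration under stated assumptions, not the paper's own formulation. The function name `thompson_sampling`, the uniform Beta(1, 1) prior on each arm, and the simulated `true_means` are all hypothetical choices made for this example. In each round the sketch draws one sample per arm from that arm's posterior, plays the arm with the largest sample, and updates the posterior with the observed reward.

```python
import random


def thompson_sampling(true_means, n_rounds, seed=0):
    """Simulate Beta-Bernoulli posterior sampling; return pull counts per arm.

    Assumptions for illustration: rewards are Bernoulli, each arm starts
    with a uniform Beta(1, 1) prior, and true_means is used only to
    simulate rewards (the learner never reads it directly).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    alpha = [1] * n_arms  # 1 + number of observed successes per arm
    beta = [1] * n_arms   # 1 + number of observed failures per arm
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible mean reward per arm from its posterior.
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        # Act greedily with respect to the sampled means.
        arm = max(range(n_arms), key=lambda a: samples[a])
        # Observe a simulated Bernoulli reward and update the posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1

    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.9], n_rounds=2000))
```

Because an arm is chosen whenever its sampled mean is the largest, each arm is played with the posterior probability that it is optimal, which is why the method is also called probability matching. Arms with few observations have wide posteriors and are still sampled occasionally, so exploration arises without an explicit confidence bound.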