Comparing Few to Rank Many: Active Human Preference Learning using
Randomized Frank-Wolfe
Comparing Few to Rank Many: Active Human Preference Learning using
Randomized Frank-Wolfe
We study learning of human preferences from a limited comparison feedback. This task is ubiquitous in machine learning. Its applications such as reinforcement learning from human feedback, have been transformational. We formulate this problem as learning a Plackett-Luce model over a universe of $N$ choices from $K$-way comparison feedback, where …