Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe

We study learning human preferences from limited comparison feedback. This task is ubiquitous in machine learning. Its applications, such as reinforcement learning from human feedback, have been transformational. We formulate this problem as learning a Plackett-Luce model over a universe of $N$ choices from $K$-way comparison feedback, where …
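
For readers unfamiliar with the Plackett-Luce model named above, the following is a minimal sketch of how it assigns a probability to one $K$-way ranking. It assumes each choice has a single real-valued utility, which is the textbook form of the model and may differ from the parameterization used in the paper. The function name `plackett_luce_log_prob` and the `utilities` dictionary are hypothetical, introduced only for illustration.

```python
import math


def plackett_luce_log_prob(utilities, ranking):
    """Log-probability of `ranking` (best item first) under Plackett-Luce.

    utilities: dict mapping each item to a real-valued score
               (assumed parameterization, not taken from the paper).
    ranking:   list of the K compared items, ordered from most to least preferred.
    """
    remaining = list(ranking)
    log_prob = 0.0
    for item in ranking:
        # The next item is chosen from those not yet placed, with probability
        # proportional to exp(utility). Use log-sum-exp for numerical stability.
        top = max(utilities[j] for j in remaining)
        log_norm = top + math.log(
            sum(math.exp(utilities[j] - top) for j in remaining)
        )
        log_prob += utilities[item] - log_norm
        remaining.remove(item)
    return log_prob


# Usage: a 3-way comparison drawn from a larger universe of choices.
scores = {"a": 1.0, "b": 0.0, "c": -1.0}
print(math.exp(plackett_luce_log_prob(scores, ["a", "b", "c"])))
```

Under this form, rankings that place higher-utility items first receive higher probability, and the probabilities of all orderings of the same $K$ items sum to one.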