Grounded Human-Object Interaction Hotspots From Video

Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements. We propose an approach to learn human-object interaction "hotspots" directly from video. Rather than treat affordances as a manually supervised semantic segmentation task, our approach learns …