Grounded Human-Object Interaction Hotspots From Video
Grounded Human-Object Interaction Hotspots From Video
Learning how to interact with objects is an important step towards embodied visual intelligence, but existing techniques suffer from heavy supervision or sensing requirements. We propose an approach to learn human-object interaction "hotspots" directly from video. Rather than treat affordances as a manually supervised semantic segmentation task, our approach learns …