Seeing and Hearing Egocentric Actions: How Much Can We Learn?
Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed by focusing on a single modality. In particular, only a limited number of works have considered integrating the visual and audio modalities for this purpose. In this work, we propose a …