Seeing and Hearing Egocentric Actions: How Much Can We Learn?

Our interaction with the world is an inherently multimodal experience. However, the understanding of human-to-object interactions has historically been addressed by focusing on a single modality. In particular, a limited number of works have considered integrating the visual and audio modalities for this purpose. In this work, we propose a …