Ask a Question

Prefer a chat interface with context about you and your work?

EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition

EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition

We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multimodal temporal-binding, i.e. the combination of modalities within a range of temporal offsets. We train the architecture with three modalities - RGB, Flow and Audio - and combine them with mid-level fusion alongside sparse temporal …