Dual-modality Seq2Seq Network for Audio-visual Event Localization

Audio-visual event localization requires identifying the event that is both visible and audible in a video, at either the frame level or the video level. To address this task, we propose a deep neural network named the Audio-Visual Sequence-to-sequence Dual Network (AVSDN). By jointly taking both audio and visual features at …