Related Paper

Dual-modality Seq2Seq Network for Audio-visual Event Localization

Audio-visual event localization requires identifying an event that is both visible and audible in a video, at either the frame level or the video level. To address this task, we propose a deep neural network named the Audio-Visual sequence-to-sequence dual network (AVSDN). By jointly taking both audio and visual features at …