Ask a Question

Prefer a chat interface with context about you and your work?

CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization

CACE-Net: Co-guidance Attention and Contrastive Enhancement for Effective Audio-Visual Event Localization

The audio-visual event localization task requires identifying concurrent visual and auditory events from unconstrained videos within a network model, locating them, and classifying their category. The efficient extraction and integration of audio and visual modal information have always been challenging in this field. In this paper, we introduce CACE-Net, which …