CACE-Net: Co-guidance Attention and Contrastive Enhancement for
Effective Audio-Visual Event Localization
The audio-visual event localization task requires a network model to identify events that are both visible and audible in unconstrained videos, localize them, and classify their category. Efficiently extracting and integrating audio and visual modal information remains a central challenge in this field. In this paper, we introduce CACE-Net, which …