Towards Open-Vocabulary Audio-Visual Event Localization

The Audio-Visual Event Localization (AVEL) task aims to temporally locate and classify video events that are both audible and visible. Most research in this field assumes a closed-set setting, which limits a model's ability to handle test data containing event categories that were unseen during training. Recently, a few studies have …