Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos