Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences. However, in real-world scenarios, online videos are often accompanied by relevant text information such as titles, tags, and even subtitles, which can be utilized to match textual queries. This insight has …