Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
Most existing text-video retrieval methods focus on cross-modal matching between the visual content of videos and textual query sentences. However, in real-world scenarios, online videos are often accompanied by relevant textual information such as titles, tags, and even subtitles, which can also be matched against textual queries. This insight has …