Ask a Question

Prefer a chat interface with context about you and your work?

Bridging Video-text Retrieval with Multiple Choice Questions

Bridging Video-text Retrieval with Multiple Choice Questions

Pretraining a model to learn transferable video-text representation for retrieval has attracted a lot of attention in recent years. Previous dominant works mainly adopt two separate encoders for efficient retrieval, but ignore local associations between videos and texts. Another line of research uses a joint encoder to interact video with …