Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval

Our objective in this work is video-text retrieval – in particular, a joint embedding that enables efficient text-to-video retrieval. The challenges in this area include the design of the visual architecture and the nature of the training data, in that the available large-scale video-text training datasets, such as HowTo100M, …