AutoTVG: A New Vision-language Pre-training Paradigm for Temporal Video Grounding

Temporal Video Grounding (TVG) aims to localize a moment in an untrimmed video given a language description. Because annotating TVG data is labor-intensive, TVG under limited supervision has received attention in recent years. The great success of vision-language pre-training has led TVG to follow the traditional "pre-training + fine-tuning" paradigm, …