Ask a Question

Prefer a chat interface with context about you and your work?

Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition

Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition

Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders the progress towards advanced video architectures. This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. …