Prefer a chat interface with context about you and your work?
Large-Scale Weakly-Supervised Pre-Training for Video Action Recognition
Current fully-supervised video datasets consist of only a few hundred thousand videos and fewer than a thousand domain-specific labels. This hinders the progress towards advanced video architectures. This paper presents an in-depth study of using large volumes of web videos for pre-training video models for the task of action recognition. …