Learning from Untrimmed Videos: Self-Supervised Video Representation Learning with Hierarchical Consistency
Natural videos provide rich visual content for self-supervised learning. Yet most existing approaches for learning spatio-temporal representations rely on manually trimmed videos, leading to limited diversity in visual patterns and limited performance gains. In this work, we aim to learn representations by leveraging the more abundant information in untrimmed videos. To …