Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion

One significant factor we expect video representation learning to capture, especially in contrast with image representation learning, is object motion. However, we found that in current mainstream video datasets, some action categories are highly related to the scene where the action happens, making the model tend …