DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking Tasks

In this paper, we study masked autoencoder (MAE) pretraining on videos for matching-based downstream tasks, including visual object tracking (VOT) and video object segmentation (VOS). A simple extension of MAE is to randomly mask out frame patches in videos and reconstruct the frame pixels. However, we find that this simple …
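The simple extension mentioned above (randomly masking frame patches before reconstruction) can be sketched roughly as follows. This is only an illustration of the masking step, not the paper's implementation: the function names, the 14x14 patch grid, the two-frame input, and the 75% mask ratio are all assumptions chosen for the example, and the encoder, decoder, and pixel reconstruction are left out.

```python
import random


def random_patch_mask(num_frames, patches_per_frame, mask_ratio, seed=None):
    """Pick, independently for each frame, which patch indices to hide.

    Hypothetical helper for illustration; returns one sorted list of
    masked patch indices per frame.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    num_masked = int(patches_per_frame * mask_ratio)
    return [
        sorted(rng.sample(range(patches_per_frame), num_masked))
        for _ in range(num_frames)
    ]


def visible_patches(masks, patches_per_frame):
    """Complement of each mask: the patches an encoder would still see."""
    result = []
    for masked in masks:
        hidden = set(masked)
        result.append([p for p in range(patches_per_frame) if p not in hidden])
    return result


# Assumed setup: 2 frames, 14 x 14 = 196 patches per frame, 75% masked.
PATCHES = 14 * 14
masks = random_patch_mask(num_frames=2, patches_per_frame=PATCHES,
                          mask_ratio=0.75, seed=0)
visible = visible_patches(masks, PATCHES)
print(len(masks[0]), len(visible[0]))  # 147 49
```

In a full pipeline the visible patches would be fed to the encoder and the masked ones would serve as reconstruction targets; here only the index bookkeeping is shown.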