Efficient Video Object Segmentation via Modulated Cross-Attention Memory

Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation. However, these approaches typically struggle on long videos because they expand the memory bank every few frames, which steadily increases GPU memory demands. We propose a transformer-based approach, named MAVOS, that introduces an optimized and dynamic long-term …