Efficient Video Object Segmentation via Modulated Cross-Attention Memory
Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation. However, these approaches typically struggle on long videos because they expand the memory bank every few frames, which steadily increases GPU memory demands. We propose a transformer-based approach, named MAVOS, that introduces an optimized and dynamic long-term …