Look Before You Match: Instance Understanding Matters in Video Object Segmentation

Type: Article

Publication Date: 2023-06-01

Citations: 20

DOI: https://doi.org/10.1109/cvpr52729.2023.00225

Abstract

By exploring dense matching between the current frame and past frames for long-range context modeling, memory-based methods have recently demonstrated impressive results in video object segmentation (VOS). Nevertheless, lacking instance understanding, these approaches are often brittle to large appearance variations or viewpoint changes resulting from the movement of objects and cameras. In this paper, we argue that instance understanding matters in VOS, and that integrating it with memory-based matching yields a synergy that follows naturally from the definition of the VOS task, i.e., identifying and segmenting object instances within a video. Towards this goal, we present a two-branch network for VOS, where the query-based instance segmentation (IS) branch delves into the instance details of the current frame and the VOS branch performs spatial-temporal matching with the memory bank. We employ the well-learned object queries from the IS branch to inject instance-specific information into the query key, which is then used to perform instance-augmented matching. In addition, we introduce a multi-path fusion block to effectively combine the memory readout with multi-scale features from the instance segmentation decoder, which incorporates high-resolution instance-aware features to produce the final segmentation results. Our method achieves state-of-the-art performance on DAVIS 2016/2017 val (92.6% and 87.1%), DAVIS 2017 test-dev (82.8%), and YouTube-VOS 2018/2019 val (86.3% and 86.3%), outperforming alternative methods by clear margins.
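The instance-augmented matching described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify how object queries are injected into the query key or which similarity is used for matching, so the residual cross-attention in `augment_query_key`, the scaled dot-product affinity in `memory_readout`, and all function names, shapes, and sizes below are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def augment_query_key(query_key, object_queries):
    # Hypothetical injection step (assumption, not from the paper):
    # each spatial location attends over the N object queries and the
    # attended result is added back to the query key as a residual.
    # query_key: (HW, C), object_queries: (N, C) -> (HW, C)
    scale = np.sqrt(query_key.shape[1])
    attn = softmax(query_key @ object_queries.T / scale, axis=1)  # (HW, N)
    return query_key + attn @ object_queries


def memory_readout(query_key, memory_key, memory_value):
    # Dense matching between current-frame locations and memory locations.
    # Scaled dot-product similarity is an assumption made for this sketch.
    # query_key: (HW, C), memory_key: (THW, C), memory_value: (THW, D)
    scale = np.sqrt(query_key.shape[1])
    affinity = softmax(query_key @ memory_key.T / scale, axis=1)  # (HW, THW)
    return affinity @ memory_value, affinity


# Toy sizes (hypothetical): 6 query locations, 18 memory locations,
# 4 object queries, 8 key channels, 5 value channels.
rng = np.random.default_rng(0)
query_key = rng.normal(size=(6, 8))
object_queries = rng.normal(size=(4, 8))
memory_key = rng.normal(size=(18, 8))
memory_value = rng.normal(size=(18, 5))

augmented_key = augment_query_key(query_key, object_queries)
readout, affinity = memory_readout(augmented_key, memory_key, memory_value)
print(readout.shape)  # (6, 5)
```

In this sketch each row of `affinity` is a distribution over memory locations, so the readout at a query location is a weighted average of memory values. In the paper's design, such a readout is then combined with multi-scale decoder features by the multi-path fusion block, which is not modeled here.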

Locations

  • arXiv (Cornell University)
  • 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Similar Works

  • Look Before You Match: Instance Understanding Matters in Video Object Segmentation (2022). Junke Wang, Dongdong Chen, Zuxuan Wu, Chong Luo, Chuanxin Tang, Xiyang Dai, Yucheng Zhao, Yujia Xie, Lu Yuan, Yu-Gang Jiang
  • Two-Stream Networks for Object Segmentation in Videos (2022). Hannan Lu, Zhi Tian, Lirong Yang, Haibing Ren, Wangmeng Zuo
  • End-to-End Video Instance Segmentation with Transformers (2020). Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia
  • End-to-End Video Instance Segmentation with Transformers (2021). Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia
  • Context-Aware Video Instance Segmentation (2024). Seunghun Lee, Jiwan Seo, Kiljoon Han, Minwoo Choi, Sunghoon Im
  • SeqFormer: Sequential Transformer for Video Instance Segmentation (2021). Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai
  • Hybrid Instance-aware Temporal Fusion for Online Video Instance Segmentation (2021). Xiang Li, Jinglu Wang, Xiao Li, Yan Lü
  • Hybrid Instance-Aware Temporal Fusion for Online Video Instance Segmentation (2022). Xiang Li, Jinglu Wang, Xiaoli Li, Yan Lu
  • GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation (2023). Tanveer Hannan, Rajat Koner, Maximilian Bernhard, Suprosanna Shit, Bjoern Menze, Volker Tresp, Matthias Schubert, Thomas Seidl
  • MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training (2022). De-An Huang, Zhiding Yu, Anima Anandkumar
  • Temporally Efficient Vision Transformer for Video Instance Segmentation (2022). Shusheng Yang, Xinggang Wang, Yu Li, Yuxin Fang, Jiemin Fang, Wenyu Liu, Xun Zhao, Ying Shan
  • Video Instance Segmentation by Instance Flow Assembly (2021). Xiang Li, Jinglu Wang, Xiao Li, Yan Lu
  • Occluded Video Instance Segmentation (2021). Jiyang Qi, Yan Gao, Xiaoyu Liu, Yao Hu, Xinggang Wang, Xiang Bai, Philip H. S. Torr, Serge Belongie, Alan Yuille, Song Bai
  • Offline-to-Online Knowledge Distillation for Video Instance Segmentation (2023). Ho-Jin Kim, Seunghun Lee, Sunghoon Im
  • Consistent Video Instance Segmentation with Inter-Frame Recurrent Attention (2022). Quanzeng You, Jiang Wang, Peng Chu, Andre Abrantes, Zicheng Liu
  • Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation (2021). Minghan Li, Shuai Li, Lida Li, Lei Zhang
  • Video Object Segmentation with Dynamic Query Modulation (2024). Hantao Zhou, Runze Hu, Xiu Li
  • Occluded Video Instance Segmentation: A Benchmark (2021). Jiyang Qi, Yan Gao, Yao Hu, Xinggang Wang, Xiaoyu Liu, Xiang Bai, Serge Belongie, Alan Yuille, Philip H. S. Torr, Song Bai