Visual Saliency Transformer

Type: Article

Publication Date: 2021-10-01

Citations: 319

DOI: https://doi.org/10.1109/iccv48922.2021.00468

Abstract

Existing state-of-the-art saliency detection methods heavily rely on CNN-based architectures. Alternatively, we rethink this task from a convolution-free sequence-to-sequence perspective and predict saliency by modeling long-range dependencies, which cannot be achieved by convolution. Specifically, we develop a novel unified model based on a pure transformer, namely, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD). It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches. Unlike conventional architectures used in Vision Transformer (ViT), we leverage multi-level token fusion and propose a new token upsampling method under the transformer framework to get high-resolution detection results. We also develop a token-based multi-task decoder to simultaneously perform saliency and boundary detection by introducing task-related tokens and a novel patch-task-attention mechanism. Experimental results show that our model outperforms existing methods on both RGB and RGB-D SOD benchmark datasets. Most importantly, our whole framework not only provides a new perspective for the SOD field but also shows a new paradigm for transformer-based dense prediction models. Code is available at https://github.com/nnizhang/VST.
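
The abstract's core idea, treating an image as a sequence of patch tokens and letting attention propagate context among all of them, can be illustrated with a toy sketch. This is not the VST implementation: the function names (`image_to_patch_tokens`, `self_attention`), the grayscale 4 x 4 input, and the identity query/key/value projections are all assumptions made for illustration, and the sketch omits the multi-level token fusion, token upsampling, and task-related tokens described above.

```python
import math


def image_to_patch_tokens(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patch tokens."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[top + i][left + j]
                           for i in range(patch) for j in range(patch)])
    return tokens


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections (a simplification).

    Returns the mixed tokens and the attention weights, one row per query token.
    """
    d = len(tokens[0])
    mixed, weights_all = [], []
    for q in tokens:
        # Scaled dot-product score between this token and every token in the image.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        weights_all.append(weights)
        # Each output token is a weighted sum over ALL tokens, near or far.
        mixed.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return mixed, weights_all


# Hypothetical 4 x 4 input, split into four 2 x 2 patches.
image = [[float((r * 4 + c) % 3) for c in range(4)] for r in range(4)]
tokens = image_to_patch_tokens(image, patch=2)
mixed, attn = self_attention(tokens)
```

Every row of `attn` assigns a nonzero weight to every patch, so a single layer already mixes information across the whole image, whereas a single convolution only sees a local window.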

Locations

  • arXiv (Cornell University)
  • 2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Similar Works

  • Visual Saliency Transformer (2021): Nian Liu, Ni Zhang, Kaiyuan Wan, Ling Shao, Junwei Han
  • Visual Saliency Transformer (2021): Nian Liu, Ni Zhang, Kaiyuan Wan, Junwei Han, Ling Shao
  • VST++: Efficient and Stronger Visual Saliency Transformer (2023): Nian Liu, Ziyang Luo, Ni Zhang, Junwei Han
  • Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection (2023): Huajun Zhou, Bo Qiao, Lingxiao Yang, Jianhuang Lai, Xiaohua Xie
  • DISC: Deep Image Saliency Computing via Progressive Representation Learning (2016): Tianshui Chen, Liang Lin, Lingbo Liu, Xiaonan Luo, Xuelong Li
  • Recurrent Attentional Networks for Saliency Detection (2016): Jason Kuen, Zhenhua Wang, Gang Wang
  • SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection (2024): Rohit Venkata Sai Dulam, Chandra Kambhamettu
  • Texture-guided Saliency Distilling for Unsupervised Salient Object Detection (2022): Huajun Zhou, Bo Qiao, Lingxiao Yang, Jianhuang Lai, Xiaohua Xie
  • Contextual encoder–decoder network for visual saliency prediction (2020): Alexander Kröner, Mario Senden, Kurt Driessens, Rainer Goebel
  • SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection (2024): Kassaw Abraham Mulat, Zhengyong Feng, Tegegne Solomon Eshetie, Ahmed Endris Hasen
  • Pyramid Feature Attention Network for Saliency detection (2019): Ting Zhao, Xiangqian Wu
  • Vision Transformer with Super Token Sampling (2022): Huaibo Huang, Xiaoqiang Zhou, Jie Cao, Ran He, Tieniu Tan
  • Visual saliency based on multiscale deep features (2015): Guanbin Li, Yizhou Yu
  • Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter (2020): Zhenyu Wu, Shuai Li, Chenglizhao Chen, Aimin Hao, Hong Qin
  • TranSalNet: Towards perceptually relevant visual saliency prediction (2022): Jianxun Lou, Hanhe Lin, David Marshall, Dietmar Saupe, Hantao Liu
  • PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection (2017): Nian Liu, Junwei Han, Ming-Hsuan Yang
  • Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff (2023): Jia Li, Shengye Qiao, Zhirui Zhao, C. Xie, Xiaowu Chen, Changqun Xia
  • Unified Unsupervised Salient Object Detection via Knowledge Transfer (2024): Yuan Yao, Wutao Liu, Pan Gao, Qun Dai, Jie Qin

Works That Cite This (59)

  • ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection (2024): Junhao Lin, Lei Zhu, Jiaxing Shen, Huazhu Fu, Qing Zhang, Liansheng Wang
  • ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection (2023): Jifeng Shen, Yifei Chen, Yue Liu, Xin Zuo, Heng Fan, Wankou Yang
  • Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (2023): Taehun Kim, Kunhee Kim, Joonyeong Lee, Dongmin Cha, Jiho Lee, Daijin Kim
  • Audio–visual collaborative representation learning for Dynamic Saliency Prediction (2022): Hailong Ning, Bin Zhao, Zhanxuan Hu, Lang He, Ercheng Pei
  • Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment (2023): Liqun Lin, Yang Zheng, Weiling Chen, Chengdong Lan, Tiesong Zhao
  • A Visual Representation-Guided Framework With Global Affinity for Weakly Supervised Salient Object Detection (2023): Binwei Xu, Haoran Liang, Weihua Gong, Ronghua Liang, Peng Chen
  • GroupTransNet: Group transformer network for RGB-D salient object detection (2024): Xian Fang, Mingfeng Jiang, Jinchao Zhu, Xiuli Shao, Hongpeng Wang
  • SwinNet: Swin Transformer Drives Edge-Aware RGB-D and RGB-T Salient Object Detection (2021): Zhengyi Liu, Yacheng Tan, Qian He, Yun Xiao
  • Learning Adaptive Fusion Bank for Multi-Modal Salient Object Detection (2024): Kunpeng Wang, Zhengzheng Tu, Chenglong Li, Cheng Zhang, Bin Luo
  • Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion (2023): Peiran Xu, Yadong Mu