Visual Saliency Transformer

Nian Liu, Ni Zhang, Kaiyuan Wan, Ling Shao, Junwei Han

Type: Article

Publication Date: 2021-10-01

Citations: 319

DOI: https://doi.org/10.1109/iccv48922.2021.00468

Abstract

Existing state-of-the-art saliency detection methods rely heavily on CNN-based architectures. Alternatively, we rethink this task from a convolution-free sequence-to-sequence perspective and predict saliency by modeling long-range dependencies, which cannot be achieved by convolution. Specifically, we develop a novel unified model based on a pure transformer, namely, Visual Saliency Transformer (VST), for both RGB and RGB-D salient object detection (SOD). It takes image patches as inputs and leverages the transformer to propagate global contexts among image patches. Unlike conventional architectures used in Vision Transformer (ViT), we leverage multi-level token fusion and propose a new token upsampling method under the transformer framework to get high-resolution detection results. We also develop a token-based multi-task decoder to simultaneously perform saliency and boundary detection by introducing task-related tokens and a novel patch-task-attention mechanism. Experimental results show that our model outperforms existing methods on both RGB and RGB-D SOD benchmark datasets. Most importantly, our whole framework not only provides a new perspective for the SOD field but also shows a new paradigm for transformer-based dense prediction models. Code is available at https://github.com/nnizhang/VST.
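
As a rough illustration of the sequence-to-sequence view described in the abstract, the sketch below splits an image into patch tokens and lets every token attend to all others in one self-attention step, so each output token mixes information from the whole image. It is not the authors' VST implementation (the repository linked in the abstract is the reference for that). The function names image_to_patches and self_attention are hypothetical, the query/key/value projections are identity maps rather than learned weights, and the input is a tiny single-channel grid chosen only to keep the example readable.

```python
import math


def image_to_patches(image, patch):
    """Split an H x W grid (list of lists) into flattened, non-overlapping patches.

    Each patch becomes one token; tokens are ordered row by row.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            ])
    return tokens


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections.

    Assumption for illustration: Q = K = V = tokens (no learned weights).
    Returns the updated tokens and the attention weight matrix.
    """
    dim = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output token is a weighted sum over ALL patch tokens,
        # which is how global context reaches every position.
        outputs.append([
            sum(w * value[i] for w, value in zip(row, tokens))
            for i in range(dim)
        ])
    return outputs, weights


# Toy 4 x 4 single-channel image split into 2 x 2 patches -> 4 tokens of length 4.
image = [[0.1 * (r * 4 + c) for c in range(4)] for r in range(4)]
patch_tokens = image_to_patches(image, patch=2)
attended, attention = self_attention(patch_tokens)
print(len(patch_tokens), "tokens of dimension", len(patch_tokens[0]))
```

Every row of the attention matrix is a distribution over all patches, regardless of how far apart they are in the image. A real model would add learned projections, multiple heads, positional information, and a decoder that maps tokens back to a dense saliency map.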

Locations

arXiv (Cornell University)
2021 IEEE/CVF International Conference on Computer Vision (ICCV)

Similar Works

Title	Year	Authors
Visual Saliency Transformer	2021	Nian Liu Ni Zhang Kaiyuan Wan Ling Shao Junwei Han
Visual Saliency Transformer	2021	Nian Liu Ni Zhang Kaiyuan Wan Junwei Han Ling Shao
VST++: Efficient and Stronger Visual Saliency Transformer	2023	Nian Liu Ziyang Luo Ni Zhang Junwei Han
Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection	2023	Huajun Zhou Bo Qiao Lingxiao Yang Jianhuang Lai Xiaohua Xie
DISC: Deep Image Saliency Computing via Progressive Representation Learning	2016	Tianshui Chen Liang Lin Lingbo Liu Xiaonan Luo Xuelong Li
Recurrent Attentional Networks for Saliency Detection	2016	Jason Kuen Zhenhua Wang Gang Wang
SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection	2024	Rohit Venkata Sai Dulam Chandra Kambhamettu
Texture-guided Saliency Distilling for Unsupervised Salient Object Detection	2022	Huajun Zhou Bo Qiao Lingxiao Yang Jianhuang Lai Xiaohua Xie
Contextual encoder–decoder network for visual saliency prediction	2020	Alexander Kröner Mario Senden Kurt Driessens Rainer Goebel
SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection	2024	Kassaw Abraham Mulat Zhengyong Feng Tegegne Solomon Eshetie Ahmed Endris Hasen
Pyramid Feature Attention Network for Saliency detection	2019	Ting Zhao Xiangqian Wu
Vision Transformer with Super Token Sampling	2022	Huaibo Huang Xiaoqiang Zhou Jie Cao Ran He Tieniu Tan
Visual saliency based on multiscale deep features	2015	Guanbin Li Yizhou Yu
Rethinking of the Image Salient Object Detection: Object-level Semantic Saliency Re-ranking First, Pixel-wise Saliency Refinement Latter	2020	Zhenyu Wu Shuai Li Chenglizhao Chen Aimin Hao Hong Qin
TranSalNet: Towards perceptually relevant visual saliency prediction	2022	Jianxun Lou Hanhe Lin David Marshall Dietmar Saupe Hantao Liu
PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection	2017	Nian Liu Junwei Han Ming-Hsuan Yang
Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff	2023	Jia Li Shengye Qiao Zhirui Zhao C. Xie Xiaowu Chen Changqun Xia
Unified Unsupervised Salient Object Detection via Knowledge Transfer	2024	Yuan Yao Wutao Liu Pan Gao Qun Dai Jie Qin

Works That Cite This (59)

Title	Year	Authors
ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection	2024	Junhao Lin Lei Zhu Jiaxing Shen Huazhu Fu Qing Zhang Liansheng Wang
ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection	2023	Jifeng Shen Yifei Chen Yue Liu Xin Zuo Heng Fan Wankou Yang
Revisiting Image Pyramid Structure for High Resolution Salient Object Detection	2023	Taehun Kim Kunhee Kim Joonyeong Lee Dongmin Cha Jiho Lee Daijin Kim
Audio–visual collaborative representation learning for Dynamic Saliency Prediction	2022	Hailong Ning Bin Zhao Zhanxuan Hu Lang He Ercheng Pei
Saliency-Aware Spatio-Temporal Artifact Detection for Compressed Video Quality Assessment	2023	Liqun Lin Yang Zheng Weiling Chen Chengdong Lan Tiesong Zhao
A Visual Representation-Guided Framework With Global Affinity for Weakly Supervised Salient Object Detection	2023	Binwei Xu Haoran Liang Weihua Gong Ronghua Liang Peng Chen
GroupTransNet: Group transformer network for RGB-D salient object detection	2024	Xian Fang Mingfeng Jiang Jinchao Zhu Xiuli Shao Hongpeng Wang
SwinNet: Swin Transformer Drives Edge-Aware RGB-D and RGB-T Salient Object Detection	2021	Zhengyi Liu Yacheng Tan Qian He Yun Xiao
Learning Adaptive Fusion Bank for Multi-Modal Salient Object Detection	2024	Kunpeng Wang Zhengzheng Tu Chenglong Li Cheng Zhang Bin Luo
Co-Salient Object Detection with Semantic-Level Consensus Extraction and Dispersion	2023	Peiran Xu Yadong Mu

Works Cited by This (44)

Title	Year	Authors
Deformable DETR: Deformable Transformers for End-to-End Object Detection	2020	Xizhou Zhu Weijie Su Lewei Lu Bin Li Xiaogang Wang Jifeng Dai
Very Deep Convolutional Networks for Large-Scale Image Recognition	2014	Karen Simonyan Andrew Zisserman
Learning Deconvolution Network for Semantic Segmentation	2015	Hyeonwoo Noh Seunghoon Hong Bohyung Han
Visual saliency based on multiscale deep features	2015	Guanbin Li Yizhou Yu
Hierarchical Saliency Detection	2013	Qiong Yan Li Xu Jianping Shi Jiaya Jia
The Secrets of Salient Object Segmentation	2014	Yin Li Xiaodi Hou Christof Koch James M. Rehg Alan Yuille
Deep Residual Learning for Image Recognition	2016	Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun
Deeply Supervised Salient Object Detection with Short Connections	2018	Qibin Hou Ming-Ming Cheng Xiaowei Hu Ali Borji Zhuowen Tu Philip H. S. Torr
Salient Object Detection in the Deep Learning Era: An In-Depth Survey	2021	Wenguan Wang Qiuxia Lai Huazhu Fu Jianbing Shen Haibin Ling Ruigang Yang
A Simple Pooling-Based Design for Real-Time Salient Object Detection	2019	Jiangjiang Liu Qibin Hou Ming-Ming Cheng Jiashi Feng Jianmin Jiang