Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (2021). Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo.
Conditional Positional Encodings for Vision Transformers (2021). Xiangxiang Chu, Zhi Tian, Bo Zhang, Xinlong Wang, Xiaolin Wei, Huaxia Xia, Chunhua Shen.
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding (2021). Pengchuan Zhang, Xiyang Dai, Jianwei Yang, Bin Xiao, Lu Yuan, Lei Zhang, Jianfeng Gao.
Aggregated Residual Transformations for Deep Neural Networks (2017). Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He.
Axial Attention in Multidimensional Transformers (2019). Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans.
Deep Residual Learning for Image Recognition (2016). Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (2021). Zihang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Xiaojie Jin, Anran Wang, Jiashi Feng.
Longformer: The Long-Document Transformer (2020). Iz Beltagy, Matthew E. Peters, Arman Cohan.
Twins: Revisiting Spatial Attention Design in Vision Transformers (2021). Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen.
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers (2021). Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip H. S. Torr.
Deep Networks with Stochastic Depth (2016). Gao Huang, Yu Sun, Zhuang Liu, Daniel Sedra, Kilian Q. Weinberger.
Unified Perceptual Parsing for Scene Understanding (2018). Tete Xiao, Yingcheng Liu, Bolei Zhou, Yuning Jiang, Jian Sun.
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020). Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly.
End-to-End Object Detection with Transformers (2020). Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko.
MobileNetV2: Inverted Residuals and Linear Bottlenecks (2018). Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
Going deeper with convolutions (2015). Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich.
Densely Connected Convolutional Networks (2017). Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger.
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet (2021). Li Yuan, Yunpeng Chen, Tao Wang, Weihao Yu, Yujun Shi, Zihang Jiang, Francis E. H. Tay, Jiashi Feng, Shuicheng Yan.
Panoptic Feature Pyramid Networks (2019). Alexander Kirillov, Ross Girshick, Kaiming He, Piotr Dollár.
Training Vision Transformers for Image Retrieval (2021). Alaaeldin El-Nouby, Natalia Neverova, Ivan Laptev, Hervé Jégou.
Cascade R-CNN: Delving Into High Quality Object Detection (2018). Zhaowei Cai, Nuno Vasconcelos.
Do We Really Need Explicit Position Encodings for Vision Transformers? (2021). Xiangxiang Chu, Bo Zhang, Zhi Tian, Xiaolin Wei, Huaxia Xia.
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions (2021). Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao.
Efficient Content-Based Sparse Attention with Routing Transformers (2021). Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier.
Transformer in Transformer (2021). Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang.
Deep High-Resolution Representation Learning for Human Pose Estimation (2019). Ke Sun, Bin Xiao, Dong Liu, Jingdong Wang.
Incorporating Convolution Designs into Visual Transformers (2021). Kun Yuan, Shaopeng Guo, Ziwei Liu, Aojun Zhou, Fengwei Yu, Wei Wu.
High-Fidelity Pluralistic Image Completion with Transformers (2021). Ziyu Wan, Jingbo Zhang, Dongdong Chen, Jing Liao.
Very Deep Convolutional Networks for Large-Scale Image Recognition (2014). Karen Simonyan, Andrew Zisserman.
Multiscale Vision Transformers (2021). Haoqi Fan, Bo Xiong, Karttikeya Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer.
Training data-efficient image transformers & distillation through attention (2021). Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou.
Pre-Trained Image Processing Transformer (2021). Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, Wen Gao.
End-to-End Video Instance Segmentation with Transformers (2021). Yuqing Wang, Zhaoliang Xu, Xinlong Wang, Chunhua Shen, Baoshan Cheng, Hao Shen, Huaxia Xia.
Scaling Local Self-Attention for Parameter Efficient Visual Backbones (2021). Ashish Vaswani, Prajit Ramachandran, Aravind Srinivas, Niki Parmar, Blake A. Hechtman, Jonathon Shlens.
Co-Scale Conv-Attentional Image Transformers (2021). Weijian Xu, Yifan Xu, Tyler A. Chang, Zhuowen Tu.
Twins: Revisiting the Design of Spatial Attention in Vision Transformers (2021). Xiangxiang Chu, Zhi Tian, Yuqing Wang, Bo Zhang, Haibing Ren, Xiaolin Wei, Huaxia Xia, Chunhua Shen.
CvT: Introducing Convolutions to Vision Transformers (2021). Haiping Wu, Bin Xiao, Noel Codella, Mengchen Liu, Xiyang Dai, Lu Yuan, Lei Zhang.
Going deeper with Image Transformers (2021). Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou.
TransReID: Transformer-based Object Re-Identification (2021). Shuting He, Hao Luo, Pichao Wang, Fan Wang, Hao Li, Wei Jiang.
Segmenter: Transformer for Semantic Segmentation (2021). Robin Strudel, Ricardo García, Ivan Laptev, Cordelia Schmid.
Compressive Transformers for Long-Range Sequence Modelling (2019). Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap.
Reformer: The Efficient Transformer (2020). Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (2017). Andrew Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam.
Stand-Alone Self-Attention in Vision Models (2019). Prajit Ramachandran, Niki Parmar, Ashish Vaswani, Irwan Bello, Anselm Levskaya, Jonathon Shlens.
Rethinking Attention with Performers (2020). Krzysztof Choromański, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamás Sarlós, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Łukasz Kaiser.
Generating Long Sequences with Sparse Transformers (2019). Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever.
Attention Is All You Need (2017). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin.
Xception: Deep Learning with Depthwise Separable Convolutions (2017). François Chollet.