Representational Strengths and Limitations of Transformers

Type: Preprint

Publication Date: 2023-01-01

Citations: 6

DOI: https://doi.org/10.48550/arxiv.2306.02896

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • CoAtNet: Marrying Convolution and Attention for All Data Sizes (2021) - Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan
  • The Quarks of Attention (2022) - Pierre Baldi, Roman Vershynin
  • Attention Enables Zero Approximation Error (2022) - Zhiying Fang, Yidong Ouyang, Ding‐Xuan Zhou, Guang Cheng
  • A Theoretical Understanding of Shallow Vision Transformers: Learning, Generalization, and Sample Complexity (2023) - Hongkang Li, Meng Wang, Sijia Liu, Pin‐Yu Chen
  • The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers (2022) - Zonglin Li, Chong You, Srinadh Bhojanapalli, Daliang Li, Ankit Singh Rawat, Sashank J. Reddi, Ke Ye, Felix Chern, Felix Yu, Ruiqi Guo
  • Attention-Only Transformers and Implementing MLPs with Attention Heads (2023) - Robert P. Huben, Valerie B. Morris
  • Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (2020) - Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret
  • A Battle of Network Structures: An Empirical Study of CNN, Transformer, and MLP (2021) - Yucheng Zhao, Guangting Wang, Chuanxin Tang, Chong Luo, Wenjun Zeng, Zheng-Jun Zha
  • Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions (2023) - Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
  • MLP Can Be A Good Transformer Learner (2024) - Sihao Lin, Pumeng Lyu, Dongrui Liu, Tao Tang, Xiaodan Liang, Andy Song, Xiaojun Chang
  • MABViT -- Modified Attention Block Enhances Vision Transformers (2023) - Mahesh Ramesh, A. Ramkumar
  • Demystify Transformers & Convolutions in Modern Image Deep Networks (2022) - Jifeng Dai, Min Shi, Wei‐Yun Wang, Sitong Wu, Linjie Xing, Wenhai Wang, Xizhou Zhu, Lewei Lu, Jie Zhou, Xiaogang Wang
  • Inference, Learning and Attention Mechanisms that Exploit and Preserve Sparsity in CNNs (2020) - Timo Häckel, Mikhail Usvyatsov, Silvano Galliani, Jan Dirk Wegner, Konrad Schindler
  • Transformers, parallel computation, and logarithmic depth (2024) - Clayton Sanford, Daniel Hsu, Matus Telgarsky
  • Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers (2023) - Vukasin Bozic, Danilo Dordevic, Daniele Coppola, Joseph Thommes
  • On the Expressive Power of Deep Learning: A Tensor Analysis (2015) - Nadav Cohen, Or Sharir, Amnon Shashua
  • Rethinking Spatial Dimensions of Vision Transformers (2021) - Byeongho Heo, Sangdoo Yun, Dongyoon Han, Sanghyuk Chun, Junsuk Choe, Seong Joon Oh

Works Cited by This (0)
