The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

Type: Preprint

Publication Date: 2020-01-01

Citations: 2

DOI: https://doi.org/10.48550/arxiv.2007.08620

Locations

  • arXiv (Cornell University)
  • HAL (Le Centre pour la Communication Scientifique Directe)
  • DataCite API

Similar Works

  • The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction (2020) - Alice O. Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin
  • Transformers on Markov Data: Constant Depth Suffices (2024) - Nived Rajaraman, Marco Bondaschi, Kannan Ramchandran, Michael Gastpar, Ashok Vardhan Makkuva
  • You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling (2021) - Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh
  • kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers (2024) - Themistoklis Haris
  • Upper and lower memory capacity bounds of transformers for next-token prediction (2024) - Liam Madden, Curtis Fox, Christos Thrampoulidis
  • From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers (2024) - M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak
  • Efficient Content-Based Sparse Attention with Routing Transformers (2021) - Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
  • Staircase Attention for Recurrent Processing of Sequences (2021) - Da Young Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
  • Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost (2022) - Sungjun Cho, Seonwoo Min, Jinwoo Kim, Moontae Lee, Honglak Lee, Seunghoon Hong
  • Efficient Content-Based Sparse Attention with Routing Transformers (2020) - Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
  • Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains (2024) - Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar
  • Mechanics of Next Token Prediction with Self-Attention (2024) - Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak
  • Attention as an RNN (2024) - Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori
  • Transformers Simulate MLE for Sequence Generation in Bayesian Networks (2025) - Yuan Cao, Yihan He, Dennis Wu, Hongyu Chen, Jianqing Fan, Han Liu
  • Towards Understanding the Universality of Transformers for Next-Token Prediction (2024) - Michael E. Sander, Gabriel Peyré
  • Cottention: Linear Transformers With Cosine Attention (2024) - Gabriel Mongaras, Trevor Dohm, Eric C. Larson
  • Star Attention: Efficient LLM Inference over Long Sequences (2024) - Shantanu Acharya, Fei Jia, Boris Ginsburg