The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

Type: Preprint

Publication Date: 2020-01-01

Citations: 2

DOI: https://doi.org/10.48550/arxiv.2007.08620

Locations

  • arXiv (Cornell University)
  • HAL (Le Centre pour la Communication Scientifique Directe)
  • DataCite API

Similar Works

  • The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction (2020) - Alice O. Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin
  • Transformers on Markov Data: Constant Depth Suffices (2024) - Nived Rajaraman, Marco Bondaschi, Kannan Ramchandran, Michael Gastpar, Ashok Vardhan Makkuva
  • You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling (2021) - Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh
  • kNN Attention Demystified: A Theoretical Exploration for Scalable Transformers (2024) - Themistoklis Haris
  • Upper and lower memory capacity bounds of transformers for next-token prediction (2024) - Liam Madden, Curtis Fox, Christos Thrampoulidis
  • From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers (2024) - M. Emrullah Ildiz, Yixiao Huang, Yingcong Li, Ankit Singh Rawat, Samet Oymak
  • Efficient Content-Based Sparse Attention with Routing Transformers (2021) - Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
  • Staircase Attention for Recurrent Processing of Sequences (2021) - Da Young Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston
  • Transformers meet Stochastic Block Models: Attention with Data-Adaptive Sparsity and Cost (2022) - Sungjun Cho, Seonwoo Min, Jinwoo Kim, Moontae Lee, Honglak Lee, Seunghoon Hong
  • Efficient Content-Based Sparse Attention with Routing Transformers (2020) - Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier
  • Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains (2024) - Ashok Vardhan Makkuva, Marco Bondaschi, Adway Girish, Alliot Nagle, Martin Jaggi, Hyeji Kim, Michael Gastpar
  • Mechanics of Next Token Prediction with Self-Attention (2024) - Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak
  • Attention as an RNN (2024) - Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori
  • Transformers Simulate MLE for Sequence Generation in Bayesian Networks (2025) - Yuan Cao, Yihan He, Dennis Wu, Hongyu Chen, Jianqing Fan, Han Liu
  • Towards Understanding the Universality of Transformers for Next-Token Prediction (2024) - Michael E. Sander, Gabriel Peyré
  • Cottention: Linear Transformers With Cosine Attention (2024) - Gabriel Mongaras, Trevor Dohm, Eric C. Larson
  • Star Attention: Efficient LLM Inference over Long Sequences (2024) - Shantanu Acharya, Fei Jia, Boris Ginsburg