Simple Recurrent Units for Highly Parallelizable Recurrence

Type: Article

Publication Date: 2018-01-01

Citations: 236

DOI: https://doi.org/10.18653/v1/d18-1477

Abstract

Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves 5-9x speed-up over cuDNN-optimized LSTM on classification and question answering datasets, and delivers stronger results than LSTM and convolutional models. We also obtain an average of 0.7 BLEU improvement over the Transformer model (Vaswani et al., 2017) on translation by incorporating SRU into the architecture.
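
The parallelism the abstract refers to comes from arranging the cell so that every matrix multiplication depends only on the input sequence, while the step-to-step recurrence uses only element-wise operations. Below is a minimal NumPy sketch of the SRU cell equations from Lei et al. (2018); the function and parameter names are illustrative, it assumes equal input and hidden dimensions, and it is not the authors' released CUDA implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sru_layer(x, W, Wf, Wr, vf, vr, bf, br):
        # x: (T, d) input sequence; W, Wf, Wr: (d, d); vf, vr, bf, br: (d,)
        T, d = x.shape
        # All matrix multiplications depend only on the inputs, so they can be
        # computed for every time step at once (and in parallel on a GPU).
        U  = x @ W.T     # candidate states
        Zf = x @ Wf.T    # forget-gate pre-activations
        Zr = x @ Wr.T    # reset-gate pre-activations
        # The remaining recurrence is purely element-wise, which keeps the
        # sequential part of the computation cheap.
        c = np.zeros(d)
        h = np.empty((T, d))
        for t in range(T):
            f = sigmoid(Zf[t] + vf * c + bf)   # forget gate with vector weights on c_{t-1}
            r = sigmoid(Zr[t] + vr * c + br)   # reset gate
            c = f * c + (1.0 - f) * U[t]       # internal state c_t
            h[t] = r * c + (1.0 - r) * x[t]    # highway connection to the input
        return h

    # Example usage with random parameters (illustrative only).
    T, d = 8, 4
    rng = np.random.default_rng(0)
    shapes = [(d, d), (d, d), (d, d), (d,), (d,), (d,), (d,)]
    params = [rng.standard_normal(s) * 0.1 for s in shapes]
    h = sru_layer(rng.standard_normal((T, d)), *params)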

Locations

  • arXiv (Cornell University)
  • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Similar Works

  • Universal Transformers (2018) – Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser
  • Deep Neural Machine Translation with Weakly-Recurrent Units (2018) – Mattia Antonino Di Gangi, Marcello Federico
  • Modeling Recurrence for Transformer (2019) – Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu
  • Parallelizing Legendre Memory Unit Training (2021) – Narsimha Chilkuri, Chris Eliasmith
  • Fast-Slow Recurrent Neural Networks (2017) – Asier Mujika, Florian Meier, Angelika Steger
  • When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute (2021) – Tao Lei
  • Retentive Network: A Successor to Transformer for Large Language Models (2023) – Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
  • SHAQ: Single Headed Attention with Quasi-Recurrence (2021) – Nashwin Bharwani, Warren Kushner, Sangeet Dandona, Ben Z. Schreiber
  • Parallelizing Linear Recurrent Neural Nets Over Sequence Length (2018) – Éric Martin, Chris Cundy
  • RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (2024) – Aleksandar Botev, Soham De, Samuel Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi
  • Recurrent Memory Networks for Language Modeling (2016) – Ke Tran, Arianna Bisazza, Christof Monz
  • Recurrent multiple shared layers in Depth for Neural Machine Translation (2021) – Guoliang Li, Yiyang Li
  • Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation (2016) – Jie Zhou, Ying Cao, Xuguang Wang, Peng Li, Wei Xu
  • Depth-Gated Recurrent Neural Networks (2015) – Kaisheng Yao, Trevor Cohn, Katerina Vylomova, Kevin Duh, Chris Dyer
  • Scaling recurrent neural network language models (2015) – Will Williams, Niranjani Prasad, David Mrva, Tom Ash, Tony Robinson