Simple Recurrent Units for Highly Parallelizable Recurrence

Type: Article

Publication Date: 2018-01-01

Citations: 236

DOI: https://doi.org/10.18653/v1/d18-1477

Abstract

Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves 5-9x speed-up over cuDNN-optimized LSTM on classification and question answering datasets, and delivers stronger results than LSTM and convolutional models. We also obtain an average of 0.7 BLEU improvement over the Transformer model (Vaswani et al., 2017) on translation by incorporating SRU into the architecture.
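
The parallelism the abstract refers to comes from arranging the cell so that every matrix multiplication depends only on the input sequence, while the step-to-step recurrence uses only element-wise operations. Below is a minimal NumPy sketch of the SRU cell equations from Lei et al. (2018); the function and parameter names are illustrative, it assumes equal input and hidden dimensions, and it is not the authors' released CUDA implementation.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def sru_layer(x, W, Wf, Wr, vf, vr, bf, br):
        # x: (T, d) input sequence; W, Wf, Wr: (d, d); vf, vr, bf, br: (d,)
        T, d = x.shape
        # All matrix multiplications depend only on the inputs, so they can be
        # computed for every time step at once (and in parallel on a GPU).
        U  = x @ W.T     # candidate states
        Zf = x @ Wf.T    # forget-gate pre-activations
        Zr = x @ Wr.T    # reset-gate pre-activations
        # The remaining recurrence is purely element-wise, which keeps the
        # sequential part of the computation cheap.
        c = np.zeros(d)
        h = np.empty((T, d))
        for t in range(T):
            f = sigmoid(Zf[t] + vf * c + bf)   # forget gate with vector weights on c_{t-1}
            r = sigmoid(Zr[t] + vr * c + br)   # reset gate
            c = f * c + (1.0 - f) * U[t]       # internal state c_t
            h[t] = r * c + (1.0 - r) * x[t]    # highway connection to the input
        return h

    # Example usage with random parameters (illustrative only).
    T, d = 8, 4
    rng = np.random.default_rng(0)
    shapes = [(d, d), (d, d), (d, d), (d,), (d,), (d,), (d,)]
    params = [rng.standard_normal(s) * 0.1 for s in shapes]
    h = sru_layer(rng.standard_normal((T, d)), *params)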

Locations

  • arXiv (Cornell University)
  • Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

Similar Works

  • Universal Transformers (2018) – Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser
  • Deep Neural Machine Translation with Weakly-Recurrent Units (2018) – Mattia Antonino Di Gangi, Marcello Federico
  • Modeling Recurrence for Transformer (2019) – Jie Hao, Xing Wang, Baosong Yang, Longyue Wang, Jinfeng Zhang, Zhaopeng Tu
  • Parallelizing Legendre Memory Unit Training (2021) – Narsimha Chilkuri, Chris Eliasmith
  • Fast-Slow Recurrent Neural Networks (2017) – Asier Mujika, Florian Meier, Angelika Steger
  • When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute (2021) – Tao Lei
  • Retentive Network: A Successor to Transformer for Large Language Models (2023) – Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
  • SHAQ: Single Headed Attention with Quasi-Recurrence (2021) – Nashwin Bharwani, Warren Kushner, Sangeet Dandona, Ben Z. Schreiber
  • Parallelizing Linear Recurrent Neural Nets Over Sequence Length (2018) – Éric Martin, Chris Cundy
  • RecurrentGemma: Moving Past Transformers for Efficient Open Language Models (2024) – Aleksandar Botev, Soham De, Samuel Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi
  • Recurrent Memory Networks for Language Modeling (2016) – Ke Tran, Arianna Bisazza, Christof Monz
  • Recurrent multiple shared layers in Depth for Neural Machine Translation (2021) – Guoliang Li, Yiyang Li
  • Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation (2016) – Jie Zhou, Ying Cao, Xuguang Wang, Peng Li, Wei Xu
  • Depth-Gated Recurrent Neural Networks (2015) – Kaisheng Yao, Trevor Cohn, Katerina Vylomova, Kevin Duh, Chris Dyer
  • Scaling recurrent neural network language models (2015) – Will Williams, Niranjani Prasad, David Mrva, Tom Ash, Tony Robinson