Sharing Attention Weights for Fast Transformer

Type: Article

Publication Date: 2019-07-28

Citations: 37

DOI: https://doi.org/10.24963/ijcai.2019/735

Locations

  • arXiv (Cornell University)

Similar Works

Action Title Year Authors
+ PDF Chat Accelerating Neural Transformer via an Average Attention Network 2018 Biao Zhang
Deyi Xiong
Jinsong Su
+ Multi-Unit Transformers for Neural Machine Translation 2020 Jianhao Yan
Fandong Meng
Jie Zhou
+ Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture 2023 Zeping Min
+ PDF Chat Efficient Inference For Neural Machine Translation 2020 Yi‐Te Hsu
Sarthak Garg
Yi-Hsiu Liao
Ilya Chatsviorkin
+ DeLighT: Deep and Light-weight Transformer 2020 Sachin Mehta
Marjan Ghazvininejad
Srinivasan Iyer
Luke Zettlemoyer
Hannaneh Hajishirzi
+ An Efficient Transformer Decoder with Compressed Sub-layers 2021 Yanyang Li
Ye Lin
Tong Xiao
Jingbo Zhu
+ PDF Chat An Efficient Transformer Decoder with Compressed Sub-layers 2021 Yanyang Li
Ye Lin
Tong Xiao
Jingbo Zhu
+ Adding Interpretable Attention to Neural Translation Models Improves Word Alignment 2019 Thomas Zenkel
Joern Wuebker
John DeNero
+ Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation 2020 Alessandro Raganato
Yves Scherrer
Jörg Tiedemann
+ PDF Chat Training Deeper Neural Machine Translation Models with Transparent Attention 2018 Ankur Bapna
Mia Xu Chen
Orhan Fırat
Yuan Cao
Yonghui Wu
+ Training Deeper Neural Machine Translation Models with Transparent Attention 2018 Ankur Bapna
Mia Xu Chen
Orhan Fırat
Yuan Cao
Yonghui Wu
+ Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision 2021 Chenyang Huang
Hao Zhou
Osmar R. Zaïane
Lili Mou
Lei Li
+ PDF Chat Non-autoregressive Translation with Layer-Wise Prediction and Deep Supervision 2022 Chenyang Huang
Hao Zhou
Osmar R. Zaïane
Lili Mou
Lei Li
+ Shallow-to-Deep Training for Neural Machine Translation 2020 Bei Li
Ziyang Wang
Hui Liu
Yufan Jiang
Quan Du
Tong Xiao
Huizhen Wang
Jingbo Zhu
+ DeLighT: Very Deep and Light-weight Transformer 2020 Sachin Mehta
Marjan Ghazvininejad
Srinivasan Iyer
Luke Zettlemoyer
Hannaneh Hajishirzi
+ PDF Chat Relaxed Attention for Transformer Models 2023 Timo Lohrenz
Björn Möller
Zhengyang Li
Tim Fingscheidt
+ On The Alignment Problem In Multi-Head Attention-Based Neural Machine Translation. 2018 Tamer Alkhouli
Gabriel Bretschner
Hermann Ney

Cited by (19)

Action Title Year Authors
+ Shallow-to-Deep Training for Neural Machine Translation 2020 Bei Li
Ziyang Wang
Hui Liu
Yufan Jiang
Quan Du
Tong Xiao
Huizhen Wang
Jingbo Zhu
+ Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation 2023 Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoirin Kim
+ PDF Chat Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation 2023 Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoirin Kim
+ PDF Chat ACORT: A compact object relation transformer for parameter efficient image captioning 2022 Jia Huei Tan
Ying Hua Tan
Chee Seng Chan
Joon Huang Chuah
+ Lessons on Parameter Sharing across Layers in Transformers 2023 Sho Takase
Shun Kiyono
+ Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation 2023 Zihan Liu
Zewei Sun
Shanbo Cheng
Shujian Huang
Mingxuan Wang
+ PDF Chat CTformer: convolution-free Token2Token dilated vision transformer for low-dose CT denoising 2023 Dayang Wang
Fenglei Fan
Zhan Wu
Rui Liu
Fei Wang
Hengyong Yu
+ PDF Chat Learning Light-Weight Translation Models from Deep Transformer 2021 Bei Li
Ziyang Wang
Hui Liu
Quan Du
Tong Xiao
Chunliang Zhang
Jingbo Zhu
+ PDF Chat Audio Albert: A Lite Bert for Self-Supervised Learning of Audio Representation 2021 Po-Han Chi
Pei-Hung Chung
Tsung-Han Wu
Chun-Cheng Hsieh
Yen‐Hao Chen
Shang-Wen Li
Hung-yi Lee
+ PDF Chat An Efficient Transformer Decoder with Compressed Sub-layers 2021 Yanyang Li
Ye Lin
Tong Xiao
Jingbo Zhu
+ Towards Fully 8-bit Integer Inference for the Transformer Model 2020 Ye Lin
Yanyang Li
Tengbo Liu
Tong Xiao
Tongran Liu
Jingbo Zhu
+ Bag of Tricks for Optimizing Transformer Efficiency 2021 Ye Lin
Yanyang Li
Tong Xiao
Jingbo Zhu
+ Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation 2020 Po-Han Chi
Pei-Hung Chung
Tsung-Han Wu
Chun-Cheng Hsieh
Yen‐Hao Chen
Shang-Wen Li
Hung-yi Lee
+ The Cascade Transformer: an Application for Efficient Answer Sentence Selection 2020 Luca Soldaini
Alessandro Moschitti
+ OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification 2021 Xianing Chen
Jialang Xu
Jiale Xu
Shenghua Gao
+ Fast offline Transformer-based end-to-end automatic speech recognition for real-world applications. 2021 Yoo Rhee Oh
Kiyoung Park
Jeon Gue Park
+ Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers 2020 Krzysztof Choromański
Valerii Likhosherstov
David Dohan
Xingyou Song
Jared Quincy Davis
Tamás Sarlós
David Belanger
Lucy J. Colwell
Adrian Weller
+ Lessons on Parameter Sharing across Layers in Transformers 2021 Sho Takase
Shun Kiyono
+ PDF Chat Fast offline transformer‐based end‐to‐end automatic speech recognition for real‐world applications 2021 Yoo Rhee Oh
Kiyoung Park
Jeon Gue Park

Citing (19)

Action Title Year Authors
+ Distilling the Knowledge in a Neural Network 2015 Geoffrey E. Hinton
Oriol Vinyals
Jeff Dean
+ PDF Chat Effective Approaches to Attention-based Neural Machine Translation 2015 Thang Luong
Hieu Pham
Christopher D. Manning
+ Sequence to Sequence Learning with Neural Networks 2014 Ilya Sutskever
Oriol Vinyals
Quoc V. Le
+ Neural Machine Translation by Jointly Learning to Align and Translate 2014 Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
+ PDF Chat Rethinking the Inception Architecture for Computer Vision 2016 Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jon Shlens
Zbigniew Wojna
+ Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation 2016 Yonghui Wu
Mike Schuster
Zhifeng Chen
Quoc V. Le
Mohammad Norouzi
Wolfgang Macherey
Maxim Krikun
Yuan Cao
Qin Gao
Klaus Macherey
+ Vocabulary Selection Strategies for Neural Machine Translation 2016 Gurvan L'Hostis
David Grangier
Michael Auli
+ PDF Chat Massive Exploration of Neural Machine Translation Architectures 2017 Denny Britz
Anna Goldie
Minh-Thang Luong
Quoc V. Le
+ Convolutional Sequence to Sequence Learning 2017 Jonas Gehring
Michael Auli
David Grangier
Denis Yarats
Yann Dauphin
+ Attention-based Vocabulary Selection for NMT Decoding 2017 Baskaran Sankaran
Markus Freitag
Yaser Al-Onaizan
+ Mixed Precision Training 2017 Paulius Micikevicius
Sharan Narang
Jonah Alben
Gregory Diamos
Erich Elsen
David García
Boris Ginsburg
Michael Houston
Oleksii Kuchaiev
Ganesh Venkatesh
+ Non-Autoregressive Neural Machine Translation 2017 Jiatao Gu
James Bradbury
Caiming Xiong
Victor O. K. Li
Richard Socher
+ Pieces of Eight: 8-bit Neural Machine Translation 2018 Jerry Quinn
Miguel Ballesteros
+ PDF Chat Sequence-Level Knowledge Distillation 2016 Yoon Kim
Alexander M. Rush
+ Multi-task Sequence to Sequence Learning 2015 Minh-Thang Luong
Quoc V. Le
Ilya Sutskever
Oriol Vinyals
Łukasz Kaiser
+ PDF Chat Recurrent Stacking of Layers for Compact Neural Machine Translation Models 2019 Raj Dabre
Atsushi Fujita
+ PDF Chat Unsupervised Neural Machine Translation with Weight Sharing 2018 Zhen Yang
Wei Chen
Feng Wang
Bo Xu
+ PDF Chat Accelerating Neural Transformer via an Average Attention Network 2018 Biao Zhang
Deyi Xiong
Jinsong Su
+ Attention Is All You Need 2017 Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin