Tianyu Chen


Commonly Cited References
Each entry lists the paper title, year, authors, and the number of times it is referenced.

- Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books (2015). Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, Sanja Fidler. Referenced 1 time.
- Cross-lingual Language Model Pretraining (2019). Guillaume Lample, Alexis Conneau. Referenced 1 time.
- GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (2018). Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. Referenced 1 time.
- SQuAD: 100,000+ Questions for Machine Comprehension of Text (2016). Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. Referenced 1 time.
- A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference (2018). Adina Williams, Nikita Nangia, Samuel R. Bowman. Referenced 1 time.
- Neural Network Acceptability Judgments (2019). Alex Warstadt, Amanpreet Singh, Samuel R. Bowman. Referenced 1 time.
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (2020). Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer. Referenced 1 time.
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (2020). Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Fırat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen. Referenced 1 time.
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (2020). Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly. Referenced 1 time.
- BASE Layers: Simplifying Training of Large, Sparse Models (2021). Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer. Referenced 1 time.
- Hash Layers For Large Sparse Models (2021). Stephen Roller, Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston. Referenced 1 time.
- BEiT: BERT Pre-Training of Image Transformers (2021). Hangbo Bao, Li Dong, Furu Wei. Referenced 1 time.
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation (2021). Yuxuan Lou, Fuzhao Xue, Zangwei Zheng, Yang You. Referenced 1 time.
- Tricks for Training Sparse Translation Models (2022). Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan. Referenced 1 time.
- Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition (2021). Kenichi Kumatani, Robert Gmyr, Felipe Cruz Salinas, Linquan Liu, Wei Zuo, Devang Patel, Eric Sun, Yu Shi. Referenced 1 time.
- One Student Knows All Experts Know: From Sparse to Dense (2022). Fuzhao Xue, Xiaoxin He, Xiaozhe Ren, Yuxuan Lou, Yang You. Referenced 1 time.
- On the Representation Collapse of Sparse Mixture of Experts (2022). Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Furu Wei. Referenced 1 time.
- StableMoE: Stable Routing Strategy for Mixture of Experts (2022). Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei. Referenced 1 time.
- Residual Mixture of Experts (2022). Lemeng Wu, Mengchen Liu, Yinpeng Chen, Dongdong Chen, Xiyang Dai, Lu Yuan. Referenced 1 time.
- Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers (2022). Rui Liu, Young Jin Kim, Alexandre Muzio, Barzan Mozafari, Hany Hassan Awadalla. Referenced 1 time.
- Task-Specific Expert Pruning for Sparse Mixture-of-Experts (2022). Tianyu Chen, Shaohan Huang, Yuan Xie, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, Furu Wei. Referenced 1 time.
- Scaling Vision with Sparse Mixture of Experts (2021). Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, Neil Houlsby. Referenced 1 time.
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (2021). William Fedus, Barret Zoph, Noam Shazeer. Referenced 1 time.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (2019). Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. Referenced 1 time.
- Language Models are Few-Shot Learners (2020). T. B. Brown, Benjamin F. Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell. Referenced 1 time.
- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (2017). Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andrew R. Davis, Quoc V. Le, Geoffrey E. Hinton, Jeff Dean. Referenced 1 time.
- SemEval-2017 Task 1: Semantic Textual Similarity Multilingual and Cross-lingual Focused Evaluation (2017). Daniel Cer, Mona Diab, Eneko Agirre, Iñigo López-Gazpio, Lucia Specia. Referenced 1 time.
- Attention Is All You Need (2017). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. Referenced 1 time.