Szymon Tworkowski
All published works
Hierarchical Transformers Are More Efficient Language Models (2022). Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski.
Hierarchical Transformers Are More Efficient Language Models (2021). Piotr Nawrot, Szymon Tworkowski, Michał Tyrolski, Łukasz Kaiser, Yuhuai Wu, Christian Szegedy, Henryk Michalewski.
Common Coauthors
Christian Szegedy (2 papers together)
Piotr Nawrot (2 papers together)
Henryk Michalewski (2 papers together)
Yuhuai Wu (2 papers together)
Łukasz Kaiser (2 papers together)
Michał Tyrolski (2 papers together)
Commonly Cited References
Conditional Image Generation with PixelCNN Decoders (2016). Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu. Referenced 1 time.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2018). Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. Referenced 1 time.
Generating Long Sequences with Sparse Transformers (2019). Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. Referenced 1 time.
Adaptive Attention Span in Transformers (2019). Sainbayar Sukhbaatar, Édouard Grave, Piotr Bojanowski, Armand Joulin. Referenced 1 time.
Pixel Recurrent Neural Networks (2016). Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu. Referenced 1 time.
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context (2019). Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov. Referenced 1 time.
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (2019). Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu. Referenced 1 time.
Axial Attention in Multidimensional Transformers (2019). Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans. Referenced 1 time.
Scaling Laws for Neural Language Models (2020). Jared Kaplan, Sam McCandlish, Tom Henighan, T. B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei. Referenced 1 time.
Longformer: The Long-Document Transformer (2020). Iz Beltagy, Matthew E. Peters, Arman Cohan. Referenced 1 time.
Jukebox: A Generative Model for Music (2020). Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. Referenced 1 time.
Multi-scale Transformer Language Models (2020). Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau. Referenced 1 time.
Language Models are Few-Shot Learners (2020). T. B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell. Referenced 1 time.
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing (2020). Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. Referenced 1 time.
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention (2020). Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. Referenced 1 time.
Addressing Some Limitations of Transformers with Feedback Memory (2020). Angela Fan, Thibaut Lavril, Édouard Grave, Armand Joulin, Sainbayar Sukhbaatar. Referenced 1 time.
Efficient Content-Based Sparse Attention with Routing Transformers (2021). Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier. Referenced 1 time.
Canine: Pre-training an Efficient Tokenization-Free Encoder for Language Representation (2022). Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting. Referenced 1 time.
RoFormer: Enhanced Transformer with Rotary Position Embedding (2021). Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, Yunfeng Liu. Referenced 1 time.
ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models (2022). Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam P. Roberts, Colin Raffel. Referenced 1 time.
Not All Memories are Created Equal: Learning to Forget by Expiring (2021). Sainbayar Sukhbaatar, Da Young Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan. Referenced 1 time.
Score Matching Model for Unbounded Data Score (2021). Dong-Jun Kim, Seungjae Shin, Kyungwoo Song, Wanmo Kang, Il-Chul Moon. Referenced 1 time.
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization (2021). Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Prakash Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler. Referenced 1 time.
Variational Diffusion Models (2021). Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho. Referenced 1 time.
Evaluating Large Language Models Trained on Code (2021). Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman. Referenced 1 time.
Combiner: Full Attention Transformer with Sparse Computation Cost (2021). Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai. Referenced 1 time.
Long-Short Transformer: Efficient Transformers for Language and Vision (2021). Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro. Referenced 1 time.
Densely connected normalizing flows (2021). Matej Grcić, Ivan Grubišić, Siniša Šegvić. Referenced 1 time.
Jukebox: A Generative Model for Music (2020). Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever. Referenced 1 time.
Language Models are Few-Shot Learners (2020). T. B. Brown, Benjamin F. Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell. Referenced 1 time.
Rethinking Attention with Performers (2020). Krzysztof Choromański, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamás Sarlós, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Łukasz Kaiser. Referenced 1 time.
Generating Long Sequences with Sparse Transformers (2019). Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever. Referenced 1 time.
Attention Is All You Need (2017). Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. Referenced 1 time.
RoFormer: Enhanced transformer with Rotary Position Embedding (2023). Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Bo Wen, Yunfeng Liu. Referenced 1 time.