Szymon Tworkowski

Commonly Cited References
Title | Year | Authors | # of times referenced
Conditional Image Generation with PixelCNN Decoders | 2016 | Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, Koray Kavukcuoglu | 1
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 2018 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | 1
Generating Long Sequences with Sparse Transformers | 2019 | Rewon Child, Scott Gray, Alec Radford, Ilya Sutskever | 1
Adaptive Attention Span in Transformers | 2019 | Sainbayar Sukhbaatar, Édouard Grave, Piotr Bojanowski, Armand Joulin | 1
Pixel Recurrent Neural Networks | 2016 | Aäron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu | 1
Transformer-XL: Attentive Language Models beyond a Fixed-Length Context | 2019 | Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov | 1
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization | 2019 | Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu | 1
Axial Attention in Multidimensional Transformers | 2019 | Jonathan Ho, Nal Kalchbrenner, Dirk Weissenborn, Tim Salimans | 1
Scaling Laws for Neural Language Models | 2020 | Jared Kaplan, Sam McCandlish, Tom Henighan, T. B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei | 1
Longformer: The Long-Document Transformer | 2020 | Iz Beltagy, Matthew E. Peters, Arman Cohan | 1
Jukebox: A Generative Model for Music | 2020 | Prafulla Dhariwal, Heewoo Jun, Christine Payne, Jong Wook Kim, Alec Radford, Ilya Sutskever | 1
Multi-scale Transformer Language Models | 2020 | Sandeep Subramanian, Ronan Collobert, Marc’Aurelio Ranzato, Y-Lan Boureau | 1
Language Models are Few-Shot Learners | 2020 | T. B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell | 1
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing | 2020 | Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le | 1
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | 2020 | Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret | 1
Addressing Some Limitations of Transformers with Feedback Memory | 2020 | Angela Fan, Thibaut Lavril, Édouard Grave, Armand Joulin, Sainbayar Sukhbaatar | 1
Efficient Content-Based Sparse Attention with Routing Transformers | 2021 | Aurko Roy, Mohammad Saffar, Ashish Vaswani, David Grangier | 1
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation | 2022 | Jonathan H. Clark, Dan Garrette, Iulia Turc, John Wieting | 1
RoFormer: Enhanced Transformer with Rotary Position Embedding | 2021 | Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, Yunfeng Liu | 1
ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models | 2022 | Linting Xue, Aditya Barua, Noah Constant, Rami Al-Rfou, Sharan Narang, Mihir Kale, Adam P. Roberts, Colin Raffel | 1
Not All Memories are Created Equal: Learning to Forget by Expiring | 2021 | Sainbayar Sukhbaatar, Da Young Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan | 1
Score Matching Model for Unbounded Data Score | 2021 | Dong-Jun Kim, Seungjae Shin, Kyungwoo Song, Wanmo Kang, Il-Chul Moon | 1
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization | 2021 | Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Prakash Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler | 1
Variational Diffusion Models | 2021 | Diederik P. Kingma, Tim Salimans, Ben Poole, Jonathan Ho | 1
Evaluating Large Language Models Trained on Code | 2021 | Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman | 1
Combiner: Full Attention Transformer with Sparse Computation Cost | 2021 | Hongyu Ren, Hanjun Dai, Zihang Dai, Mengjiao Yang, Jure Leskovec, Dale Schuurmans, Bo Dai | 1
Long-Short Transformer: Efficient Transformers for Language and Vision | 2021 | Chen Zhu, Wei Ping, Chaowei Xiao, Mohammad Shoeybi, Tom Goldstein, Anima Anandkumar, Bryan Catanzaro | 1
Densely connected normalizing flows | 2021 | Matej Grcić, Ivan Grubišić, Siniša Šegvić | 1
Rethinking Attention with Performers | 2020 | Krzysztof Choromański, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamás Sarlós, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Łukasz Kaiser | 1
Attention Is All You Need | 2017 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin | 1
RoFormer: Enhanced transformer with Rotary Position Embedding | 2023 | Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Bo Wen, Yunfeng Liu | 1