Character-Level Language Modeling with Deeper Self-Attention

Type: Preprint

Publication Date: 2018-01-01

Citations: 4

DOI: https://doi.org/10.48550/arXiv.1808.04444

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Character-Level Language Modeling with Deeper Self-Attention (2018) - Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
  • Character-Level Language Modeling with Deeper Self-Attention (2019) - Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
  • Word-Level Representation From Bytes For Language Modeling (2022) - Chu-Tak Lee, Qipeng Guo, Xipeng Qiu
  • Learning to Look Inside: Augmenting Token-Based Encoders with Character-Level Information (2021) - Yuval Pinter, Amanda Stent, Mark Dredze, Jacob Eisenstein
  • Mogrifier LSTM (2019) - Gábor Melis, Tomáš Kočiský, Phil Blunsom
  • Bridging the Gap for Tokenizer-Free Language Models (2019) - Dokook Choe, Rami Al-Rfou, Mandy Guo, Heeyoung Lee, Noah Constant
  • Adaptive Attention Span in Transformers (2019) - Sainbayar Sukhbaatar, Édouard Grave, Piotr Bojanowski, Armand Joulin
  • When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute (2021) - Tao Lei
  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (2019) - Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
  • MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling (2022) - Nathan Godey, Roman Castagné, Éric Villemonte de la Clergerie, Benoît Sagot
  • What is the best recipe for character-level encoder-only modelling? (2023) - Kris Cao
  • Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens (2022) - Itay Itzhak, Omer Levy
  • Models In a Spelling Bee: Language Models Implicitly Learn the Character Composition of Tokens (2021) - Itay Itzhak, Omer Levy
  • Augmenting Self-attention with Persistent Memory (2019) - Sainbayar Sukhbaatar, Édouard Grave, Guillaume Lample, Hervé Jégou, Armand Joulin
  • Learn Your Tokens: Word-Pooled Tokenization for Language Modeling (2023) - Avijit Thawani, Saurabh Ghanekar, Xiaoyuan Zhu, Jay Pujara

Works Cited by This (0)
