Large language models implicitly learn to straighten neural sentence trajectories to construct a predictive representation of natural language

Type: Preprint

Publication Date: 2023-01-01

Citations: 0

DOI: https://doi.org/10.48550/arXiv.2311.04930

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Long Distance Relationships without Time Travel: Boosting the Performance of a Sparse Predictive Autoencoder in Sequence Modeling (2019) by Jeremy W. Gordon, David Rawlinson, Subutai Ahmad
  • On the Origins of Linear Representations in Large Language Models (2024) by Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam, Victor Veitch
  • A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks (2021) by Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora
  • Recoding latent sentence representations -- Dynamic gradient-based activation modification in RNNs (2021) by Dennis Ulmer
  • Predict the Next Word: <Humans exhibit uncertainty in this task and language models _____> (2024) by Evgenia Ilia, Wilker Aziz
  • A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks (2020) by Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora
  • Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve (2023) by R. Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, Thomas L. Griffiths
  • Top-down Tree Long Short-Term Memory Networks (2015) by Xingxing Zhang, Liang Lu, Mirella Lapata
  • Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability (2023) by Tyler A. Chang, Zhuowen Tu, Benjamin Bergen
  • A Targeted Assessment of Incremental Processing in Neural Language Models and Humans (2021) by Ethan Wilcox, Pranali Vani, Roger Lévy
  • Top-down Tree Long Short-Term Memory Networks (2016) by Xingxing Zhang, Liang Lu, Mirella Lapata
  • How Do Neural Sequence Models Generalize? Local and Global Context Cues for Out-of-Distribution Prediction (2021) by Anthony Bau, Jacob Andreas
  • Gradual Learning of Deep Recurrent Neural Networks (2017) by Ziv Aharoni, Gal Rattner, Haim H. Permuter
  • Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models (2022) by Vilém Zouhar, Marius Mosbach, Dietrich Klakow
  • How Well Can a Long Sequence Model Model Long Sequences? Comparing Architechtural Inductive Biases on Long-Context Abilities (2024) by Jerry Huang
  • When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 (2024) by R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths
  • What are large language models supposed to model? (2023) by Idan Blank
  • Mechanics of Next Token Prediction with Self-Attention (2024) by Yingcong Li, Yixiao Huang, M. Emrullah Ildiz, Ankit Singh Rawat, Samet Oymak

Works That Cite This (0)

Works Cited by This (0)