Implicit Geometry of Next-token Prediction: From Language Sparsity
Patterns to Model Representations
Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models. Yet, it remains unclear how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations. We frame training of large language models as soft-label classification over sparse probabilistic …
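The abstract is truncated above, but the framing it names, NTP as soft-label classification where each distinct context is paired with a sparse empirical distribution over next tokens, can be illustrated with a minimal sketch. The toy corpus, vocabulary size, and helper names below are illustrative assumptions, not the paper's construction.

```python
from collections import Counter, defaultdict

import numpy as np

# Toy corpus: sequences of token ids over a small vocabulary (assumed for illustration).
corpus = [
    [0, 1, 2],
    [0, 1, 3],
    [0, 1, 2],
    [4, 1, 2],
]
vocab_size = 5

# For every distinct context (prefix), count which tokens follow it in the corpus.
next_token_counts = defaultdict(Counter)
for seq in corpus:
    for t in range(1, len(seq)):
        context = tuple(seq[:t])
        next_token_counts[context][seq[t]] += 1

def soft_label(context):
    """Sparse probabilistic label vector for a context: the empirical
    next-token distribution. Most entries are zero because only a few
    tokens ever follow a given context (the language sparsity pattern)."""
    counts = next_token_counts[context]
    p = np.zeros(vocab_size)
    for token, c in counts.items():
        p[token] = c
    return p / p.sum()

def cross_entropy(logits, p):
    """NTP training viewed as soft-label classification: cross-entropy
    between the model's predicted distribution and the empirical soft label."""
    log_q = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -np.dot(p, log_q)

p = soft_label((0, 1))           # e.g. [0, 0, 2/3, 1/3, 0]
logits = np.zeros(vocab_size)    # a uniform model prediction
print(p, cross_entropy(logits, p))
```

In this view, the per-context target is a distribution rather than a single hard label, and its support is sparse; the paper studies how such sparsity patterns shape the geometry of the learned representations.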