Ask a Question

Prefer a chat interface with context about you and your work?

Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations

Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations

Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models. Yet, it remains unclear how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations. We frame training of large language models as soft-label classification over sparse probabilistic …