Implicit Geometry of Next-token Prediction: From Language Sparsity
Patterns to Model Representations
Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models. Yet, it remains unclear how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations. We frame training of large language models as soft-label classification over sparse probabilistic …
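The abstract is truncated above, but the framing it names, NTP as soft-label classification where each distinct context is paired with a sparse empirical distribution over next tokens, can be illustrated with a minimal sketch. The toy corpus, vocabulary size, and helper names below are illustrative assumptions, not the paper's construction.

```python
from collections import Counter, defaultdict

import numpy as np

# Toy corpus: sequences of token ids over a small vocabulary (assumed for illustration).
corpus = [
    [0, 1, 2],
    [0, 1, 3],
    [0, 1, 2],
    [4, 1, 2],
]
vocab_size = 5

# For every distinct context (prefix), count which tokens follow it in the corpus.
next_token_counts = defaultdict(Counter)
for seq in corpus:
    for t in range(1, len(seq)):
        context = tuple(seq[:t])
        next_token_counts[context][seq[t]] += 1

def soft_label(context):
    """Sparse probabilistic label vector for a context: the empirical
    next-token distribution. Most entries are zero because only a few
    tokens ever follow a given context (the language sparsity pattern)."""
    counts = next_token_counts[context]
    p = np.zeros(vocab_size)
    for token, c in counts.items():
        p[token] = c
    return p / p.sum()

def cross_entropy(logits, p):
    """NTP training viewed as soft-label classification: cross-entropy
    between the model's predicted distribution and the empirical soft label."""
    log_q = logits - np.log(np.sum(np.exp(logits)))  # log-softmax
    return -np.dot(p, log_q)

p = soft_label((0, 1))           # e.g. [0, 0, 2/3, 1/3, 0]
logits = np.zeros(vocab_size)    # a uniform model prediction
print(p, cross_entropy(logits, p))
```

In this view, the per-context target is a distribution rather than a single hard label, and its support is sparse; the paper studies how such sparsity patterns shape the geometry of the learned representations.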