Can You Learn Semantics Through Next-Word Prediction? The Case of
Entailment
Do language models (LMs) infer the semantics of text from co-occurrence patterns in their training data? Merrill et al. (2022) argue that, in theory, the probabilities predicted by an optimal LM encode semantic information about entailment relations, but it is unclear whether neural LMs trained on corpora learn entailment in this way because …