Filtered Corpus Training (FiCT) Shows that Language Models can
Generalize from Indirect Evidence
This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora from which certain linguistic constructions have been filtered out, and uses it to measure how well LMs generalize linguistically from indirect evidence. We apply the method to both …