Prefix Conditioning Unifies Language and Label Supervision
Pretraining visual models on web-scale image-caption datasets has recently emerged as a powerful alternative to traditional pretraining on image classification data. Image-caption datasets are more "open-domain", containing broader scene types and vocabulary words, and result in models that have strong performance in few- and zero-shot recognition tasks. However, large-scale classification …