CLIP-Lite: Information Efficient Visual Representation Learning from
Textual Annotations
We propose CLIP-Lite, an information-efficient method for visual representation learning that aligns image features with textual annotations. Compared to the previously proposed CLIP model, CLIP-Lite requires only one negative image-text pair for every positive image-text pair when optimizing its contrastive learning objective. We accomplish this by taking …
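The excerpt ends before the objective is stated, but the single-negative setup can be illustrated with a minimal sketch: score each image-text pair with a dot product of normalized embeddings, treat the matched pair as the positive and one mismatched pair as the sole negative, and apply a binary (logistic) contrastive loss. The function name and the exact loss form below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def one_negative_contrastive_loss(img_emb, txt_emb, neg_txt_emb):
    """Binary contrastive loss with a single negative pair per positive.

    Illustrative sketch only: scores pairs by dot product on (assumed)
    unit-normalized embeddings and applies a logistic loss that pushes
    positive-pair scores up and the one negative-pair score down.
    """
    def score(a, b):
        # cosine-like similarity for unit-normalized rows
        return np.sum(a * b, axis=-1)

    pos = score(img_emb, txt_emb)       # logits for matched pairs
    neg = score(img_emb, neg_txt_emb)   # logits for the single negative pair
    # -log sigmoid(pos) - log sigmoid(-neg), written stably via log1p
    loss = np.log1p(np.exp(-pos)) + np.log1p(np.exp(neg))
    return loss.mean()
```

With aligned embeddings the loss is lower than when the positive and negative roles are swapped, which is the behavior a single-negative contrastive objective needs.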