Ask a Question

Prefer a chat interface with context about you and your work?

AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

AltCLIP: Altering the Language Encoder in CLIP for Extended Language Capabilities

CLIP (Contrastive Languageā€“Image Pretraining) is an English multimodal representation model learned from a massive amount of English text-image pairs and has achieved great success in various downstream tasks, including image classification, text-to-image retrieval, and image generation. When extending CLIP to other languages, the major problem is the lack of good-quality ā€¦