Anchor-based Robust Finetuning of Vision-Language Models

We aim to finetune a vision-language model without hurting its out-of-distribution (OOD) generalization. We address two types of OOD generalization: i) domain shift, such as from natural to sketch images, and ii) zero-shot capability to recognize categories that were not contained in the finetuning data. Arguably, the diminished OOD …