Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large
Language Models
Current vision large language models (VLLMs) exhibit remarkable capabilities, yet they are prone to generating harmful content and are vulnerable to even the simplest jailbreaking attacks. Our initial analysis finds that this is due to the presence of harmful data during vision-language instruction fine-tuning, and that VLLM fine-tuning can cause forgetting …