BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against
Jailbreak Attacks
Despite their superb multimodal capabilities, Vision-Language Models (VLMs) have been shown to be vulnerable to jailbreak attacks: inference-time attacks that use carefully crafted prompts to induce the model to output harmful responses. Defending VLMs against potential jailbreaks is therefore essential for their trustworthy deployment in real-world applications. In …