BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks

Despite their superb multimodal capabilities, Vision-Language Models (VLMs) have been shown to be vulnerable to jailbreak attacks: inference-time attacks that use tricky prompts to induce the model to output harmful responses. It is thus essential to defend VLMs against potential jailbreaks for their trustworthy deployment in real-world applications. In …