Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse
Harms in Text-to-Image Generation
As text-to-image (T2I) generative AI models reach wide audiences, it is critical to evaluate model robustness against non-obvious attacks to mitigate the generation of offensive images. By focusing on "implicitly adversarial" prompts (those that trigger T2I models to generate unsafe images for non-obvious reasons), we isolate a …