Optimizing Adaptive Attacks against Content Watermarks for Language Models

Large Language Models (LLMs) can be misused to spread online spam and misinformation. Content watermarking deters misuse by hiding a message in model-generated outputs, enabling their detection with a secret watermarking key. Robustness is a core security property: evading detection should require a (significant) degradation of the content's quality. Many …
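
To make the detection side concrete, below is a minimal sketch of a "green-list" style watermark detector in the spirit of Kirchenbauer et al. (2023), one common family of LLM content watermarks. It is an illustration under stated assumptions, not the paper's method: the hashing scheme, the green-list rate GAMMA, and the z_threshold are all assumptions chosen for the example.

```python
# Toy green-list watermark detection (illustrative sketch, not a real scheme).
# Assumptions: GAMMA, the SHA-256-based green-list assignment, and the
# z_threshold below are placeholders, not values from the paper.
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary marked "green" at each step

def is_green(prev_token: str, token: str, secret_key: str) -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the
    secret key and the previous token (a toy stand-in for a keyed PRF)."""
    digest = hashlib.sha256(f"{secret_key}|{prev_token}|{token}".encode()).digest()
    return digest[0] / 255.0 < GAMMA

def detect(tokens: list[str], secret_key: str, z_threshold: float = 4.0) -> bool:
    """Flag text as watermarked if the green-token count exceeds what
    chance (rate GAMMA) would predict, via a one-sided z-test."""
    n = len(tokens) - 1  # number of (previous token, token) pairs scored
    if n <= 0:
        return False  # too little text to test
    greens = sum(
        is_green(prev, tok, secret_key)
        for prev, tok in zip(tokens, tokens[1:])
    )
    z = (greens - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)
    return z > z_threshold

# Usage (hypothetical key): detect("some generated text ...".split(), secret_key="k")
```

The sketch also shows why robustness is framed in terms of quality: an attacker can paraphrase or substitute tokens to push the green-token count back toward the chance rate GAMMA, but each edit risks degrading the text, which is exactly the trade-off adaptive attacks try to optimize.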