On Evaluating the Durability of Safeguards for Open-Weight LLMs

Stakeholders -- from model developers to policymakers -- seek to minimize the dual-use risks of large language models (LLMs). An open challenge to this goal is whether technical safeguards can impede the misuse of LLMs, even when models are customizable via fine-tuning or when model weights are fully open. In …