Can adversarial attacks by large language models be attributed?

Attributing outputs from Large Language Models (LLMs) in adversarial settings, such as cyberattacks and disinformation, presents significant challenges that are likely to grow in importance. We investigate this attribution problem using formal language theory, specifically language identification in the limit as introduced by Gold and extended by Angluin. By modeling LLM outputs …
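
As background for the framework the abstract names, the following is a minimal sketch of Gold-style identification in the limit as it is usually stated in the learning-theory literature; it is standard background, not a definition quoted from this paper.

% Background sketch (not from the paper): Gold's identification in the limit from text.
% A text for a language L is an infinite enumeration whose set of elements is exactly L.
\[
M \text{ identifies } L \text{ in the limit} \iff
\forall \text{ texts } t = w_1, w_2, \dots \text{ of } L,\;
\exists N \;\forall n \ge N:\; M(w_1,\dots,w_n) = g \text{ with } L(g) = L .
\]
% A class of languages is identifiable in the limit if a single learner M
% identifies every language in the class; Angluin's later work characterizes
% which classes satisfy this via the existence of telltale subsets.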