On Trojans in Refined Language Models

A Trojan in a language model can be inserted when the model is refined for a particular application such as determining the sentiment of product reviews. In this paper, we clarify and empirically explore variations of the data-poisoning threat model. We then empirically assess two simple defenses each for a …