Intent-conditioned and Non-toxic Counterspeech Generation using
Multi-Task Instruction Tuning with RLAIF
Counterspeech, defined as a response intended to mitigate online hate speech, is increasingly used as a non-censorial solution. Addressing hate speech effectively involves dispelling the stereotypes, prejudices, and biases often subtly implied in brief, single-sentence statements or abusive remarks. These implicit expressions challenge language models, especially in seq2seq tasks, as model performance …