Using Pre-Trained Language Models for Producing Counter Narratives Against Hate Speech: a Comparative Study

Serra Sinem Tekiroğlu, Helena Bonaldi, Margherita Fanton, Marco Guerini

Type: Article

Publication Date: 2022-01-01

Citations: 7

DOI: https://doi.org/10.18653/v1/2022.findings-acl.245

Abstract

In this work, we present an extensive study on the use of pre-trained language models for the task of automatic Counter Narrative (CN) generation to fight online hate speech in English. We first present a comparative study to determine whether there is a particular Language Model (LM), or class of LMs, and a particular decoding mechanism that are the most appropriate for generating CNs. Findings show that autoregressive models combined with stochastic decoding strategies are the most promising. We then investigate how an LM performs in generating CNs for an unseen target of hate. We find that a key element for successful ‘out of target’ experiments is not an overall similarity with the training data but the presence of a specific subset of training data, i.e., a target that shares some commonalities with the test target and that can be defined a priori. We finally introduce the idea of a pipeline that adds an automatic post-editing step to refine the generated CNs.
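The abstract contrasts stochastic decoding with deterministic decoding. As an illustration only, the sketch below applies greedy decoding, top-k sampling (popularized by Fan et al., 2018) and nucleus/top-p sampling (Holtzman et al., 2019), both listed among the works cited, to a single toy next-token distribution. The vocabulary, logits and function names are hypothetical; this is not the authors' code, and it does not reproduce the decoding configurations compared in the paper.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs):
    """Deterministic decoding: always pick the single most likely token id."""
    return max(range(len(probs)), key=lambda i: probs[i])


def top_k_filter(probs, k):
    """Keep the k most likely token ids and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely token ids whose cumulative
    probability reaches p (the 'nucleus'), then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = [], 0.0
    for i in order:
        keep.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def sample(filtered, rng):
    """Stochastic step: draw one token id from a filtered, renormalized distribution."""
    ids = list(filtered)
    weights = [filtered[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical vocabulary and logits for one generation step.
    vocab = ["respect", "facts", "people", "hate", "never"]
    probs = softmax([2.0, 1.5, 1.0, 0.2, -1.0])
    rng = random.Random(0)
    print("greedy:", vocab[greedy(probs)])  # always "respect"
    print("top-k:", vocab[sample(top_k_filter(probs, 2), rng)])
    print("top-p:", vocab[sample(top_p_filter(probs, 0.9), rng)])
```

Greedy decoding returns the same token every time, while the two sampling variants restrict the draw to a high-probability subset and then sample from it, which is what makes their output vary across runs.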

Locations

arXiv (Cornell University)
Findings of the Association for Computational Linguistics: ACL 2022
Iris (University of Trento)

Similar Works

Title	Year	Authors
Weigh Your Own Words: Improving Hate Speech Counter Narrative Generation via Attention Regularization	2023	Helena Bonaldi, Giuseppe Attanasio, Debora Nozza, Marco Guerini
Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection	2024	Tharindu Kumarage, Amrita Bhattacharjee, Joshua Garland
Generating Counter Narratives against Online Hate Speech: Data and Strategies	2020	Serra Sinem Tekiroğlu, Yi-Ling Chung, Marco Guerini
An Investigation of Large Language Models for Real-World Hate Speech Detection	2023	Keyan Guo, Alexander Hu, Jaden Mu, Ziheng Shi, Ziming Zhao, Nishant Vishwamitra, Hongxin Hu
Decoding Hate: Exploring Language Models' Reactions to Hate Speech	2024	P Piot, Javier Parapar
Detecting Anti-Semitic Hate Speech using Transformer-based Large Language Models	2024	D. Liu, Minghao Wang, Andrew G. Catlin
Probing Critical Learning Dynamics of PLMs for Hate Speech Detection	2024	Sarah Masud, Mohammad Aflah Khan, Vikram Goyal, Md Shad Akhtar, Tanmoy Chakraborty
Generative AI for Hate Speech Detection: Evaluation and Findings	2023	Sagi Pendzel, Tomer Wullach, Amir Adler, Einat Minkov
HateCheckHIn: Evaluating Hindi Hate Speech Detection Models	2022	Mithun Kumar Das, Punyajoy Saha, Binny Mathew, Animesh Mukherjee
Evaluating ChatGPT's Performance for Multilingual and Emoji-based Hate Speech Detection	2023	Mithun Das, Saurabh Pandey, Animesh Mukherjee
Model-Agnostic Meta-Learning for Multilingual Hate Speech Detection	2023	Md. Rabiul Awal, Roy Ka-Wei Lee, Eshaan Tanwar, Tanmay Garg, Tanmoy Chakraborty
Towards Interpretable Hate Speech Detection using Large Language Model-extracted Rationales	2024	Ayushi Nirmal, Amrita Bhattacharjee, Paras Sheth, Huan Liu
Probing LLMs for hate speech detection: strengths and vulnerabilities	2023	Sarthak Roy, Ashish Harshavardhan, Animesh Mukherjee, Punyajoy Saha
GPT-HateCheck: Can LLMs Write Better Functional Tests for Hate Speech Detection?	2024	Yiping Jin, Leo Wanner, Alexander Shvets
Towards Efficient and Explainable Hate Speech Detection via Model Distillation	2024	P Piot, Javier Parapar

Works That Cite This (5)

Title	Year	Authors
Human-Machine Collaboration Approaches to Build a Dialogue Dataset for Hate Speech Countering	2022	Helena Bonaldi, Sara Dellantonio, Serra Sinem Tekiroğlu, Marco Guerini
COBRA Frames: Contextual Reasoning about Effects and Harms of Offensive Statements	2023	Xuhui Zhou, Hao Zhu, Akhila Yerukola, Thomas Davidson, Jena D. Hwang, Swabha Swayamdipta, Maarten Sap
A systematic review of hate speech automatic detection using natural language processing	2023	Md Saroar Jahan, Mourad Oussalah
Response Generation in Longitudinal Dialogues: Which Knowledge Representation Helps?	2023	Seyed Mahed Mousavi, Simone Caldarella, Giuseppe Riccardi

Works Cited by This (31)

Title	Year	Authors
Automated Postediting of Documents	1994	Kevin Knight, Ishwar Chander
Challenges in Data-to-Document Generation	2017	Sam Wiseman, Stuart M. Shieber, Alexander M. Rush
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	2018	Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Analyzing the hate and counter speech accounts on Twitter	2018	Binny Mathew, N. Ravi Kumar, Ravina Ravina, Pawan Goyal, Animesh Mukherjee
The Curious Case of Neural Text Degeneration	2019	Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, Yejin Choi
Hierarchical Neural Story Generation	2018	Angela Fan, Mike Lewis, Yann Dauphin
Deep Reinforcement Learning for Dialogue Generation	2016	Jiwei Li, Will Monroe, Alan Ritter, Dan Jurafsky, Michel Galley, Jianfeng Gao
Thou Shalt Not Hate: Countering Online Hate Speech	2019	Binny Mathew, Punyajoy Saha, Hardik Tharad, Subham Rajgaria, Prajwal Singhania, Suman Kalyan Maity, Pawan Goyal, Animesh Mukherjee
Why We Need New Evaluation Metrics for NLG	2017	Jekaterina Novikova, Ondřej Dušek, Amanda Cercas Curry, Verena Rieser
A Benchmark Dataset for Learning to Intervene in Online Hate Speech	2019	Jing Qian, Anna Bethke, Yinyin Liu, Elizabeth Belding, William Yang Wang