Computer Science Artificial Intelligence

Hate Speech and Cyberbullying Detection

Description

This cluster of papers focuses on the automated detection of hate speech, offensive language, and cyberbullying on social media platforms such as Twitter. It explores various techniques including machine learning, natural language processing, and deep learning to identify and categorize abusive content, with a specific emphasis on mitigating online harassment and promoting online safety.

Keywords

Hate Speech; Detection; Social Media; Cyberbullying; Offensive Language; Machine Learning; Natural Language Processing; Online Harassment; Twitter; Deep Learning

A large set of email messages, the Enron corpus, was made public during the legal investigation concerning the Enron corporation. This dataset, along with a thorough explanation of its origin, is available at http://www-2.cs.cmu.edu/~enron/. This paper provides a brief introduction and analysis of the dataset. The raw Enron corpus contains 619,446 messages belonging to 158 users. We cleaned the corpus before this analysis by removing certain folders from each user, such as “discussion_threads”. These folders were present for most users, and did not appear to be used directly by the users, but rather were computer generated. Many, such as “all_documents”, also contained large numbers of duplicate emails, which were already present in the users’ other folders. Our goal in this paper is to analyze the suitability of this corpus for exploring how to classify messages as organized by a human, so these folders would have likely been misleading.
The use of “Big Data” in policy and decision making is a current topic of debate. The 2013 murder of Drummer Lee Rigby in Woolwich, London, UK led to an extensive public reaction on social media, providing the opportunity to study the spread of online hate speech (cyber hate) on Twitter. Human annotated Twitter data was collected in the immediate aftermath of Rigby's murder to train and test a supervised machine learning text classifier that distinguishes between hateful and/or antagonistic responses with a focus on race, ethnicity, or religion; and more general responses. Classification features were derived from the content of each tweet, including grammatical dependencies between words to recognize “othering” phrases, incitement to respond with antagonistic action, and claims of well‐founded or justified discrimination against social groups. The results of the classifier were optimal using a combination of probabilistic, rule‐based, and spatial‐based classifiers with a voted ensemble meta‐classifier. We demonstrate how the results of the classifier can be robustly utilized in a statistical model used to forecast the likely spread of cyber hate in a sample of Twitter data. The applications to policy and decision making are discussed.
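As a rough illustration of the voted-ensemble idea described above, here is a minimal scikit-learn sketch. The feature pipeline and the three base estimators are placeholder stand-ins, not the probabilistic, rule-based, and spatial components actually used in the study.

```python
# A minimal sketch of a voted ensemble meta-classifier over heterogeneous
# base models. Data, features, and estimators are illustrative assumptions.
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

tweets = ["example hateful tweet", "example neutral tweet"]  # placeholder data
labels = [1, 0]                                              # 1 = cyber hate

ensemble = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    VotingClassifier(
        estimators=[
            ("probabilistic", MultinomialNB()),                    # probabilistic component
            ("linear", LogisticRegression(max_iter=1000)),
            ("rule_like", DecisionTreeClassifier(max_depth=5)),    # rule-like component
        ],
        voting="hard",  # majority vote, as in a voted meta-classifier
    ),
)
ensemble.fit(tweets, labels)
print(ensemble.predict(["another tweet to score"]))
```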
This book is a profound exploration of truth commissions around the world, and the anguish, injustice, and the legacy of hate they are meant to absolve. Hayner examines twenty major truth commissions established around the world, paying special attention to South Africa, El Salvador, Argentina, Chile, and Guatemala.
As thousands of demonstrators took to the streets of Ferguson, Missouri, to protest the fatal police shooting of unarmed African American teenager Michael Brown in the summer of 2014, news and commentary on the shooting, the protests, and the militarized response that followed circulated widely through social media networks. Through a theorization of hashtag usage, we discuss how and why social media platforms have become powerful sites for documenting and challenging episodes of police brutality and the misrepresentation of racialized bodies in mainstream media. We show how engaging in “hashtag activism” can forge a shared political temporality, and, additionally, we examine how social media platforms can provide strategic outlets for contesting and reimagining the materiality of racialized bodies. Our analysis combines approaches from linguistic anthropology and social movements research to investigate the semiotics of digital protest and to interrogate both the possibilities and the pitfalls of engaging in “hashtag ethnography.”
OBJECTIVE. We sought to identify the characteristics of youth who are targets of Internet harassment and characteristics related to reporting distress as a result of the incident. PARTICIPANTS AND METHODS. The Second Youth Internet Safety Survey is a national telephone survey of a random sample of 1500 Internet users between the ages of 10 and 17 years conducted between March and June 2005. Participants had used the Internet at least once a month for the previous 6 months. RESULTS. Nine percent of the youth who used the Internet were targets of online harassment in the previous year. Thirty-two percent of the targets reported chronic harassment (ie, harassment ≥3 times in the previous year). In specific incidents, almost half (45%) knew the harasser in person before the incident. Half of the harassers (50%) were reportedly male, and half (51%) were adolescents. One in 4 targets reported an aggressive offline contact (eg, the harasser telephoned, came to the youth's home, or sent gifts); 2 in 3 disclosed the incident to another person. Among otherwise similar youth, the odds of being a target of Internet harassment were higher for those youth who harassed others online, reported borderline/clinically significant social problems, and were victimized in other contexts. Likewise, using the Internet for instant messaging, blogging, and chat room use each elevated the odds of being a target of Internet harassment versus those who did not engage in these online activities. All other demographic, Internet-use, and psychosocial characteristics were not related to reports of online harassment. Thirty-eight percent of the harassed youth reported distress as a result of the incident. Those who were targeted by adults, asked to send a picture of themselves, received an aggressive offline contact (eg, the harasser telephoned or came to the youth's home), and were preadolescents were each significantly more likely to report distress because of the experience. Conversely, the youth who visited chat rooms were significantly less likely to be distressed by the harassment. CONCLUSIONS. Internet harassment can be a serious event for some youth. Because there has been a significant increase in the prevalence of Internet harassment from 2000 to 2005, adolescent health professionals should continue to be vigilant about such experiences in the lives of young people with whom they interact. Social problems and online aggressive behavior are each associated with elevated odds of being the target of harassment. Thus, prevention efforts may be best aimed at improving the interpersonal skills of young people who choose to communicate with others using these online tools. Adolescent health professionals should be especially aware of events that include aggressive offline contacts by adult harassers or asking the child or adolescent to send a picture of themselves, because each of these scenarios increases the odds of reporting distress by more than threefold. Findings further support the call for the inclusion of Internet-harassment prevention in conventional antibullying programs empowering schools to address Internet bullying situations that occur between students. This will not solve all situations, however.
We also must encourage Internet service providers to partner with consumers to be proactive in serious harassment episodes that violate criminal laws and service-provider codes of conduct.
(1998). Complaints About Transgressions and Misconduct. Research on Language and Social Interaction: Vol. 31, No. 3-4, pp. 295-325.
Within the broader framework of a research programme on the reproduction of racism in discourse and communication, the present article examines the prominent role of the denial of racism, especially among the elites, in much contemporary text and talk about ethnic relations. After a conceptual analysis of denial strategies in interpersonal impression formation on the one hand, and within the social-political context of minority and immigration management on the other, various types of denial are examined in everyday conversations, press reports and parliamentary debates. Among these forms of denial are disclaimers, mitigation, euphemism, excuses, blaming the victim, reversal and other moves of defence, face-keeping and positive self-presentation in negative discourse about minorities, immigrants and (other) anti-racists.
Practical and accessible, E-Moderating is a user's guide to working effectively in the virtual world, covering key areas including: the why, what and how of e-moderating; becoming a good e-moderator; the benefits to learners of e-moderating; and training to become an effective e-moderator. It also features a unique collection of resources for practitioners. Fully updated and expanded, this second edition features new material on the latest research and practice in the field, fresh case studies and practitioner resources, and a brand new chapter on future e-learning scenarios. The book is also accompanied by a website, www.e-moderating.com, which provides supplementary material and links. E-Moderating is an essential purchase for any teacher, instructor, tutor or facilitator working in an electronic environment, and will help to improve your understanding and practice of online teaching and learning.
CS teargas is one of the most used tools for crowd control worldwide. Exposure to CS teargas is known to have consequences for protesters’ health (i.e., eye and skin irritation, respiratory problems), but recent concerns have been raised over its potential gender-specific effects. Indeed, field and clinical observations report cases of menstrual cycle issues among female protesters following high exposure to teargas. The hypothesis of a link between teargas exposure and menstrual cycle issues is plausible from a physiological standpoint, but has not yet been empirically investigated. Using data from a cross-sectional study on Yellow Vests protesters’ health in France, we examined the relationship between exposure to teargas and menstrual cycle issues among female protesters (n = 145). Analyses suggested a positive link between exposure and menstrual cycle perturbations. These results constitute first, preliminary evidence that CS teargas may be linked with menstrual cycle issues among women, and they need corroboration given the importance of this issue. We call for further research on the potential effects of CS teargas on women’s reproductive system.
Extremists, such as hate groups espousing racial supremacy or separation, have established an online presence. A content analysis of 157 extremist web sites selected through purposive sampling was conducted using two raters per site. The sample represented a variety of extremist groups and included both organized groups and sites maintained by apparently unaffiliated individuals. Among the findings were that the majority of sites contained external links to other extremist sites (including international sites), that roughly half the sites included multimedia content, and that half contained racist symbols. A third of the sites disavowed racism or hatred, yet one third contained material from supremacist literature. A small percentage of sites specifically urged violence. These and other findings suggest that the Internet may be an especially powerful tool for extremists as a means of reaching an international audience, recruiting members, linking diverse extremist groups, and allowing maximum image control.
Since the textual contents on online social media are highly unstructured, informal, and often misspelled, existing research on message-level offensive language detection cannot accurately detect offensive content. Meanwhile, user-level offensiveness detection seems a more feasible approach, but it is an under-researched area. To bridge this gap, we propose the Lexical Syntactic Feature (LSF) architecture to detect offensive content and identify potential offensive users in social media. We distinguish the contribution of pejoratives/profanities and obscenities in determining offensive content, and introduce hand-authored syntactic rules for identifying name-calling harassment. In particular, we incorporate a user's writing style, structure and specific cyberbullying content as features to predict the user's potential to send out offensive content. Results from experiments showed that our LSF framework performed significantly better than existing methods in offensive content detection. It achieves precision of 98.24% and recall of 94.34% in sentence offensive detection, as well as precision of 77.9% and recall of 77.8% in user offensive detection. Meanwhile, the processing speed of LSF is approximately 10 ms per sentence, suggesting the potential for effective deployment in social media.
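To make the lexicon-plus-syntactic-rules idea concrete, here is a hedged sketch in the spirit of LSF-style sentence scoring. The word lists, weights, and the name-calling rule below are invented for illustration, not the paper's actual resources.

```python
# A toy offensiveness scorer: pejoratives contribute more than obscenities,
# and a crude rule stands in for syntactic name-calling detection.
import re

PEJORATIVES = {"idiot": 0.8, "loser": 0.6}   # hypothetical weighted lexicon
OBSCENITIES = {"damn": 0.3}                  # weaker contribution, per the paper's intuition

def sentence_offensiveness(sentence: str) -> float:
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(PEJORATIVES.get(t, 0.0) + OBSCENITIES.get(t, 0.0) for t in tokens)
    # Crude stand-in for a syntactic name-calling rule: a second-person
    # pronoun directly preceding a pejorative ("you idiot").
    for prev, cur in zip(tokens, tokens[1:]):
        if prev in {"you", "u"} and cur in PEJORATIVES:
            score += 1.0
    return score

print(sentence_offensiveness("You idiot, that was a damn mess"))  # 2.1
```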
This volume gives a theoretical account of the problem of analyzing and evaluating argumentative discourse. After placing argumentation in a communicative perspective, and then discussing the fallacies that occur when certain rules of communication are violated, the authors offer an alternative to both the linguistically-inspired descriptive and logically-inspired normative approaches to argumentation. The authors characterize argumentation as a complex speech act in a critical discussion aimed at resolving a difference of opinion. The various stages of a critical discussion are outlined, and the communicative and interactional aspects of the speech acts performed in resolving a simple or complex dispute are discussed. After dealing with crucial aspects of analysis and linking the evaluation of argumentative discourse to the analysis, the authors identify the fallacies that can occur at various stages of discussion. Their general aim is to elucidate their own pragma-dialectical perspective on the analysis and evaluation of argumentative discourse, bringing together pragmatic insight concerning speech acts and dialectical insight concerning critical discussion.
The scourge of cyberbullying has assumed alarming proportions with an ever-increasing number of adolescents admitting to having dealt with it either as a victim or as a bystander. Anonymity and the lack of meaningful supervision in the electronic medium are two factors that have exacerbated this social menace. Comments or posts involving sensitive topics that are personal to an individual are more likely to be internalized by a victim, often resulting in tragic outcomes. We decompose the overall detection problem into detection of sensitive topics, lending itself to text classification sub-problems. We experiment with a corpus of 4500 YouTube comments, applying a range of binary and multiclass classifiers. We find that binary classifiers for individual labels outperform multiclass classifiers. Our findings show that the detection of textual cyberbullying can be tackled by building individual topic-sensitive classifiers.
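The finding that per-label binary classifiers beat a single multiclass model can be illustrated with a small scikit-learn sketch; the topic labels and comments below are placeholders.

```python
# One binary classifier per sensitive topic rather than one multiclass model.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = ["comment one", "comment two", "comment three", "comment four"]
topic_labels = {                      # hypothetical per-topic binary labels
    "sexuality":    [1, 0, 0, 1],
    "race":         [0, 1, 0, 0],
    "intelligence": [0, 0, 1, 1],
}

detectors = {}
for topic, y in topic_labels.items():
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    detectors[topic] = clf.fit(comments, y)

new_comment = ["a new comment to check"]
flags = {topic: int(clf.predict(new_comment)[0]) for topic, clf in detectors.items()}
print(flags)  # a comment is flagged if any topic-sensitive detector fires
```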
Hate speech in the form of racism and sexism is commonplace on the internet (Waseem and Hovy, 2016). For this reason, there has been both academic and industry interest in the detection of hate speech. The volume of data to be reviewed for creating datasets encourages the use of crowdsourcing for the annotation efforts. In this paper, we provide an examination of the influence of annotator knowledge of hate speech on classification models by comparing classification results obtained from training on expert and amateur annotations. We provide an evaluation on our own dataset and run our models on the dataset released by Waseem and Hovy (2016). We find that amateur annotators are more likely than expert annotators to label items as hate speech, and that systems trained on expert annotations outperform systems trained on amateur annotations.
A key challenge for automatic hate-speech detection on social media is the separation of hate speech from other instances of offensive language. Lexical detection methods tend to have low precision because they classify all messages containing particular terms as hate speech, and previous work using supervised learning has failed to distinguish between the two categories. We used a crowd-sourced hate speech lexicon to collect tweets containing hate speech keywords, and crowd-sourcing to label a sample of these tweets into three categories: those containing hate speech, those containing only offensive language, and those with neither. We train a multi-class classifier to distinguish between these different categories. Close analysis of the predictions and the errors shows when we can reliably separate hate speech from other offensive language and when this differentiation is more difficult. We find that racist and homophobic tweets are more likely to be classified as hate speech but that sexist tweets are generally classified as offensive. Tweets without explicit hate keywords are also more difficult to classify.
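A minimal sketch of the two-stage setup described above: collect tweets that match a hate-speech lexicon, then train a three-class classifier (hate / offensive / neither). The lexicon, data, and model choice are placeholders, not the authors' resources.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

LEXICON = {"slur1", "slur2"}  # stand-in for the crowd-sourced keyword list

def matches_lexicon(tweet: str) -> bool:
    return any(term in tweet.lower().split() for term in LEXICON)

# Suppose `candidates` were collected with matches_lexicon(), then labelled
# by crowd workers as 0 = hate speech, 1 = offensive only, 2 = neither.
candidates = ["tweet with slur1", "rude but not hateful tweet", "harmless slur2 mention"]
labels = [0, 1, 2]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(candidates, labels)
print(clf.predict(["another tweet containing slur1"]))
```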
This paper presents a survey on hate speech detection. Given the steadily growing body of social media content, the amount of online hate speech is also increasing. Due to the massive scale of the web, methods that automatically detect hate speech are required. Our survey describes key areas that have been explored to automatically recognize these types of utterances using natural language processing. We also discuss limits of those approaches.
The paper introduces a deep learning-based Twitter hate-speech text classification system. The classifier assigns each tweet to one of four predefined categories: racism, sexism, both (racism and sexism), and non-hate-speech. Four Convolutional Neural Network models were trained, respectively, on character 4-grams, word vectors built from semantic information using word2vec, randomly generated word vectors, and word vectors combined with character n-grams. The feature set was down-sized in the networks by max-pooling, and a softmax function was used to classify tweets. Tested by 10-fold cross-validation, the model based on word2vec embeddings performed best, with higher precision than recall, and a 78.3% F-score.
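Here is a minimal PyTorch sketch of the best-performing configuration described above: word embeddings, a 1-D convolution, max-pooling over time, and a softmax over four classes. Dimensions and hyperparameters are illustrative, and the embedding layer is randomly initialised here, whereas the paper initialised it from word2vec.

```python
import torch
import torch.nn as nn

class TweetCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, n_filters=64, n_classes=4):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)   # word2vec init in the paper
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)      # racism / sexism / both / none

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)    # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x))               # (batch, n_filters, seq_len)
        x = x.max(dim=2).values                    # max-pooling over time
        return self.fc(x)                          # logits; softmax applied below

model = TweetCNN()
logits = model(torch.randint(0, 10000, (8, 30)))   # batch of 8 tweets, 30 tokens
probs = torch.softmax(logits, dim=-1)
print(probs.shape)  # torch.Size([8, 4])
```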
In recent years online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifesting itself through offensive, abusive, or hateful language. Past scientific work focused on studying these forms of abusive activity in popular online social networks, such as Facebook and Twitter. Building on such work, we present an eight-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior. We propose an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels. By applying our methodology and performing statistical analysis for label merging or elimination, we identify a reduced but robust set of labels to characterize abuse-related tweets. Finally, we offer a characterization of our annotated dataset of 80 thousand tweets, which we make publicly available for further scientific exploration.
We introduce and illustrate a new approach to measuring and mitigating unintended bias in machine learning models. Our definition of unintended bias is parameterized by a test set and a subset of input features. We illustrate how this can be used to evaluate text classifiers using a synthetic test set and a public corpus of comments annotated for toxicity from Wikipedia Talk pages. We also demonstrate how imbalances in training data can lead to unintended bias in the resulting models, and therefore potentially unfair applications. We use a set of common demographic identity terms as the subset of input features on which we measure bias. This technique permits analysis in the common scenario where demographic information on authors and readers is unavailable, so that bias mitigation must focus on the content of the text itself. The mitigation method we introduce is an unsupervised approach based on balancing the training dataset. We demonstrate that this approach reduces the unintended bias without compromising overall model quality.
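The balancing idea can be sketched as follows: add non-toxic examples that mention identity terms until the identity-bearing subset's toxicity rate matches the corpus overall. The terms, data structures, and stopping rule below are illustrative assumptions, not the paper's exact procedure.

```python
import random

IDENTITY_TERMS = {"gay", "muslim", "black"}  # example demographic identity terms

def contains_identity(text: str) -> bool:
    return any(t in text.lower().split() for t in IDENTITY_TERMS)

def balance(dataset, extra_nontoxic_pool):
    """dataset: list of (text, is_toxic) pairs. Returns a rebalanced copy."""
    overall_toxic = sum(y for _, y in dataset) / len(dataset)
    subset = [(x, y) for x, y in dataset if contains_identity(x)]
    balanced = list(dataset)
    if not subset:
        return balanced
    # Add non-toxic identity-bearing examples while the identity subset
    # is more toxic than the corpus as a whole.
    pool = [ex for ex in extra_nontoxic_pool if contains_identity(ex[0])]
    random.shuffle(pool)
    for ex in pool:
        if sum(y for _, y in subset) / len(subset) <= overall_toxic:
            break
        subset.append(ex)
        balanced.append(ex)
    return balanced

data = [("some toxic text about gay people", 1), ("neutral text", 0)]
pool = [("a kind post by a gay author", 0)]
print(len(balance(data, pool)))  # 3: one non-toxic identity example was added
```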
With the rapid growth of social networks and microblogging websites, communication between people from different cultural and psychological backgrounds has become more direct, resulting in more and more “cyber” conflicts between these people. Consequently, hate speech is used more and more, to the point where it has become a serious problem invading these open spaces. Hate speech refers to the use of aggressive, violent or offensive language, targeting a specific group of people sharing a common property, whether this property is their gender (i.e., sexism), their ethnic group or race (i.e., racism) or their beliefs and religion. While most of the online social networks and microblogging websites forbid the use of hate speech, the size of these networks and websites makes it almost impossible to control all of their content. The need therefore arises to detect such speech automatically and to filter any content that presents hateful language or language inciting hatred. In this paper, we propose an approach to detect hate expressions on Twitter. Our approach is based on unigrams and patterns that are automatically collected from the training set. These patterns and unigrams are later used, among others, as features to train a machine learning algorithm. Our experiments on a test set composed of 2,010 tweets show that our approach reaches an accuracy equal to 87.4% on detecting whether a tweet is offensive or not (binary classification), and an accuracy equal to 78.4% on detecting whether a tweet is hateful, offensive, or clean (ternary classification).
The scientific study of hate speech, from a computer science point of view, is recent. This survey organizes and describes the current state of the field, providing a structured overview of previous approaches, including core algorithms, methods, and main features used. This work also discusses the complexity of the concept of hate speech, defined in many platforms and contexts, and provides a unifying definition. This area has an unquestionable potential for societal impact, particularly in online communities and digital media platforms. The development and systematization of shared resources, such as guidelines, annotated datasets in multiple languages, and algorithms, is a crucial step in advancing the automatic detection of hate speech.
We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval). The task was based on a new dataset, the Offensive Language Identification Dataset (OLID), which contains over 14,000 English tweets, and it featured three sub-tasks. In sub-task A, systems were asked to discriminate between offensive and non-offensive posts. In sub-task B, systems had to identify the type of offensive content in the post. Finally, in sub-task C, systems had to detect the target of the offensive posts. OffensEval attracted a large number of participants and it was one of the most popular tasks in SemEval-2019. In total, nearly 800 teams signed up to participate in the task and 115 of them submitted results, which are presented and analyzed in this report.
We investigate how annotators’ insensitivity to differences in dialect can lead to racial bias in automatic hate speech detection models, potentially amplifying harm against minority populations. We first uncover unexpected correlations between surface markers of African American English (AAE) and ratings of toxicity in several widely-used hate speech datasets. Then, we show that models trained on these corpora acquire and propagate these biases, such that AAE tweets and tweets by self-identified African Americans are up to two times more likely to be labelled as offensive compared to others. Finally, we propose dialect and race priming as ways to reduce the racial bias in annotation, showing that when annotators are made explicitly aware of an AAE tweet’s dialect they are significantly less likely to label the tweet as offensive.
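A minimal sketch of the kind of audit behind this finding: compare a classifier's false-positive rates across dialect groups. The parallel lists below are placeholder data.

```python
# Per-group false-positive rate: how often innocuous text is flagged offensive.
from collections import defaultdict

def false_positive_rate_by_group(predictions, gold, dialect):
    fp = defaultdict(int)   # predicted offensive, actually innocuous
    neg = defaultdict(int)  # all innocuous examples per group
    for pred, y, group in zip(predictions, gold, dialect):
        if y == 0:
            neg[group] += 1
            if pred == 1:
                fp[group] += 1
    return {g: fp[g] / neg[g] for g in neg if neg[g] > 0}

rates = false_positive_rate_by_group(
    predictions=[1, 0, 1, 0], gold=[0, 0, 0, 0], dialect=["AAE", "AAE", "SAE", "SAE"]
)
print(rates)  # {'AAE': 0.5, 'SAE': 0.5}; a large gap would indicate dialect bias
```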
Valerio Basile, Cristina Bosco, Elisabetta Fersini, Debora Nozza, Viviana Patti, Francisco Manuel Rangel Pardo, Paolo Rosso, Manuela Sanguinetti. Proceedings of the 13th International Workshop on Semantic Evaluation. 2019.
Marcos Zampieri, Shervin Malmasi, Preslav Nakov, Sara Rosenthal, Noura Farra, Ritesh Kumar. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Chandler May, Alex Wang, Shikha Bordia, Samuel R. Bowman, Rachel Rudinger. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
As the body of research on abusive language detection and analysis grows, there is a need for critical consideration of the relationships between different subtasks that have been grouped under this label. Based on work on hate speech, cyberbullying, and online abuse we propose a typology that captures central similarities and differences between subtasks and discuss the implications of this for data annotation and feature construction. We emphasize the practical actions that can be taken by researchers to best approach their abusive language detection subtask of interest.
As online content continues to grow, so does the spread of hate speech. We identify and examine challenges faced by online automatic approaches for hate speech detection in text. Among these difficulties are subtleties in language, differing definitions of what constitutes hate speech, and limitations of data availability for training and testing of these systems. Furthermore, many recent approaches suffer from an interpretability problem: it can be difficult to understand why the systems make the decisions that they do. We propose a multi-view SVM approach that achieves near state-of-the-art performance, while being simpler and producing more easily interpretable decisions than neural methods. We also discuss both technical and practical challenges that remain for this task.
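A hedged sketch of a multi-view SVM in the spirit described above: one linear SVM per feature "view" (word and character n-grams here), with the views combined by summing decision scores. The views and the combination rule are assumptions for illustration, not the authors' exact design.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

docs = ["hateful example text", "benign example text"]  # placeholder data
labels = np.array([1, 0])

views = {
    "word": TfidfVectorizer(analyzer="word"),
    "char": TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
}
models = {}
for name, vec in views.items():
    models[name] = SVC(kernel="linear").fit(vec.fit_transform(docs), labels)

def predict(texts):
    # Sum each view's SVM decision score; positive total means "hate speech".
    score = np.zeros(len(texts))
    for name, vec in views.items():
        score += models[name].decision_function(vec.transform(texts))
    return (score > 0).astype(int)

print(predict(["some new text"]))
```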
As government pressure on major technology companies builds, both firms and legislators are searching for technical solutions to difficult platform governance puzzles such as hate speech and misinformation. Automated hash-matching and predictive machine learning tools – what we define here as algorithmic moderation systems – are increasingly being deployed to conduct content moderation at scale by major platforms for user-generated content such as Facebook, YouTube and Twitter. This article provides an accessible technical primer on how algorithmic moderation works; examines some of the existing automated tools used by major platforms to handle copyright infringement, terrorism and toxic speech; and identifies key political and ethical issues for these systems as the reliance on them grows. Recent events suggest that algorithmic moderation has become necessary to manage growing public expectations for increased platform responsibility, safety and security on the global stage; however, as we demonstrate, these systems remain opaque, unaccountable and poorly understood. Despite the potential promise of algorithms or 'AI', we show that even 'well optimized' moderation systems could exacerbate, rather than relieve, many existing problems with content policy as enacted by platforms for three main reasons: automated moderation threatens to (a) further increase opacity, making a famously non-transparent set of practices even more difficult to understand or audit, (b) further complicate outstanding issues of fairness and justice in large-scale sociotechnical systems and (c) re-obscure the fundamentally political nature of speech decisions being executed at scale.
We survey 146 papers analyzing “bias” in NLP systems, finding that their motivations are often vague, inconsistent, and lacking in normative reasoning, despite the fact that analyzing “bias” is an inherently normative process. We further find that these papers’ proposed quantitative techniques for measuring or mitigating “bias” are poorly matched to their motivations and do not engage with the relevant literature outside of NLP. Based on these findings, we describe the beginnings of a path forward by proposing three recommendations that should guide work analyzing “bias” in NLP systems. These recommendations rest on a greater recognition of the relationships between language and social hierarchies, encouraging researchers and practitioners to articulate their conceptualizations of “bias” (i.e., what kinds of system behaviors are harmful, in what ways, to whom, and why, as well as the normative reasoning underlying these statements) and to center work around the lived experiences of members of communities affected by NLP systems, while interrogating and reimagining the power relations between technologists and such communities.
The experimental landscape in natural language processing for social media is too fragmented. Each year, new shared tasks and datasets are proposed, ranging from classics like sentiment analysis to irony detection or emoji prediction. Therefore, it is unclear what the current state of the art is, as there is no standardized evaluation protocol, nor a strong set of baselines trained on such domain-specific data. In this paper, we propose a new evaluation framework (TweetEval) consisting of seven heterogeneous Twitter-specific classification tasks. We also provide a strong set of baselines as a starting point, and compare different language modeling pre-training strategies. Our initial experiments show the effectiveness of starting off with existing pre-trained generic language models and continuing to train them on Twitter corpora.
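As a hedged illustration of running a Twitter-pretrained baseline of the kind the paper advocates, the sketch below assumes the TweetEval authors' checkpoint "cardiffnlp/twitter-roberta-base-offensive" is available on the Hugging Face Hub and that the `transformers` library is installed; the first call downloads the model.

```python
from transformers import pipeline

clf = pipeline("text-classification",
               model="cardiffnlp/twitter-roberta-base-offensive")

tweets = ["@user you are a disgrace", "lovely weather in Cardiff today"]
for tweet, result in zip(tweets, clf(tweets)):
    print(tweet, "->", result["label"], round(result["score"], 3))
```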
Hate speech detection on Twitter is critical for applications like controversial event extraction, building AI chatterbots, content recommendation, and sentiment analysis. We define this task as being able to classify a tweet as racist, sexist or neither. The complexity of the natural language constructs makes this task very challenging. We perform extensive experiments with multiple deep learning architectures to learn semantic word embeddings to handle this complexity. Our experiments on a benchmark dataset of 16K annotated tweets show that such deep learning methods outperform state-of-the-art char/word n-gram methods by ~18 F1 points.
Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çağrı Çöltekin. Proceedings of the Fourteenth Workshop on Semantic Evaluation. 2020.
Every liberal democracy has laws or codes against hate speech, except the United States. For constitutionalists, regulation of hate speech violates the First Amendment and damages a free society. Against this absolutist view, Jeremy Waldron argues powerfully that hate speech should be regulated as part of our commitment to human dignity and to inclusion and respect for members of vulnerable minorities. Causing offense (by depicting a religious leader as a terrorist in a newspaper cartoon, for example) is not the same as launching a libelous attack on a group’s dignity, according to Waldron, and it lies outside the reach of law. But defamation of a minority group, through hate speech, undermines a public good that can and should be protected: the basic assurance of inclusion in society for all members. A social environment polluted by anti-gay leaflets, Nazi banners, and burning crosses sends an implicit message to the targets of such hatred: your security is uncertain and you can expect to face humiliation and discrimination when you leave your home. Free-speech advocates boast of despising what racists say but defending to the death their right to say it. Waldron finds this emphasis on intellectual resilience misguided and points instead to the threat hate speech poses to the lives, dignity, and reputations of minority members. Finding support for his view among philosophers of the Enlightenment, Waldron asks us to move beyond knee-jerk American exceptionalism in our debates over the serious consequences of hateful speech.
Some see the Internet as a Wild West where those who venture online must be thick-skinned enough to endure verbal attacks in the name of free speech protection. Danielle Keats Citron rejects this view. Cyber-harassment is a matter of civil rights law, and legal precedents as well as social norms of decency and civility must be leveraged to stop it.
A black family enters a coffee shop in a small Texas town. A white man places a card on their table. The card reads, “You have just been paid a visit by the Ku Klux Klan.” The family stands and leaves. A law student goes to her dorm and finds an anonymous message posted on the door, a caricature image of her race, with a red line slashed through it.
Although the social medium Twitter grants users freedom of speech, its instantaneous nature and retweeting features also amplify hate speech. Because Twitter has a sizeable black constituency, racist tweets against blacks are especially detrimental in the Twitter community, though this effect may not be obvious against a backdrop of half a billion tweets a day. We apply a supervised machine learning approach, employing inexpensively acquired labeled data from diverse Twitter accounts to learn a binary classifier for the labels “racist” and “nonracist.” The classifier has a 76% average accuracy on individual tweets, suggesting that with further improvements, our work can contribute data on the sources of anti-black hate speech.
Content moderation plays a pivotal role in structuring online speech, but the human labour and the everyday decision-making process in content moderation remain underexamined. Informed by in-depth interviews with 16 content moderators in India, in this research, we analyse the decision-making process of commercial content moderators through the concept of sensemaking. We argue that moderation decisions are made in the context of the industry’s plural policies and efficiency requirements. An interplay of four cognitive processes of pattern identification, subjective perceptions, shared knowledge, and process optimization influences the final judgement. Once sense is enacted in the decision-making process, the sensibilities are retained by the adept moderator for future moderation decisions. Visibilizing the labour process behind commercial content moderation, we argue that everyday moderation decisions unfold in a socio-technical and economic assemblage wherein decisions are decontextualised and plausibility driven rather than consistency driven.
Detecting and mitigating sexist language has become a critical issue in digital communication. While human experts can identify nuanced forms of sexism, the growing volume of online content makes manual detection impractical. This study compares four machine learning approaches for automated sexism detection: trigram frequency models, text vectorization techniques, convolutional neural networks (CNN), and RoBERTa, a transformer-based model. Traditional methods like trigram analysis and text vectorization are useful for identifying basic patterns but struggle to capture the contextual and semantic nuances inherent in sexist language. In contrast, more advanced models, such as CNNs and RoBERTa, leverage deeper understanding of language structure and context. Using a publicly available dataset, we evaluate the performance of these models based on accuracy, precision, recall, and F1-score. Our findings reveal that while trigram analysis and text vectorization provide some insights, RoBERTa consistently outperforms the other models by capturing the subtleties of sexist language and providing more accurate and reliable results. This research not only improves the technical methodologies for sexism detection but also contributes to the development of scalable, automated moderation tools that can address harmful linguistic patterns in real-time, promoting safer and more inclusive online environments.
The proliferation of harmful and toxic comments on social media platforms necessitates the development of robust methods for automatically detecting and classifying such content. This paper investigates the application of natural language processing (NLP) and ML techniques for toxic comment classification using the Jigsaw Toxic Comment Dataset. Several deep learning models, including recurrent neural networks (RNN, LSTM, and GRU), are evaluated in combination with feature extraction methods such as TF-IDF, Word2Vec, and BERT embeddings. The text data is pre-processed using both Word2Vec and TF-IDF techniques for feature extraction. Rather than implementing a combined ensemble output, the study conducts a comparative evaluation of model-embedding combinations to determine the most effective pairings. Results indicate that integrating BERT with traditional models (RNN+BERT, LSTM+BERT, GRU+BERT) leads to significant improvements in classification accuracy, precision, recall, and F1-score, demonstrating the effectiveness of BERT embeddings in capturing nuanced text features. Among all configurations, LSTM combined with Word2Vec and LSTM with BERT yielded the highest performance. This comparative approach highlights the potential of combining classical recurrent models with transformer-based embeddings as a promising direction for detecting toxic comments. The findings of this work provide valuable insights into leveraging deep learning techniques for toxic comment detection, suggesting future directions for refining such models in real-world applications.
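A minimal PyTorch sketch of one of the model-embedding pairings evaluated above: contextual BERT embeddings feeding an LSTM classifier. Hyperparameters are illustrative; the six output labels reflect the Jigsaw dataset's toxicity categories, and the paper's actual training setup is not reproduced here.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class BertLSTMClassifier(nn.Module):
    def __init__(self, n_classes=6):  # Jigsaw has six toxicity labels
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.bert.requires_grad_(False)           # use BERT as a frozen embedder
        self.lstm = nn.LSTM(self.bert.config.hidden_size, 128, batch_first=True)
        self.head = nn.Linear(128, n_classes)

    def forward(self, input_ids, attention_mask):
        with torch.no_grad():
            emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        _, (h, _) = self.lstm(emb)                # final LSTM hidden state
        return self.head(h[-1])                   # one logit per toxicity label

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["you are awful", "have a nice day"], padding=True, return_tensors="pt")
model = BertLSTMClassifier()
print(model(batch["input_ids"], batch["attention_mask"]).shape)  # torch.Size([2, 6])
```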
To address the limitations of self-regulation and the need to combat online misinformation, domestic policies are increasingly imposing platform liability for content moderation. This study employs qualitative comparative analysis to examine five key national legislations: Germany's NetzDG, France's Law No. 2018-1202, Brazil's Resolution No. 23.732/2024, Singapore's POFMA, and Turkey's Law No. 2022-7418. Guided by the UN's and UNESCO's human rights-based recommendations for platform governance, the analysis focuses on five dimensions: definitions of content and misinformation, moderation practices, transparency requirements, penalties, and independent oversight. Our findings reveal variations on how misinformation is defined, with most jurisdictions adopting vague formulations. Only Brazil's resolution explicitly addresses AI-generated content. NetzDG emphasises platform-led enforcement; French and Brazilian jurisdictions rely more on judicial orders; POFMA and Turkey's law grant discretionary powers to state authorities. Independent oversight – a key safeguard for human rights – is formalised only in France (Arcom) and Germany (regulated self-regulation). Although Turkey designates the BTK as an oversight body, its independence is widely contested. Without an independent regulator, Brazil's resolution allows judicial assessments to draw on verifications conducted by accredited fact-checkers.
Collective identity is not a static construct; it grows and changes as the world changes around it. This article approaches identity as a discursive construct that takes shape through words and concepts with shared meanings among a community of knowers. Specifically, the analysis focuses on white nationalist discourse on Twitter to expose the emergence of a far-right identity politics, which appropriates the rhetoric of left politics, which traditionally advocates for marginalized groups. By constructing narratives around ethnocrisis to reflect concurrent efforts by liberals to de-center whiteness in politics and culture more broadly, white nationalists have leveraged digital publics to position white people, particularly white men, as an oppressed class, dispossessed of social and cultural agency. To examine and illustrate its conceptual and discursive characteristics, a digital ethnography of white nationalist discourse is conducted by combining computational and qualitative methods on a dataset comprising 146,210 Twitter users and 211 million utterances from 2014 to 2017. Using a frequency-based method of lexicon extraction, a large and comprehensive set of terms associated with white nationalism is generated and manually classified into thematic categories representing the conceptual space of white nationalist identity. This “imaginative geography” highlights new developments in the political movement of white supremacy as well as enduring themes from its earlier manifestations. By integrating quantitative and qualitative analytical paradigms, this article underscores the utility of social media as a particularly accessible site for discourse analysis and identity construction.
The intersection of multiple social inequalities and narratives of discrimination has become a popular topic in various academic fields, including racism analysis, feminist analysis, migration studies, and postcolonial studies. Relying on this theoretical framework, this article aims to show the narratives that characterize the online anti-Muslim discourses in Italy. By analyzing 31 interviews conducted with the volunteers of Amnesty International’s Hate Speech Task Force, the research highlights how Muslim religious identity intersects with other categories that have been historically marginalized, particularly in relation to gender and religion or legal and economic status. Basically, being Muslim seems to be an aggravating circumstance in hatred communication. This specific case study demonstrates how hate speech online against Islam in Italy is situated at the intersection of preexisting conditions of disadvantage and sedimented stereotypes that create a narrative of Islam perceived as an external threat that affects different levels of reality and activates a mechanism of preservation by the “threatened group” to create its common identity in opposition to the “otherness” of the Muslim culture.
To sow fear, anger, and violence, online hate propaganda often targets explicitly minority and marginalized communities, including Sikhs and Muslims in India. Social media has made it easier than ever to spread hateful messages, which may have devastating effects on those who are the targets of such rhetoric. This research assesses online hate speech directed at Sikhs and Muslims in India and its effects on the individual’s identity, security, and sociability. It analyzes hate propaganda content on Facebook, Twitter, YouTube, and WhatsApp to highlight the overarching motifs and narratives that render these people as dangerous. Also, personally conducted interviews gauge the participation of the minority groups in creating and disseminating such material during turbulent times. This research aims to contribute toward understanding the phenomenon of digital hate speech and its repercussions toward marginalized communities. The study also investigates how Sikhs and Muslims in India cope with the difficulties and threats posed by online hate messaging. The study specifically highlights the use of derogatory terminology like “Khalistani Aatankvadi” and “Dahshatgard” against Sikhs and Muslims, respectively, in India in the context of online hate speech. By studying the usage patterns within the Indian context, this research fills a fundamental gap in our understanding of the relationship between social media, hate speech, and marginalized communities in India. In addition, it provides information and advice to policymakers, civil society actors, and media professionals in India to help them combat online hate propaganda and foster interfaith understanding within the Indian context.
Social media platforms have revolutionized communication, offering users a vast variety of opportunities to connect and share ideas. However, this freedom has also led to a rise in cyberbullying, which significantly impacts mental health and well-being. Cyberbullying often involves complex language, sarcasm, slang and subtle threats, making it difficult for automated systems to identify accurately. This research presents a supervised predictive analytic method for detecting cyberbullying on social media using Logistic Regression. The primary objective is to design an efficient system that can identify and classify cyberbullying incidents early, helping to prevent their escalation. Logistic Regression was employed as the core algorithm to predict the presence of cyberbullying, and the model performed well, demonstrating the reliability of Logistic Regression in text classification tasks. Furthermore, additional analysis was performed to assess how various feature engineering techniques influence model performance. The research emphasizes the significance of incorporating diverse linguistic and contextual cues to enhance the accuracy of cyberbullying detection. In conclusion, this project contributes to the proactive identification of cyberbullying by offering a scalable and effective solution using Logistic Regression, thus supporting safer online interactions on social media platforms.
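A minimal sketch of the approach described above: TF-IDF features with a Logistic Regression classifier for binary cyberbullying detection. The posts, labels, and feature-engineering choices here are placeholders, not the study's data or exact configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

posts = ["nobody likes you, just leave", "great game last night!",
         "you should be ashamed of yourself", "congrats on the new job"]
labels = [1, 0, 1, 0]  # 1 = cyberbullying

X_train, X_test, y_train, y_test = train_test_split(posts, labels, test_size=0.5,
                                                    stratify=labels, random_state=0)
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),  # one feature-engineering choice
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```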
Studies of high-risk digital media use often treat youth in the United States as a monolith. Here, we present results from an online survey study of 489 U.S. youth (aged 13-23) assessing relationships between drug-related online content exposure and drug use based on racial/ethnic identity. Regression models identified racial/ethnic differences in drug-related content exposure and interaction terms examined whether relationships of interest changed based on racial/ethnic identity. Racial/ethnic (RE)-minority youth had significantly higher odds of frequent exposure to drug-related content online and significant correlations between drug use and content exposure were seen among select populations of RE-minority youth. Assessments of drug-related digital media habits should be considered instrumental to understanding rising rates of drug use within U.S. RE-minority populations.
Objectives: This study examines the stance of both the Holy Qur’an and legal legislation on hate speech. It explores Qur’anic verses that reject hate speech and violence while promoting tolerance and peaceful coexistence. The study also analyzes legal provisions addressing hate speech. Methods: The research employs an inductive approach to identify relevant Qur’anic verses and legal provisions, followed by an analytical approach that examines interpretations of these verses and evaluates legal frameworks combating hate speech. Results: The study finds that hate speech is an unethical practice condemned by the Holy Qur’an, which establishes principles for peaceful human interaction regardless of differences in race, language, or appearance. While legal systems address hate speech, they still require stricter regulations to criminalize such rhetoric effectively. Conclusions: Societies must actively combat hate speech through all available means. Strengthening legal measures and promoting the ethical guidelines set by the Qur’an can contribute to reducing hate speech and fostering social harmony.
The recent increase in extremist material on social media platforms complicates countermeasures in international cybersecurity and national security efforts. RADAR#, a deep ensemble approach for detecting radicalization in Arabic tweets, is introduced in this paper. Our model combines a hybrid CNN-Bi-LSTM framework with a leading Arabic transformer model (AraBERT) through a weighted ensemble strategy. We employ domain-specific Arabic tweet pre-processing techniques and a custom attention layer to better focus on radicalization indicators. Experiments on a dataset of 89,816 Arabic tweets indicate that RADAR# reaches 98% accuracy and a 97% F1-score, surpassing advanced approaches. The ensemble strategy is particularly beneficial in handling the dialectal variations and context-sensitive words common in Arabic social media posts. We provide a full performance analysis of the model, including ablation studies and attention visualization for better interpretability. Our contribution is useful to the cybersecurity community as an effective early-detection mechanism for online radicalization in Arabic-language content, with potential applications in counter-terrorism and online content moderation.
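The abstract describes a weighted ensemble of a CNN-Bi-LSTM model and AraBERT but does not give the weighting scheme. A minimal sketch of one plausible weighted probability blend (the weight value and toy probabilities are assumptions, not the authors' configuration):

```python
import numpy as np

def weighted_ensemble(p_cnn_bilstm: np.ndarray, p_arabert: np.ndarray,
                      w: float = 0.4) -> np.ndarray:
    """Blend class-probability matrices from two already-trained models.

    w is the (assumed) weight given to the CNN-Bi-LSTM branch; the remainder
    goes to the AraBERT branch. Shapes: (n_samples, n_classes).
    """
    return w * p_cnn_bilstm + (1.0 - w) * p_arabert

# Toy probability outputs standing in for the two trained models.
p1 = np.array([[0.7, 0.3], [0.2, 0.8]])  # CNN-Bi-LSTM
p2 = np.array([[0.6, 0.4], [0.1, 0.9]])  # AraBERT
preds = weighted_ensemble(p1, p2).argmax(axis=1)
print(preds)  # blended class predictions, e.g. [0 1]
```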
This chapter explores the evolving role of media law pedagogy in equipping future media professionals and citizens with the necessary tools to navigate this complex information ecosystem. It argues that traditional approaches to teaching media law, often focused on established legal frameworks, must be adapted to address the novel legal and ethical dilemmas presented by disinformation and hybrid tactics, through application of the fundamental principles of freedom of expression in different contexts and understanding of the strategic environment. By adopting a comprehensive approach, media law education can empower future professionals and citizens to become informed, responsible, and resilient actors in the fight against disinformation and the protection of democratic values. The chapter concludes by suggesting concrete strategies for curriculum development, pedagogical innovation, and interdisciplinary collaboration to effectively address these pressing challenges.
The aim of this work is to analyze the syntactic structure and use of a particular type of exclamative nominal utterance (EN) in the context of hate speech. To do so, we adopt the theoretical framework of non-sententialist approaches and, in particular, the assumption of the Xmax Generalization, adapted to the Minimalist Program, according to which the initial node of a linguistic production is not the clause (IP) by default, but rather the maximal projection that can actually be inferred from the empirical data and implemented through a bottom-up approach. Thanks to the flexibility of the Xmax Generalization, which makes it possible to posit a class of ENs with a Focus phrase (FocP) as the initial node, we analyze exclamative ENs with the predicate in preverbal position (interessante, questo libro! 'interesting, this book!') in the context of hate speech, which has been scarcely investigated from a syntactic perspective. Drawing on empirical data from the HaSpEN corpus, we present the two main types of exclamative ENs, characterized by the element in preverbal position: (a) those with a focalized PP (fuori i migranti dall'Italia 'out with the migrants from Italy') or AdvP (via gli stranieri dall'Italia 'away with the foreigners from Italy'); (b) those with a focalized NP, DP, or AP (tutti uguali i musulmani 'all the same, the Muslims').
Umida QODIROVA | Markaziy osiyoda media va kommunikatsiyalar xalqaro ilmiy jurnali (International Scientific Journal of Media and Communications in Central Asia).
In today's digital era, the growing number of fake and phishing websites on the internet is becoming a real threat to users. Tools such as WHOIS, IPinfo, and VirusTotal are of great value for checking information about a site's IP address and domain and for assessing its trustworthiness. Likewise, AI-based systems can analyze large volumes of data in a short time and identify security threats. According to statistics, the number of phishing victims has been declining over the years, which points to the widespread adoption of protective tools. Nevertheless, observing media hygiene and relying on trusted tools remain as relevant as ever.
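The abstract mentions checking a site's IP address and domain information as a first trust signal. A minimal sketch of such a check using only the Python standard library (the domain below is a placeholder; real services such as WHOIS, IPinfo, or VirusTotal provide far richer reputation data):

```python
import socket
import ssl

def basic_site_check(domain: str) -> dict:
    """Resolve a domain's IP and verify that its TLS certificate validates
    against the system trust store. A toy illustration only."""
    info = {"domain": domain}
    info["ip"] = socket.gethostbyname(domain)  # DNS resolution
    ctx = ssl.create_default_context()         # system CA trust store
    with socket.create_connection((domain, 443), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=domain) as tls:
            cert = tls.getpeercert()
            # issuer is a tuple of single-pair tuples; flatten it to a dict
            info["cert_issuer"] = dict(x[0] for x in cert["issuer"])
    return info

print(basic_site_check("example.com"))  # placeholder domain
```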
Transphobic hate speech remains underexamined as an example of privacy abuse, namely, as a form of doxxing. Doxxing, the non-consensual disclosure of personal, identifying, and sensitive information, converges with transphobic hate speech to leverage sensitive information about gender identity into harm in the workplaces, social lives, and online security of trans people. The capacity of doxxing to perpetrate online hate speech is a pivotal concern amidst a climate of relaxed censorship and platform governance on social media sites like X. Current scholarship into online hate speech has rightfully acknowledged the rampant digital forms of misogyny endured by women and girls, but there is further scope to consider the specific experiences of trans women and gender-diverse people, as well as trans men. Drawing on a dataset of 274 tweets scraped from X, I analyse discourses about transphobic hate speech and doxxing, revealing how misgendering, intimidation, and outing harm trans people online. These findings have implications for understanding not only how transphobic hate speech is performed and conveyed on platforms like X, but how personal, identifying, and sensitive information is mobilised to destroy the wellbeing and security of trans women and gender-diverse people through hateful speech acts. I sketch out the affirmative potential of informational autonomy, a resistive and relational framework of ethics which centres on the expansive possibilities of 'hammering back' against platforms that permit hate speech and data intrusions.
Subodh Sawale, Ashwini Garkhedkar | International Journal For Multidisciplinary Research
With the widespread adoption of social media platforms such as Twitter, the dissemination of hateful content targeting individuals or groups based on race, gender, religion, or ethnicity has become increasingly commonplace. Manual moderation techniques are not scalable given the vast and rapidly growing volume of user-generated content. This study proposes a machine learning-based framework to automatically detect and classify hate speech on Twitter. The pipeline involves comprehensive text preprocessing (normalization, tokenization, stopword removal, and lemmatization) followed by TF-IDF-based feature extraction. Four classification models (Logistic Regression, Support Vector Machine (SVM), Naive Bayes, and Random Forest) are evaluated on a publicly available labelled Twitter dataset. Results indicate that SVM and Random Forest provide superior performance in terms of precision, recall, and overall accuracy. This work highlights the effectiveness of automated methods in moderating harmful online content and lays the foundation for future improvements such as multilingual support and real-time detection.
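The four-classifier comparison described above maps naturally onto scikit-learn. A minimal sketch under the assumption of a small labelled dataset (the toy tweets are illustrative, not the paper's corpus):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical labelled tweets; 1 = hateful, 0 = not hateful.
texts = [
    "I love this community", "those people are vermin",
    "have a great day everyone", "go back to where you came from",
    "what a beautiful photo", "they don't deserve to live here",
    "thanks for the support", "that group ruins everything",
]
labels = [0, 1, 0, 1, 0, 1, 0, 1]

X = TfidfVectorizer().fit_transform(texts)  # shared TF-IDF features

# The four model families compared in the study.
models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVM": LinearSVC(),
    "NaiveBayes": MultinomialNB(),
    "RandomForest": RandomForestClassifier(n_estimators=100),
}
for name, clf in models.items():
    scores = cross_val_score(clf, X, labels, cv=2, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.2f}")
```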
With the proliferation of social media, cyberbullying has emerged as a pervasive threat, causing significant psychological harm to individuals and undermining social cohesion. Its linguistic expressions vary widely across topics, complicating automatic detection efforts. Most existing methods struggle to generalize across diverse online contexts due to their reliance on topic-specific features. To address this issue, we propose the Topic Adversarial Neural Network (TANN), a novel end-to-end framework for topic-invariant cyberbullying detection. TANN integrates a multi-level feature extractor with a topic discriminator and a cyberbullying detector. It leverages adversarial training to disentangle topic-related information while retaining universal linguistic cues relevant to harmful content. We construct a multi-topic dataset from major Chinese social media platforms, such as Weibo and Tieba, to evaluate the generalization performance of TANN in real-world scenarios. Experimental results demonstrate that TANN outperforms existing methods in cross-topic detection tasks, significantly improving robustness and accuracy. This work advances cross-topic cyberbullying detection by introducing a scalable solution that mitigates topic interference and enables reliable performance across dynamic online environments.
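The abstract does not detail how the adversarial disentanglement is implemented. A common mechanism for this kind of topic-adversarial training is a gradient reversal layer between the feature extractor and the topic discriminator; the PyTorch sketch below is one plausible design under that assumption, not the authors' code:

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies gradients by -lambda on the
    backward pass, so the extractor learns features that confuse the topic
    discriminator while still serving the cyberbullying detector."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class TopicAdversarialModel(nn.Module):
    def __init__(self, in_dim=128, n_topics=5, lamb=1.0):
        super().__init__()
        self.extractor = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.bully_head = nn.Linear(64, 2)          # cyberbullying detector
        self.topic_head = nn.Linear(64, n_topics)   # topic discriminator
        self.lamb = lamb

    def forward(self, x):
        h = self.extractor(x)
        bully_logits = self.bully_head(h)
        topic_logits = self.topic_head(GradReverse.apply(h, self.lamb))
        return bully_logits, topic_logits

model = TopicAdversarialModel()
x = torch.randn(4, 128)  # toy feature batch
bully_logits, topic_logits = model(x)
print(bully_logits.shape, topic_logits.shape)  # [4, 2] and [4, 5]
```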
This paper critically examines the risks to democratic institutions and practices posed by disinformation, echo chambers, and filter bubbles within contemporary social media environments. Adopting a modern republican approach and its conception of liberty as nondomination, this paper analyzes the role of algorithms, which curate and shape user experiences, in facilitating these challenges. My argument is that the proliferation of disinformation, echo chambers, and filter bubbles constitutes forms of domination that manipulate vulnerable social media users and imperil democratic ideals and institutions. To counter these risks, I argue for a three-pronged response that cultivates robust institutional and individual forms of antipower by regulating platforms to help protect users from arbitrary interference and empower them to fight back against domination.
Where does a joke or incivility end and a crime begin? The advent of digitalization and globalization has given power and spaces of expression to every individual and group voice. However, mass communication shows cracks and misalignments in the use of that freedom, which is a right: words are often used without awareness of their legal implications. While it is progress that we can all express ourselves freely in any forum or network, the effects of an inappropriate intervention are not sufficiently understood, especially among younger generations. For example, in the face of threats, hate crimes (hate speech), false accusations, insults, malicious gossip, slander, falsehood, offence, injury, or defamation, do we know their social reach (Fairclough, 1995) and the corresponding penalties? The aim of this paper is to review the current relationship between the law and the usages proliferating in digital media. After gathering statistical data and studying cases, it is concluded that it is urgent to inform people of the risks and of which offences are codified, to teach and promote good practices and ethical alternatives, and to foster channels for preserving free expression exercised as a conscious right and free of criminal risk. In this regard, studies of language awareness (Van Lier, 1996) and linguistic sensitivity (Silvestre and Pardo, 2024) fill this niche.
This study investigates the dynamics of hate speech, looking at feedback on comments and subsequent commenting. We examine the relationship between feedback on comments, hate speech presence, and commenter types, with analysis of news comments during the 2022 South Korean presidential election campaigns. The data include 25 million comments, analyzed with a deep learning hate speech detection model. It was found that positive feedback encourages more commenting for non-hateful content, and negative feedback reduces subsequent non-hateful comments. However, surprisingly, negative feedback was found to increase the frequency of hateful comments, particularly among light commenters. Implications of the findings are discussed.
In online spaces, children are vulnerable to exploitation and sexual predators. Groomers contact minors in online chat rooms with the intent of sexual abuse. This study investigates how new deep learning models compare to traditional machine learning models in detecting grooming conversations and predatory authors. Furthermore, we detect the underlying tones used by predators and explore how these affect detection capabilities. Our goal is to better understand predator tactics and to advance automatic grooming detection in order to protect children in online spaces. The PAN12 chat logs, which contain grooming chat conversations, were used as the dataset for the research. These chat conversations were sorted into sentiments through the DistilBERT classifier based on the predator tone. SVMs and the LLaMA 3.2 1B large language model by Meta were then trained and fine-tuned on the different sentiments. The results, measured through precision, recall, and F1 score, show that the large language model performs better in grooming detection than traditional machine learning. Moreover, performance differences between the positive and negative sentiments are captured and indicate that positive tone improves detection, while negative-toned grooming conversations have nuanced patterns that are harder to distinguish from non-grooming. This shows that groomers employ varying strategies to gain access to their victims. Lastly, with an F1 score of 0.99 and an F0.5 score of 0.99, the LLaMA 3.2 1B model outperforms both traditional machine learning and previous versions of the large language model in grooming author detection.
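The sentiment-sorting step the abstract describes can be illustrated with the Hugging Face transformers pipeline API. A minimal sketch (the checkpoint below is a generic public DistilBERT sentiment model, an assumption, not the study's fine-tuned classifier, and the messages are invented, not PAN12 data):

```python
from transformers import pipeline

# Generic DistilBERT sentiment model from the Hugging Face Hub.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

messages = [  # hypothetical chat lines
    "you seem really mature for your age",
    "stop talking to me",
]
# Each result is a dict with a POSITIVE/NEGATIVE label and a confidence score.
for msg, result in zip(messages, classifier(messages)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {msg}")
```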
Pandu Aji Prasojo, Muhammad Rizki | Journal of Criminal Justice Education
The rise of social media has improved communication but also amplified the spread of hate speech, creating serious societal risks. Automated detection remains difficult due to subjectivity, linguistic diversity, and implicit language. While prior research focuses on high-resource languages, this study addresses the underexplored multilingual challenges of Arabic and Urdu hate speech through a comprehensive approach. To achieve this objective, the study makes four key contributions. First, we created a unique multilingual, manually annotated binary and multi-class dataset (UA-HSD-2025) sourced from X, which contains the five most important multi-class categories of hate speech. Second, we created detailed annotation guidelines to produce a robust hate speech dataset. Third, we explore two strategies to address the challenges of multilingual data: a joint multilingual approach and a translation-based approach. The translation-based approach converts all input text into a single target language before applying a classifier. In contrast, the joint multilingual approach employs a unified model trained to handle multiple languages simultaneously, enabling it to classify text across different languages without translation. Finally, we conducted 54 experiments spanning machine learning with TF-IDF features, deep learning with advanced pre-trained word embeddings such as FastText and GloVe, and state-of-the-art pre-trained language models with advanced contextual embeddings. Based on the analysis of the results, our language model (XLM-R) outperformed traditional supervised learning approaches, achieving 0.99 accuracy in binary classification for the Arabic, Urdu, and joint multilingual datasets, and 0.95, 0.94, and 0.94 accuracy in multi-class classification for the joint multilingual, Arabic, and Urdu datasets, respectively.
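The joint multilingual approach described above, a single model classifying across languages without translation, can be sketched with the Hugging Face transformers API. The label count and placeholder texts are assumptions, not the UA-HSD-2025 setup, and the classification head is untrained here:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Base XLM-R checkpoint with a fresh binary head; in practice this would be
# fine-tuned on the combined Arabic/Urdu hate speech data first.
name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

texts = ["example Arabic text", "example Urdu text"]  # placeholders
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
print(logits.softmax(dim=-1))  # untrained head: scores are not meaningful yet
```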
This paper offers a chronological review of the path free speech has taken from ancient societies to the contemporary recession. New media technologies created more access to gathering and disseminating information, consequently inducing social changes from Gutenberg's printing press to the Internet era. After the tragedy of WWII, the UN's institutions defined freedom of speech and expression as the individual freedom to articulate and express opinions and ideas without fear of censorship, retaliation, or legal sanction, covering both the content and the means of expression. It is protected by law but is not absolute. Limitations relate to hate speech, libel, slander, etc. The normative theories provide a synthesis of ideas that express even conflicting views and are a reliable foundation for understanding the development of free speech and the change it induces in society, media, and culture. Libertarian vs. authoritarian ideas about free speech raised a debate and yielded a compromise, in the form of social responsibility theory, between radical freedom of speech and government control of media to prevent possible harm.
Binita Mukesh Shah | European Journal of Artificial Intelligence and Machine Learning
This paper examines the vulnerability of machine learning models to adversarial attacks in online abuse detection. With the growth of user-generated content online, platforms rely on automated systems to detect and filter harmful content at scale. However, these systems remain vulnerable to manipulations by bad actors designed to circumvent detection. We investigate two prominent attack strategies, TextFooler and HotFlip, against transformer-based models trained on the Jigsaw Toxic Comment Classification dataset. Our experiments reveal considerable degradation in model performance under attack conditions, with accuracy drops of approximately 20%. This paper provides a detailed analysis of these attack strategies, their implementation methods, and their impact on model reliability. The findings highlight critical vulnerabilities in current abuse detection systems and demonstrate the need for more robust approaches to maintain platform safety and integrity.
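TextFooler-style attacks like those described can be reproduced with the open-source TextAttack library. A minimal sketch under the assumption of a generic public classifier (the checkpoint below is a stand-in sentiment model, not the paper's Jigsaw-trained toxicity model):

```python
import transformers
from textattack.models.wrappers import HuggingFaceModelWrapper
from textattack.attack_recipes import TextFoolerJin2019

# Stand-in classifier; the paper attacks toxicity models trained on Jigsaw,
# which are not reproduced here.
name = "distilbert-base-uncased-finetuned-sst-2-english"
model = transformers.AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Build the TextFooler recipe (synonym-substitution attack) against the model.
attack = TextFoolerJin2019.build(wrapper)
result = attack.attack("You are a wonderful person.", 1)  # 1 = true label
print(result.__str__(color_method=None))  # original vs. perturbed text
```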
Rhetorical figures play a major role in everyday communication, making text and speech more interesting, memorable, or persuasive through their association between form and meaning. Computational detection of rhetorical figures plays an important part in the thorough understanding of complex communication patterns. In this survey, we provide a comprehensive overview of computational approaches to lesser-known rhetorical figures. We explore the linguistic and computational perspectives on rhetorical figures and highlight their significance in the field of Natural Language Processing. We present different figures in detail and investigate datasets, definitions, rhetorical functions, and detection approaches. We identify challenges such as dataset scarcity, language limitations, and reliance on rule-based methods.
This study presents an advanced AI-powered framework to detect and prevent cyberbullying across diverse social media platforms using a multiclass classification approach. Addressing the growing complexity and linguistic diversity of online abuse, the research integrates various machine learning (RF, LR, SVM) and deep learning (Bi-LSTM, BERT) models trained on a balanced dataset covering bullying categories based on religion, age, ethnicity, gender, and neutral content. Data preprocessing, tokenization, feature extraction via TF-IDF and CountVectorizer, and class balancing using SMOTE were applied to enhance model accuracy. The proposed system further supports real-time detection through social media APIs, offering dynamic monitoring and intervention capabilities. Among the tested models, Random Forest and BERT achieved the highest classification performance with 94% accuracy. Despite its robust architecture, limitations include dependence on English-language datasets, exclusion of multimodal data (e.g., memes, audio), and API restrictions that challenge scalability. Future development will focus on incorporating vision-language models and optimizing the system for real-time, multilingual, and multimodal environments. This study contributes to digital safety efforts by proposing a scalable and adaptive detection system suitable for safeguarding users from evolving forms of online harassment.
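Class balancing with SMOTE, as mentioned above, is typically done with the imbalanced-learn library. A minimal sketch on synthetic features (the data is illustrative, not the study's):

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

# Illustrative imbalanced feature matrix: 90 "neutral" vs 10 "bullying" rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = np.array([0] * 90 + [1] * 10)

# SMOTE synthesizes new minority-class points by interpolating between
# nearest neighbours, equalizing the class counts.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))  # {0: 90, 1: 10} -> {0: 90, 1: 90}
```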
While symbolic elites have traditionally monopolized victimhood discourse through discursive power, social media has disrupted these dynamics by enabling ordinary individuals to deploy such discourse, creating a landscape where victimhood emerges as a contested terrain of ideological values. Despite substantial scholarly attention to the invocation of victimhood in justifying dominant ideologies within institutional and top-down contexts, literature on victimhood in bottom-up discourses remains comparatively scarce, particularly concerning LGBTQ+ communities. This study addresses this gap by examining how victimhood is constructed in both anti- and pro-LGBTQ+ discourses on Malaysian social media, utilizing van Leeuwen's socio-semantic approach to Critical Discourse Studies. Our findings reveal competing victimhood discourses constructed through the polarized ways of manipulating LGBTQ+ individuals' sociological agency. Anti-LGBTQ+ discourse amplifies LGBTQ+ individuals' agency through activation in material transactions, passivation paired with negation, activation of their actions, and association with criminal perpetrators, representing them as powerful victimizers. Pro-LGBTQ+ discourse, conversely, diminishes their agency through passivation, reactions, non-transactions, activation in material transactions combined with negation and interrogation, and association with victims of persecution, positioning them as disempowered victims. Our study highlights how multiple discursive strategies and linguistic resources work in tandem to shape agency, establishing victim-victimizer dynamics that legitimize opposing ideological positions.