Overcoming Language Disparity in Online Content Classification with Multimodal Learning

Type: Article

Publication Date: 2022-05-31

Citations: 6

DOI: https://doi.org/10.1609/icwsm.v16i1.19356

Abstract

Advances in Natural Language Processing (NLP) have revolutionized the way researchers and practitioners address crucial societal problems. Large language models are now the standard for developing state-of-the-art solutions for text detection and classification tasks. However, the development of advanced computational techniques and resources is disproportionately focused on the English language, sidelining a majority of the languages spoken globally. While existing research has developed better multilingual and monolingual language models to bridge this language disparity between English and non-English languages, we explore the promise of incorporating the information contained in images via multimodal machine learning. Our comparative analyses on three detection tasks (crisis information, fake news, and emotion recognition) and five high-resource non-English languages demonstrate that: (a) detection frameworks based on pre-trained large language models like BERT and multilingual-BERT systematically perform better on English than on non-English languages, and (b) including images via multimodal learning bridges this performance gap. We situate our findings with respect to existing work on the pitfalls of large language models, and discuss their theoretical and practical implications.
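The abstract describes adding image information to text-based classifiers via multimodal learning, but does not state which fusion strategy the authors use. As a generic illustration only, the sketch below shows one common option, feature-level (early) fusion: a text embedding and an image embedding are concatenated and passed through a single linear layer with softmax. It is not the authors' architecture. The function names, the toy vectors, and the weights are hypothetical; in practice the text vector might come from a BERT or multilingual-BERT encoder and the image vector from a pre-trained image encoder.

```python
import math
from typing import List, Sequence


def concat_fusion(text_vec: Sequence[float], image_vec: Sequence[float]) -> List[float]:
    """Early (feature-level) fusion: concatenate the two modality embeddings.

    Hypothetical helper; text_vec stands in for a language-model embedding,
    image_vec for an image-encoder embedding.
    """
    return list(text_vec) + list(image_vec)


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """One linear layer: logits[k] = sum_i weights[k][i] * x[i] + bias[k]."""
    if any(len(row) != len(x) for row in weights):
        raise ValueError("each weight row must match the fused feature size")
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to class probabilities (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text_vec: Sequence[float], image_vec: Sequence[float],
             weights: Sequence[Sequence[float]], bias: Sequence[float]) -> int:
    """Return the index of the most probable class for a text-image pair."""
    probs = softmax(linear(concat_fusion(text_vec, image_vec), weights, bias))
    return max(range(len(probs)), key=probs.__getitem__)


if __name__ == "__main__":
    # Toy, made-up values: a 2-d "text" vector, a 1-d "image" vector, 2 classes.
    text_vec = [0.2, -0.1]
    image_vec = [0.5]
    weights = [[1.0, 0.0, 2.0],   # class 0
               [0.0, 1.0, -1.0]]  # class 1
    bias = [0.0, 0.0]
    print(classify(text_vec, image_vec, weights, bias))
```

With these toy values the fused vector is `[0.2, -0.1, 0.5]`, the logits are 1.2 and -0.6, and class 0 is predicted. Concatenation is chosen here only because it is the simplest fusion to show; attention-based or late (decision-level) fusion are common alternatives.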

Locations

  • arXiv (Cornell University)
  • Proceedings of the International AAAI Conference on Web and Social Media

Similar Works

  • Overcoming Language Disparity in Online Content Classification with Multimodal Learning (2022). Gaurav Verma, Rohit Mujumdar, Zijie J. Wang, Munmun De Choudhury, Srijan Kumar
  • MM-Soc: Benchmarking Multimodal Large Language Models in Social Media Platforms (2024). Yiqiao Jin, Minje Choi, Gaurav Verma, Jindong Wang, Srijan Kumar
  • A Self-Learning Multimodal Approach for Fake News Detection (2024). Hao Chen, Guo Hui, Bin Hu, Shu Hu, Jinrong Hu, Siwei Lyu, Xi Wu, Xin Wang
  • Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection (2024). Ye Jiang, Yimin Wang
  • Large Language Models For Text Classification: Case Study And Comprehensive Review (2025). Arina Kostina, Marios D. Dikaiakos, Dimosthenis Stefanidis, George Pallis
  • TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer (2024). Eunjee Choi, Jong-Kook Kim
  • A Hybrid Attention Framework for Fake News Detection with Large Language Models (2025). Xiaogang Xu, Peiyang Yu, Zeshui Xu, Jiani Wang
  • Multimodal Fake News Detection via CLIP-Guided Learning (2022). Yangming Zhou, Qichao Ying, Zhenxing Qian, Sheng Li, Xinpeng Zhang
  • Lifelong Learning Natural Language Processing Approach for Multilingual Data Classification (2022). Jędrzej Kozal, Michał Leś, Paweł Zyblewski, Paweł Ksieniewicz, Michał Woźniak
  • Bridging Modalities: Enhancing Cross-Modality Hate Speech Detection with Few-Shot In-Context Learning (2024). Ming Shan Hee, A. Kumaresan, Roy Ka-Wei Lee
  • NLP-CUET@DravidianLangTech-EACL2021: Offensive Language Detection from Multilingual Code-Mixed Text using Transformers (2021). Omar Sharif, Eftekhar Hossain, Mohammed Moshiul Hoque
  • Multimodal Fact-Checking with Vision Language Models: A Probing Classifier based Solution with Embedding Strategies (2024). Recep Fırat Çekinel, Pınar Karagöz, Çağrı Çöltekin
  • MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili (2024). Han Wang, T. Z. Yang, Usman Naseem, Roy Ka-Wei Lee
  • Multi3Hate: Multimodal, Multilingual, and Multicultural Hate Speech Detection with Vision-Language Models (2024). Minh Duc Bui, Katharina von der Wense, Anne Lauscher
  • A Roadmap for Multilingual, Multimodal Domain Independent Deception Detection (2024). Dainis Boumber, Rakesh Verma, Fatima Zahra Qachfar
  • Leveraging Chat-Based Large Vision Language Models for Multimodal Out-Of-Context Detection (2024). Fatma Shalabi, Hichem Felouat, Huy H. Nguyen, Isao Echizen
  • Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection (2024). Tharindu Kumarage, Amrita Bhattacharjee, Joshua Garland
  • Leveraging Language Identification to Enhance Code-Mixed Text Classification (2023). Gauri Takawane, Abhishek Phaltankar, Varad Patwardhan, Aryan Patil, Raviraj Joshi, Mukta Takalikar