MASIVE: Open-Ended Affective State Identification in English and Spanish

Type: Preprint

Publication Date: 2024-07-16

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2407.12196

Abstract

In the field of emotion analysis, much NLP research focuses on identifying a limited number of discrete emotion categories, often applied across languages. These basic sets, however, are rarely designed with textual data in mind, and culture, language, and dialect can influence how particular emotions are interpreted. In this work, we broaden our scope to a practically unbounded set of affective states, which includes any terms that humans use to describe their experiences of feeling. We collect and publish MASIVE, a dataset of Reddit posts in English and Spanish containing over 1,000 unique affective states each. We then define the new problem of affective state identification for language generation models framed as a masked span prediction task. On this task, we find that smaller finetuned multilingual models outperform much larger LLMs, even on region-specific Spanish affective states. Additionally, we show that pretraining on MASIVE improves model performance on existing emotion benchmarks. Finally, through machine translation experiments, we find that native speaker-written data is vital to good performance on this task.
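
The masked span prediction framing mentioned in the abstract can be illustrated with a minimal sketch. It assumes a T5-style sentinel token (`<extra_id_0>`) and a hypothetical helper, `mask_affective_state`; neither the sentinel convention nor the helper is specified in the abstract, and the example sentences are invented for illustration rather than drawn from MASIVE.

```python
import re

# Assumption: a T5-style sentinel marks the masked span. The abstract does not
# say which masking convention or model family the authors use.
SENTINEL = "<extra_id_0>"


def mask_affective_state(text: str, state: str) -> tuple[str, str]:
    """Build a (masked_input, target) pair for one affective-state term.

    Hypothetical helper: replaces the first whole-word, case-insensitive
    occurrence of `state` in `text` with the sentinel token.
    """
    pattern = re.compile(r"\b" + re.escape(state) + r"\b", flags=re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        raise ValueError(f"{state!r} not found in text")
    # Input keeps the context and hides the affective state.
    masked = text[: match.start()] + SENTINEL + text[match.end():]
    # Target is the sentinel followed by the hidden span, as written in the text.
    target = f"{SENTINEL} {match.group(0)}"
    return masked, target


if __name__ == "__main__":
    # Invented example sentences, one per language covered by the dataset.
    print(mask_affective_state("I feel overwhelmed today.", "overwhelmed"))
    print(mask_affective_state("Me siento agobiada hoy.", "agobiada"))
```

Under this framing a generative model receives the masked input and is trained or prompted to produce the target span, so the label space is not restricted to a fixed list of emotion categories.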

Locations

  • arXiv (Cornell University)

Similar Works

Action Title Year Authors
+ Evaluating Emotion Arcs Across Languages: Bridging the Global Divide in Sentiment Analysis 2023 Daniela Teodorescu
Saif M. Mohammad
+ Cross-lingual Emotion Intensity Prediction 2020 Irean Navas Alejo
Toni Badía
Jeremy Barnes
+ Using Emotion Embeddings to Transfer Knowledge Between Emotions, Languages, and Annotation Formats 2022 Georgios Chochlakis
Gireesh Mahajan
Sabyasachee Baruah
Keith Burghardt
Kristina Lerman
Shrikanth Narayanan
+ Evaluating the Capabilities of Large Language Models for Multi-label Emotion Understanding 2024 Tadesse Destaw Belay
Israel Abebe Azime
Abinew Ali Ayele
Grigori Sidorov
Dietrich Klakow
Philipp Slusallek
Olga Kolesnikova
Seid Muhie Yimam
+ LEIA: Linguistic Embeddings for the Identification of Affect 2023 Segun Taofeek Aroyehun
Lukas Malik
H. Metzler
Nikolas Haimerl
Anna Di Natale
David García Núñez
+ UG18 at SemEval-2018 Task 1: Generating Additional Training Data for Predicting Emotion Intensity in Spanish 2018 Marloes Kuijper
Mike van Lenthe
Rik van Noord
+ Rethinking Emotion Annotations in the Era of Large Language Models 2024 Minxue Niu
Yara El-Tawil
Amrit Romana
Emily Mower Provost
+ EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis 2024 Zhiwei Liu
Kailai Yang
Qianqian Xie
Tianlin Zhang
Sophia Ananiadou
+ EmoLLMs: A Series of Emotional Large Language Models and Annotation Tools for Comprehensive Affective Analysis 2024 Zhiwei Liu
Kailai Yang
Tianlin Zhang
Qianqian Xie
Zeping Yu
Sophia Ananiadou
+ Large Language Models for Cross-lingual Emotion Detection 2024 Ram Mohan Rao Kadiyala
+ Learning and Evaluating Emotion Lexicons for 91 Languages 2020 Sven Buechel
Susanna Rücker
Udo Hahn
+ Towards a Unified Framework for Emotion Analysis. 2020 Sven Buechel
Luise Modersohn
Udo Hahn
+ AIMA at SemEval-2024 Task 10: History-Based Emotion Recognition in Hindi-English Code-Mixed Conversations 2025 Mohammad Mahdi Abootorabi
Nona Ghazizadeh
Seyed Arshan Dalili
Alireza Ghahramani Kure
Mahshid Dehghani
Ehsaneddin Asgari
+ Emotion Embeddings - Learning Stable and Homogeneous Abstractions from Heterogeneous Affective Datasets 2023 Sven Buechel
Udo Hahn
+ Frustratingly Easy Sentiment Analysis of Text Streams: Generating High-Quality Emotion Arcs Using Emotion Lexicons 2022 Daniela Teodorescu
Saif M. Mohammad
