Computational Sociolinguistics: A Survey

Type: Article

Authors: Dong Nguyen, A. Seza Doğruöz, Carolyn Penstein Rosé, Franciska de Jong

Publication Date: 2016-06-17

Citations: 201

DOI: https://doi.org/10.1162/coli_a_00258

Abstract

Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of “computational sociolinguistics” that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions used in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges.

Locations

  • Computational Linguistics
  • arXiv (Cornell University)
  • Ghent University Academic Bibliography (Ghent University)
  • Data Archiving and Networked Services (DANS)
