The Twitter of Babel: Mapping World Languages through Microblogging Platforms

Type: Article

Publication Date: 2013-04-18

Citations: 286

DOI: https://doi.org/10.1371/journal.pone.0061981

Abstract

Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data "proxies" of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allow us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.

Locations

  • PLoS ONE
  • PubMed Central
  • arXiv (Cornell University)
  • Europe PMC (PubMed Central)
  • Greenwich Academic Literature Archive (University of Greenwich)
  • DOAJ (Directory of Open Access Journals)
  • City Research Online (City University London)
  • HAL (Le Centre pour la Communication Scientifique Directe)
  • PubMed
  • DataCite API

Similar Works

  • Language Statistics at Different Spatial, Temporal, and Grammatical Scales (2024). Fernanda Sánchez-Puig, Rogelio Lozano-Aranda, Dante Pérez-Méndez, Ewan Colman, Alfredo J. Morales-Guzmán, Pedro J. Rivera Torres, Carlos Pineda, Carlos Gershenson
  • Language statistics at different spatial, temporal, and grammatical scales (2022). Fernanda Sánchez-Puig, Rogelio Lozano-Aranda, Dante Pérez-Méndez, Ewan Colman, Alfredo J. Morales-Guzmán, Carlos Pineda, Carlos Gershenson
  • Race, religion and the city: twitter word frequency patterns reveal dominant demographic dimensions in the United States (2016). Eszter Bokányi, Dániel Kondor, László Dobos, Tamás Sebők, József Stéger, István Csabai, Gábor Vattay
  • Socioeconomic Dependencies of Linguistic Patterns in Twitter (2018). Jacob Levy Abitbol, Márton Karsai, Jean-Philippe Magué, Jean‐Pierre Chevrot, Éric Fleury
  • A large scale lexical and semantic analysis of Spanish language variations in Twitter (2021). Eric S. Téllez, Daniela Moctezuma, Sabino Miranda‐Jiménez, Mario Graff
  • Mapping Languages and Demographics with Georeferenced Corpora (2020). Jonathan Dunn, Benjamin Adams
  • Geolocation differences of language use in urban areas (2021). Olga Kellert, N. H. Matlis
  • Crowdsourcing Dialect Characterization through Twitter (2014). Bruno Gonçalves, David Sánchez
  • Where in the World Are You? Geolocation and Language Identification in Twitter (2014). Mark Graham, Scott A. Hale, Devin Gaffney
  • Immigrant community integration in world cities (2018). Fabio Lamanna, Maxime Lenormand, María Henar Salas-Olmedo, Gustavo Romanillos, Bruno Gonçalves, José J. Ramasco
  • Analyzing Temporal Relationships between Trending Terms on Twitter and Urban Dictionary Activity (2020). Steven R. Wilson, Walid Magdy, Barbara McGillivray, Gareth Tyson
  • Deriving Disinformation Insights from Geolocalized Twitter Callouts (2021). David Tuxworth, Dimosthenis Antypas, Luis Espinosa-Anke, José Camacho-Collados, Alun Preece, David Rogers
  • A Python Library for Exploratory Data Analysis on Twitter Data based on Tokens and Aggregated Origin-Destination Information (2020). Mario Graff, Daniela Moctezuma, Sabino Miranda‐Jiménez, Eric S. Téllez
  • American cultural regions mapped through the lexical analysis of social media (2022). Thomas Louf, B. Gonçalves, José J. Ramasco, David Sánchez, Jack Grieve
  • American cultural regions mapped through the lexical analysis of social media (2023). Thomas Louf, Bruno Gonçalves, José J. Ramasco, David Sánchez, Jack Grieve
  • Dialectometric analysis of language variation in Twitter (2017). G. Donoso, David Sánchez
  • Geo-located Twitter as proxy for global mobility patterns (2014). Bartosz Hawelka, Izabela Sitko, Euro Beinat, Stanislav Sobolevsky, Pavlos Kazakopoulos, Carlo Ratti

Works That Cite This (69)

  • Model reproduces individual, group and collective dynamics of human contact networks (2016). Michele Starnini, Andrea Baronchelli, Romualdo Pastor‐Satorras
  • Collective attention in the age of (mis)information (2015). Delia Mocanu, Luca Rossi, Qian Forrest Zhang, Márton Karsai, Walter Quattrociocchi
  • Computational socioeconomics (2019). Jian Gao, Yi‐Cheng Zhang, Tao Zhou
  • Mapping Languages and Demographics with Georeferenced Corpora (2020). Jonathan Dunn, Benjamin Adams
  • Overcoming Language Disparity in Online Content Classification with Multimodal Learning (2022). Gaurav Verma, Rohit Mujumdar, Zijie J. Wang, Munmun De Choudhury, Srijan Kumar
  • Computational Sociolinguistics: A Survey (2015). Dong Nguyen, A. Seza Doğruöz, Carolyn Penstein Rosé, Franciska de Jong
  • Twitter as a Source of Global Mobility Patterns for Social Good (2016). Mark Dredze, Manuel García–Herranz, Alex Rutherford, Gideon Mann
  • Detecting and modelling real percolation and phase transitions of information on social media (2021). Jiarong Xie, Fanhui Meng, Jiachen Sun, Xiao Ma, Gang Yan, Yanqing Hu
  • Computational Sociolinguistics: A Survey (2016). Dong Nguyen, A. Seza Doğruöz, Carolyn Penstein Rosé, Franciska de Jong
  • Everyday the Same Picture: Popularity and Content Diversity (2017). Alessandro Bessi, Fabiana Zollo, Michela Del Vicario, Antonio Scala, Fabio Petroni, Bruno Gonçalves, Walter Quattrociocchi

Works Cited by This (9)

  • Modeling Users' Activity on Twitter Networks: Validation of Dunbar's Number (2011). Bruno Gonçalves, Nicola Perra, Alessandro Vespignani
  • Understanding individual human mobility patterns (2008). Marta C. González, César A. Hidalgo, Albert‐László Barabási
  • Assessing Vaccination Sentiments with Online Social Media: Implications for Infectious Disease Dynamics and Control (2011). Marcel Salathé, Shashank Khandelwal
  • Structural and Dynamical Patterns on Online Social Networks: The Spanish May 15th Movement as a Case Study (2011). Javier Borge‐Holthoefer, Alejandro Rivero, Iñigo Erquicia García, Elisa Cauhé, Alfredo Ferrer, Darío Ferrer, David Francos, D. Íñiguez, María Pilar Falcón Pérez, Gonzalo Ruiz Díaz
  • A planetary nervous system for social mining and collective awareness (2012). Fosca Giannotti, Dino Pedreschi, Alex Pentland, Paul Lukowicz, Donald Kossmann, J. Crowley, Dirk Helbing
  • Where in the World Are You? Geolocation and Language Identification in Twitter (2014). Mark Graham, Scott A. Hale, Devin Gaffney
  • Truthy (2011). A. Ratkiewicz, Michael Conover, Mark Meiss, Bruno Gonçalves, Snehal Patil, Alessandro Flammini, Filippo Menczer
  • "I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper" A Balanced Survey on Election Prediction using Twitter Data (2012). Daniel Gayo-Avello
  • Structure and tie strengths in mobile communication networks (2007). Jukka‐Pekka Onnela, Jari Saramäki, Jörkki Hyvönen, Gábor Szabó, David Lazer, Kimmo Kaski, János Kertész, Albert‐László Barabási