XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

Type: Preprint

Publication Date: 2020-01-01

Citations: 80

DOI: https://doi.org/10.48550/arxiv.2004.01401

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

Action Title Year Authors
+ VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation 2021 Fuli Luo
Wei Wang
Jiahao Liu
Yijia Liu
Bin Bi
Songfang Huang
Fei Huang
Luo Si
+ All NLP Tasks Are Generation Tasks: A General Pretraining Framework 2021 Zhengxiao Du
Yujie Qian
Xiao Liu
Ming Ding
Jiezhong Qiu
Zhilin Yang
Jie Tang
+ VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation 2020 Fuli Luo
Wei Wang
Jiahao Liu
Yijia Liu
Bin Bi
Songfang Huang
Fei Huang
Luo Si
+ XLM-E: Cross-lingual Language Model Pre-training via ELECTRA 2021 Zewen Chi
Shaohan Huang
Dong Li
Shuming Ma
B. Zheng
Saksham Singhal
Payal Bajaj
Song Xia
Xian-Ling Mao
Heyan Huang
+ XNLI 2.0: Improving XNLI dataset and performance on Cross Lingual Understanding (XLU) 2023 Ankit Kumar Upadhyay
Harsit Kumar Upadhya
+ Cross-lingual Language Model Pretraining 2019 Guillaume Lample
Alexis Conneau
+ GLGE: A New General Language Generation Evaluation Benchmark 2020 Dayiheng Liu
Yu Yan
Yeyun Gong
Weizhen Qi
Hang Zhang
Jian Jiao
Weizhu Chen
Jie Fu
Linjun Shou
Ming Gong
+ Extrapolating Multilingual Understanding Models as Multilingual Generators 2023 Bohong Wu
Fei Yuan
Hai Zhao
Lei Li
Jingjing Xu
+ VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation 2021 Fuli Luo
Wei Wang
Jiahao Liu
Yijia Liu
Bin Bi
Songfang Huang
Fei Huang
Luo Si
+ QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model 2020 Pai Liu
+ PolyLM: An Open Source Polyglot Large Language Model 2023 Xiangpeng Wei
Haoran Wei
Huan Lin
Tianhao Li
Pei Zhang
Xingzhang Ren
Mei Li
Yu Wan
Zhiwei Cao
Bin-Bin Xie
+ On Learning Universal Representations Across Languages 2020 Xiangpeng Wei
Yue Hu
Rongxiang Weng
Luxi Xing
Heng Yu
Weihua Luo
+ TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation 2024 Gökçe Uludoğan
Zeynep Yirmibeşoğlu Balal
Furkan Akkurt
Melikşah Türker
Onur Güngör
Suzan Üsküdarlı
+ ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation 2022 Bin Shan
Yaqian Han
Weichong Yin
Shuohuan Wang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
+ CUGE: A Chinese Language Understanding and Generation Evaluation Benchmark 2021 Yuan Yao
Qingxiu Dong
Jian Guan
Boxi Cao
Zhengyan Zhang
Chaojun Xiao
Xiaozhi Wang
Fanchao Qi
Junwei Lucas Bao
Jinran Nie
+ Qwen2 Technical Report 2024 Yang An
Baosong Yang
Binyuan Hui
Bo Zheng
Bowen Yu
Chang Zhou
Cheng‐Peng Li
Chengyuan Li
Dayiheng Liu
Fei Huang
+ Dual Inference for Improving Language Understanding and Generation 2020 Shang‐Yu Su
Yung-Sung Chuang
Yun-Nung Chen

Cited by (62)

Action Title Year Authors
+ Model Selection for Cross-Lingual Transfer 2020 Yang Chen
Alan Ritter
+ Don't Use English Dev: On the Zero-Shot Cross-Lingual Evaluation of Contextual Embeddings 2020 Phillip Keung
Yichao Lu
Julián Salazar
Vikas Bhardwaj
+ IndoLEM and IndoBERT: A Benchmark Dataset and Pre-trained Language Model for Indonesian NLP 2020 Fajri Koto
Afshin Rahimi
Jey Han Lau
Timothy Baldwin
+ Cross-lingual Text Classification with Heterogeneous Graph Neural Network 2021 Ziyun Wang
Xuan Liu
Peiji Yang
Shixing Liu
Zhisheng Wang
+ X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models 2020 Zhengbao Jiang
Antonios Anastasopoulos
Jun Araki
Haibo Ding
Graham Neubig
+ RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark 2020 Tatiana Shavrina
Alena Fenogenova
Emelyanov Anton
Denis Shevelev
Ekaterina Artemova
Valentin Malykh
Vladislav Mikhailov
Maria Tikhonova
Andrey Chertok
Andrey Evlampiev
+ Which *BERT? A Survey Organizing Contextualized Encoders 2020 Patrick Xia
Shijie Wu
Benjamin Van Durme
+ MEGA: Multilingual Evaluation of Generative AI 2023 Kabir Ahuja
Harshita Diddee
Rishav Hada
Millicent Ochieng
Krithika Ramesh
Prachi Jain
Akshay Nambi
Tanuja Ganu
Sameer Segal
Mohamed Ahmed
+ CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation 2021 Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
A. Svyatkovskiy
Ambrosio Blanco
Colin B. Clement
Dawn Drain
Daxin Jiang
Duyu Tang
+ The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics 2021 Sebastian Gehrmann
Tosin Adewumi
Karmanya Aggarwal
Pawan Sasanka Ammanamanchi
Anuoluwapo Aremu
Antoine Bosselut
Khyathi Raghavi Chandu
Miruna Clinciu
Dipanjan Das
Kaustubh Dhole
+ InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training 2020 Zewen Chi
Li Dong
Furu Wei
Nan Yang
Saksham Singhal
Wenhui Wang
Song Xia
Xian-Ling Mao
Heyan Huang
Ming Zhou
+ FILTER: An Enhanced Fusion Method for Cross-lingual Language Understanding 2021 Yuwei Fang
Shuohang Wang
Zhe Gan
Siqi Sun
Jun Liu
+ On the Importance of Word Order Information in Cross-lingual Sequence Labeling 2021 Zihan Liu
Genta Indra Winata
Samuel Cahyawijaya
Andrea Madotto
Zhaojiang Lin
Pascale Fung
+ ExplainaBoard: An Explainable Leaderboard for NLP 2021 Pengfei Liu
Jinlan Fu
Xiao Yang
Weizhe Yuan
Shuaichen Chang
Junqi Dai
Yixin Liu
Zihuiwen Ye
Graham Neubig
+ Probing Multilingual BERT for Genetic and Typological Signals 2020 Taraka Rama
Lisa Beinborn
Steffen Eger
+ Enhancing Multilingual Language Model with Massive Multilingual Knowledge Triples 2022 Linlin Liu
Xin Li
Ruidan He
Lidong Bing
Shafiq Joty
Luo Si
+ VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation 2020 Fuli Luo
Wei Wang
Jiahao Liu
Yijia Liu
Bin Bi
Songfang Huang
Fei Huang
Luo Si
+ Multilingual Language Models Predict Human Reading Behavior 2021 Nora Hollenstein
Federico Pirovano
Ce Zhang
Lena A. Jäger
Lisa Beinborn
+ VECO: Variable and Flexible Cross-lingual Pre-training for Language Understanding and Generation 2021 Fuli Luo
Wei Wang
Jiahao Liu
Yijia Liu
Bin Bi
Songfang Huang
Fei Huang
Luo Si
+ XCOPA: A Multilingual Dataset for Causal Commonsense Reasoning 2020 Edoardo Maria Ponti
Goran Glavaš
Olga Majewska
Qianchu Liu
Ivan Vulić
Anna Korhonen
+ Detecting Languages Unintelligible to Multilingual Models through Local Structure Probes 2022 Louis Clouâtre
Prasanna Parthasarathi
Amal Zouaq
Sarath Chandar
+ Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation 2020 Junhao Liu
Linjun Shou
Jian Pei
Ming Gong
Min Yang
Daxin Jiang
+ Exploring Fine-tuning Techniques for Pre-trained Cross-lingual Models via Continual Learning 2020 Zihan Liu
Genta Indra Winata
Andrea Madotto
Pascale Fung
+ Zero-Shot Cross-Lingual Transfer with Meta Learning 2020 Farhad Nooralahzadeh
Giannis Bekoulis
Johannes Bjerva
Isabelle Augenstein
+ Multi-task Learning for Multilingual Neural Machine Translation 2020 Yi-Ren Wang
ChengXiang Zhai
Hany Hassan
+ A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets 2023 Md Tahmid Rahman Laskar
M Saiful Bari
Mizanur Rahman
Md Amran Hossen Bhuiyan
Shafiq Joty
Jimmy Xiangji Huang
+ XSemPLR: Cross-Lingual Semantic Parsing in Multiple Natural Languages and Meaning Representations 2023 Yusen Zhang
Jun Wang
Zhiguo Wang
Rui Zhang
+ MTG: A Benchmark Suite for Multilingual Text Generation 2022 Yiran Chen
Zhenqiao Song
Xianze Wu
Danqing Wang
Jingjing Xu
Jiaze Chen
Hao Zhou
Lei Li
+ ProphetNet-X: Large-Scale Pre-training Models for English, Chinese, Multi-lingual, Dialog, and Code Generation 2021 Weizhen Qi
Yeyun Gong
Yu Yan
Can Xu
Bolun Yao
Bartuer Zhou
Biao Cheng
Daxin Jiang
Jiusheng Chen
Ruofei Zhang
+ Majority Voting with Bidirectional Pre-translation For Bitext Retrieval 2021 Alex Jones
Derry Wijaya
+ MLSUM: The Multilingual Summarization Corpus 2020 Thomas Scialom
Paul-Alexis Dray
Sylvain Lamprier
Benjamin Piwowarski
Jacopo Staiano
+ Do Explicit Alignments Robustly Improve Multilingual Encoders? 2020 Shijie Wu
Mark Dredze
+ Multilingual Argument Mining: Datasets and Analysis 2020 Orith Toledo‐Ronen
Matan Orbach
Yonatan Bilu
Artem Spector
Noam Slonim
+ XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge 2022 Xiaoze Jiang
Yaobo Liang
Weizhu Chen
Nan Duan
+ Multi-Domain Targeted Sentiment Analysis 2022 Orith Toledo‐Ronen
Matan Orbach
Yoav Katz
Noam Slonim
+ Bilingual Language Modeling, A transfer learning technique for Roman Urdu 2021 Usama Khalid
Mirza Omer Beg
Muhammad Umair Arshad
+ Bootstrapping a Crosslingual Semantic Parser 2020 Tom Sherborne
Yumo Xu
Mirella Lapata

Citing (18)

Action Title Year Authors
+ XNLI: Evaluating Cross-lingual Sentence Representations 2018 Alexis Conneau
Guillaume Lample
Ruty Rinott
Adina Williams
Samuel R. Bowman
Holger Schwenk
Veselin Stoyanov
+ PAWS: Paraphrase Adversaries from Word Scrambling 2019 Yuan Zhang
Jason Baldridge
Luheng He
+ Attention is All you Need 2017 Ashish Vaswani
Noam Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan N. Gomez
Łukasz Kaiser
Illia Polosukhin
+ RoBERTa: A Robustly Optimized BERT Pretraining Approach 2019 Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
Mike Lewis
Luke Zettlemoyer
Veselin Stoyanov
+ XLNet: Generalized Autoregressive Pretraining for Language Understanding 2019 Zhilin Yang
Zihang Dai
Yiming Yang
Jaime Carbonell
Ruslan Salakhutdinov
Quoc V. Le
+ PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification 2019 Yinfei Yang
Yuan Zhang
Chris Tar
Jason Baldridge
+ Unicoder: A Universal Language Encoder by Pre-training with Multiple Cross-lingual Tasks 2019 Haoyang Huang
Yaobo Liang
Nan Duan
Ming Gong
Linjun Shou
Daxin Jiang
Ming Zhou
+ Unified Language Model Pre-training for Natural Language Understanding and Generation 2019 Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu Wang
Jianfeng Gao
Ming Zhou
Hsiao-Wuen Hon
+ MLQA: Evaluating Cross-lingual Extractive Question Answering 2019 Patrick Lewis
Barlas Oğuz
Ruty Rinott
Sebastian Riedel
Holger Schwenk
+ BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension 2019 Mike Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdelrahman Mohamed
Omer Levy
Ves Stoyanov
Luke Zettlemoyer
+ Unsupervised Cross-lingual Representation Learning at Scale 2019 Alexis Conneau
Kartikay Khandelwal
Naman Goyal
Vishrav Chaudhary
Guillaume Wenzek
Francisco Guzmán
Édouard Grave
Myle Ott
Luke Zettlemoyer
Veselin Stoyanov
+ CCMatrix: Mining Billions of High-Quality Parallel Sentences on the WEB 2019 Holger Schwenk
Guillaume Wenzek
Sergey Edunov
Édouard Grave
Armand Joulin
+ CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data 2019 Guillaume Wenzek
Marie-Anne Lachaux
Alexis Conneau
Vishrav Chaudhary
Francisco Guzmán
Armand Joulin
Édouard Grave
+ Unified Vision-Language Pre-Training for Image Captioning and VQA 2020 Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
+ Unicoder-VL: A Universal Encoder for Vision and Language by Cross-Modal Pre-Training 2020 Gen Li
Nan Duan
Yuejian Fang
Ming Gong
Daxin Jiang
+ Cross-Lingual Natural Language Generation via Pre-Training 2020 Zewen Chi
Dong Li
Furu Wei
Wenhui Wang
Xian-Ling Mao
Heyan Huang
+ ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training 2020 Weizhen Qi
Yu Yan
Yeyun Gong
Dayiheng Liu
Nan Duan
Jiusheng Chen
Ruofei Zhang
Ming Zhou
+ XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization 2020 Junjie Hu
Sebastian Ruder
Aditya Siddhant
Graham Neubig
Orhan Fırat
Melvin Johnson