Measuring Massive Multitask Language Understanding

Type: Preprint

Publication Date: 2020-09-07

Citations: 35

DOI: https://doi.org/10.48550/arxiv.2009.03300

Locations

  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Measuring Massive Multitask Language Understanding (2020). Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, Jacob Steinhardt
  • Measuring Massive Multitask Chinese Understanding (2023). Hui Zeng
  • FAC$^2$E: Better Understanding Large Language Model Capabilities by Dissociating Language and Cognition (2024). Xiaoqiang Wang, Bang Liu, Lingfei Wu
  • MMLU-CF: A Contamination-free Multi-task Language Understanding Benchmark (2024). Qihao Zhao, Yangyu Huang, Tengchao Lv, Lei Cui, Qi Sun, Shaoguang Mao, Xin Zhang, Ying Xin, Qiufeng Yin, Shan Li
  • MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains (2024). Guoli Yin, Haoping Bai, Shuang Ma, Nan Feng, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang
  • Humanity's Last Exam (2025). Long Phan, Alice Gatti, Ziwen Han, Nathaniel Li, Josephina Hu, Hugh Zhang, Shuangshuang Shi, Michael Y. Choi, Arjun Agrawal, Asmita Chopra
  • Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence (2024). Norbert Tihanyi, Tamás Bisztray, Richard A. Dubniczky, Rebeka Tóth, Bertalan Borsos, Bilel Cherif, Mohamed Amine Ferrag, Lajos Muzsai, Ridhi Jain, Ryan Marinelli
  • Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition (2024). Kehua Feng, Keyan Ding, Kede Ma, Zhihua Wang, Qiang Zhang, Huajun Chen
  • CMMLU: Measuring massive multitask language understanding in Chinese (2023). Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, Timothy Baldwin
  • Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective (2023). Yan Zhuang, Qi Liu, Yuting Ning, Weizhe Huang, Rui Lv, Zhenya Huang, Guanhao Zhao, Zheng Zhang, Qingyang Mao, Shijin Wang
  • Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models (2023). Natalie Shapira, Mosh Levy, Seyed Hossein Alavi, Xuhui Zhou, Yejin Choi, Yoav Goldberg, Maarten Sap, Vered Shwartz
  • TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish (2024). Arda Yüksel, Abdullatif Köksal, Lütfi Kerem Şenel, Anna Korhonen, Hinrich Schütze
  • Critique Ability of Large Language Models (2023). Liangchen Luo, Lin Zi, Yinxiao Liu, Lei Shu, Yun Zhu, Jingbo Shang, Lei Meng
  • Beyond the Tip of the Iceberg: Assessing Coherence of Text Classifiers (2021). Shane Storks, Joyce Chai
  • M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models (2023). Wenxuan Zhang, Sharifah Mahani Aljunied, Chang Gao, Yew Ken Chia, Lidong Bing
  • Probing the Robustness of Theory of Mind in Large Language Models (2024). Christian H. Nickel, Laura Schrewe, Lucie Flek
  • When LLMs Meet Cunning Questions: A Fallacy Understanding Benchmark for Large Language Models (2024). Yinghui Li, Qingyu Zhou, Yuanzhen Luo, Shirong Ma, Yangning Li, Hai-Tao Zheng, Xuming Hu, Philip S. Yu
  • Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach (2023). Zheyuan Zhang, Jifan Yu, Juanzi Li, Lei Hou
  • Large Language Models' Understanding of Math: Source Criticism and Extrapolation (2023). Roozbeh Yousefzadeh, Xuenan Cao

Works That Cite This (15)

  • The Good, the Bad, and the Ugly: The Role of AI Quality Disclosure in Lie Detection (2024). Haimanti Bhattacharya, Subhasish Dugar, Sanchaita Hazra, Bodhisattwa Prasad Majumder
  • Generative AI-Based Text Generation Methods Using Pre-Trained GPT 2 Model (2024). Rohit Pandey, Hetvi Waghela, Sneha Rakshit, Aparna Rangari, Anjali Singh, Rahul Kumar, Ratnadeep Ghoshal, Jaydip Sen
  • Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses (2023). Jaromír Šavelka, Arav Agarwal, Marshall An, Christopher Bogart, Majd Sakr
  • Designing Heterogeneous LLM Agents for Financial Sentiment Analysis (2024). Frank Xing
  • Applications of Generative AI in Healthcare: algorithmic, ethical, legal and societal considerations (2024). Onyekachukwu R. Okonji, Kamol Yunusov, Bonnie Gordon
  • Domain-specific chatbots for science using embeddings (2023). Kevin G. Yager
  • OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models (2024). Chang-Hun Lee, Jungyu Jin, Taesu Kim, Hyungjun Kim, Eunhyeok Park
  • Can Generative Pre-trained Transformers (GPT) Pass Assessments in Higher Education Programming Courses? (2023). Jaromír Šavelka, Arav Agarwal, Christopher Bogart, Yifan Song, Majd Sakr