Measuring Massive Multitask Language Understanding

We propose a new test to measure a text model's multitask accuracy. The test covers 57 tasks including elementary mathematics, US history, computer science, law, and more. To attain high accuracy on this test, models must possess extensive world knowledge and problem-solving ability. We find that while most recent …