On the Worst Prompt Performance of Large Language Models

The performance of large language models (LLMs) is acutely sensitive to the phrasing of prompts, which raises significant concerns about their reliability in real-world scenarios. Existing studies often divide prompts into task-level instructions and case-level inputs, and focus primarily on evaluating and improving robustness against variations in task-level instructions. However, …