Exposing the Achilles' Heel: Evaluating LLMs Ability to Handle Mistakes
in Mathematical Reasoning
Large Language Models (LLMs) have been applied to Math Word Problems (MWPs) with transformative impact, changing how these complex problems are approached and solved in various domains, including educational settings. However, evaluations of these models often prioritize final-answer accuracy and overlook the crucial aspect of reasoning capability. This work addresses …