Mathematics, word problems, common sense, and artificial intelligence

Authors

  • Ernest Davis

Type: Article
Publication Date: 2024-02-15
Citations: 54
DOI: https://doi.org/10.1090/bull/1828

Abstract

The paper discusses the capacities and limitations of current artificial intelligence (AI) technology to solve word problems that combine elementary mathematics with commonsense reasoning. No existing AI systems can solve these reliably. We review three approaches that have been developed, using AI natural language technology: outputting the answer directly, outputting a computer program that solves the problem, and outputting a formalized representation that can be input to an automated theorem verifier. We review some benchmarks that have been developed to evaluate these systems and some experimental studies. We discuss the limitations of the existing technology at solving these kinds of problems. We argue that it is not clear whether these kinds of limitations will be important in developing AI technology for pure mathematical research, but that they will be important in applications of mathematics, and may well be important in developing programs capable of reading and understanding mathematical content written by humans.

Locations

  • arXiv (Cornell University)
  • Bulletin of the American Mathematical Society

Summary

This paper surveys the current state of AI technology for solving mathematical word problems (MWPs), particularly those that combine elementary mathematics with commonsense reasoning. The central argument is that, despite significant advances in AI, no existing system can solve these problems reliably.

Key innovations and findings discussed in the paper:

  1. Limitations of Current AI: Even with recent advances, AI systems struggle with elementary commonsense word problems (CSWs), despite their success in other AI domains.

  2. Approaches Reviewed: The paper reviews three main approaches for using Large Language Models (LLMs) to solve MWPs:
    • Outputting the answer directly.
    • Generating a computer program to solve the problem.
    • Producing a formalized representation that can be fed into an automated theorem verifier.

  3. Benchmark Analysis: It examines benchmarks such as SVAMP and LILA, along with experimental studies that evaluate AI systems' performance on these tasks, and points out limitations and flaws in current technology.

  4. Artifacts in Training Data: The paper discusses the problem of "artifacts," where AI systems learn superficial regularities in the training data rather than the underlying characteristics of the problem.

  5. Untested Abilities: It notes that many benchmark tests of AI math ability were designed for humans and therefore assume certain basic abilities that AI systems may lack.
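The second approach listed under item 2, generating a program instead of an answer, can be sketched as follows. This is a minimal illustration and not an example taken from the paper: the word problem, the function name `solve_egg_problem`, and its parameters are all hypothetical, chosen only to show how the arithmetic is delegated to an interpreter while the commonsense step (broken eggs are not counted) still has to be encoded by whoever, or whatever, writes the program.

```python
# Hypothetical word problem (invented for illustration, not from the paper):
#   "A carton holds 12 eggs. Dana buys 3 cartons and drops one of them,
#    breaking 5 eggs. How many unbroken eggs does Dana have?"
#
# Under the program-output approach, the model would emit code like the
# function below, and a separate interpreter would compute the answer.


def solve_egg_problem(cartons: int = 3, eggs_per_carton: int = 12, broken: int = 5) -> int:
    """Return the number of unbroken eggs."""
    total_eggs = cartons * eggs_per_carton
    # Commonsense constraint: the count of broken eggs cannot be negative
    # and cannot exceed the number of eggs that exist.
    if not 0 <= broken <= total_eggs:
        raise ValueError("broken must be between 0 and the total number of eggs")
    # Commonsense step: a broken egg no longer counts as an unbroken egg.
    return total_eggs - broken


if __name__ == "__main__":
    print(solve_egg_problem())  # 3 * 12 - 5 = 31
```

The arithmetic here is trivial for the interpreter; the part that can go wrong is the translation from the English text into the program, which is where the limitations described in the list above apply.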

The main background needed to understand this paper:

  1. Basic understanding of AI and Machine Learning (ML): Including concepts like corpus-based machine learning, neural networks, and deep learning.

  2. Familiarity with Large Language Models (LLMs): Knowing how LLMs work, including text generation, training sets, and language modeling.

  3. Knowledge of Natural Language Processing (NLP): Understanding the challenges of processing and understanding natural language.

  4. Awareness of Automated Theorem Proving: Knowing about systems used for formally verifying mathematical proofs.

  5. General Knowledge of Mathematics: Including elementary arithmetic, solid geometry, and mathematical notation.

Similar Works

Title Date Authors
Mathematics, word problems, common sense, and artificial intelligence 2023-01-01 Ernest Davis
Towards Tractable Mathematical Reasoning: Challenges, Strategies, and Opportunities for Solving Math Word Problems. 2021-10-29 Keyur Faldu Amit Sheth Prashant Kikani Manas Gaur Aditi Avasthi
Towards Tractable Mathematical Reasoning: Challenges, Strategies, and Opportunities for Solving Math Word Problems 2021-01-01 Keyur Faldu Amit Sheth Prashant Kikani Manas Gaur Aditi Avasthi
Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems 2024-08-29 Kai Ding Ma Zhenguo Xiaoran Yan
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers 2019-04-30 Dongxiang Zhang Lei Wang Luming Zhang Bing Tian Dai Heng Tao Shen
The Gap of Semantic Parsing: A Survey on Automatic Math Word Problem Solvers 2018-01-01 Dongxiang Zhang Lei Wang Luming Zhang Bing Tian Dai Heng Tao Shen
Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text 2020-01-01 Deborah Ferreira André Freitas
Natural Language Premise Selection: Finding Supporting Statements for Mathematical Text 2020-04-30 Deborah Ferreira André Freitas
Large Language Models for Mathematical Reasoning: Progresses and Challenges 2024-01-31 Janice Ahn Rishu Verma Renze Lou Di Liu Rui Zhang Wenpeng Yin
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks 2022-01-01 Swaroop Mishra Arindam Mitra Neeraj Varshney Bhavdeep Sachdeva Peter Clark Chitta Baral Ashwin Kalyan
Formal Mathematical Reasoning: A New Frontier in AI 2024-12-20 Kaiyu Yang Gabriel Poesia Jingxuan He Wenda Li Kristin Lauter Swarat Chaudhuri Dawn Song
Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers 2022-01-01 Sowmya S. Sundaram Sairam Gurajada Marco Fisichella P Deepak Savitha Sam Abraham
Mathematical reasoning and the computer 2025-02-11 Kevin Buzzard
Comprehension of mathematical word problems 1995-01-01 Melinda McKeegan
Language Modeling for Formal Mathematics 2020-06-08 Markus N. Rabe Dennis Lee Kshitij Bansal Christian Szegedy
Mathematical reasoning and the computer 2024-02-15 Kevin Buzzard
A Survey in Mathematical Language Processing 2022-01-01 Jordan Meadows André Freitas
Mathematical Word Problem Comprehension. 2000-05-01 Jennifer Maikos-Diegnan
Exploring the Compositional Deficiency of Large Language Models in Mathematical Reasoning 2024-05-05 Jun Zhao Jingqi Tong Yurong Mou Ming Zhang Qi Zhang Xuanjing Huang
Word problems and mathematical understanding 2005-06-01 Klaus Hasemann