Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision

Type: Preprint

Publication Date: 2024-11-25

Citations: 0

DOI: https://doi.org/10.48550/arxiv.2411.16579

Abstract

Training large language models (LLMs) to spend more time thinking and reflecting before responding is crucial for effectively solving complex reasoning tasks in fields such as science, coding, and mathematics. However, the effectiveness of mechanisms like self-reflection and self-correction depends on the model's capacity to accurately assess its own performance, which can be limited by factors such as initial accuracy, question difficulty, and the lack of external feedback. In this paper, we delve into a two-player paradigm that separates the roles of the reasoning and critique models, where the critique model provides step-level feedback to supervise the reasoning (actor) model at both test time and training time. We first propose AutoMathCritique, an automated and scalable framework for collecting critique data, resulting in a dataset of 76,321 responses paired with step-level feedback. Fine-tuning language models on this dataset enables them to generate natural-language feedback for mathematical reasoning. We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test time, especially when inference-time computation is scaled up. Motivated by these findings, we introduce critique-based supervision into the actor's self-training process and propose a critique-in-the-loop self-improvement method. Experiments show that the method improves the actor's exploration efficiency and solution diversity, especially on challenging queries, leading to a stronger reasoning model. Lastly, we take a preliminary step toward training self-talk reasoning models via critique supervision and showcase their potential. Our code and datasets are available at https://mathcritique.github.io/.
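The two-player test-time setup described in the abstract can be sketched as a simple refinement loop: the actor proposes an answer, the critique model returns step-level feedback, and the actor revises until the critic accepts or a round budget runs out. The `toy_actor`/`toy_critic` stand-ins and the stopping rule below are illustrative assumptions, not the paper's method; in the paper both players are fine-tuned LLMs exchanging natural-language feedback.

```python
def run_with_critique(actor, critic, question, max_rounds=3):
    """Two-player test-time loop: the actor answers, the critic returns
    step-level feedback, and the actor revises until the critic is
    satisfied or the round budget is exhausted."""
    feedback = None
    answer = None
    for _ in range(max_rounds):
        answer = actor(question, feedback)
        ok, feedback = critic(question, answer)
        if ok:
            break
    return answer

# Toy stand-ins: a real deployment would call two language models.
def toy_actor(question, feedback):
    # First attempt is wrong; a revision guided by feedback is right.
    return "4" if feedback else "5"

def toy_critic(question, answer):
    correct = (answer == "4")
    return correct, None if correct else "Step 2: 2 + 2 is 4, not 5."

print(run_with_critique(toy_actor, toy_critic, "What is 2 + 2?"))
```

With the toy players, the first round produces "5", the critic's step-level feedback triggers one revision, and the loop exits with "4"; scaling `max_rounds` is one way to spend more inference-time computation, as the abstract discusses.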

Locations

  • arXiv (Cornell University)

Similar Works

  • CriticBench: Benchmarking LLMs for Critique-Correct Reasoning (2024): Zicheng Lin, Zhibin Gou, Liang Tian, Ruilin Luo, Haowei Liu, Yujiu Yang
  • Small Language Models Need Strong Verifiers to Self-Correct Reasoning (2024): Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, Lu Wang
  • Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning (2024): Zhihan Zhang, Zhenwen Liang, Wenhao Yu, Dian Yu, Mengzhao Jia, Dong Yu, Meng Jiang
  • RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques (2025): Zhimin Tang, Zhao Li, Zuo Xiao, Tian Ding, Ruoyu Sun, Benyou Wang, Dayiheng Liu, Fei Huang, Tianyu Liu, Bowen Yu
  • Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback (2025): Yen-Ting Lin, Di Jin, Tengyu Xu, Tianhao Wu, Sainbayar Sukhbaatar, Zhu Chen, Yun Feng He, Yun-Nung Chen, Jason Weston, Yuandong Tian
  • Critique Ability of Large Language Models (2023): Liangchen Luo, Lin Zi, Yinxiao Liu, Lei Shu, Yun Zhu, Jingbo Shang, Lei Meng
  • Optimizing Language Model's Reasoning Abilities with Weak Supervision (2024): Yongqi Tong, Sizhe Wang, Dawei Li, Y.H. Wang, Simeng Han, Lin Zi, Chengsong Huang, Jiaxin Huang, Jingbo Shang
  • Large Language Models Can Self-Improve (2022): Jiaxin Huang, Shixiang Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
  • Large Language Models Can Self-Improve (2023): Jiaxin Huang, Shixiang Gu, Le Hou, Yuexin Wu, Xuezhi Wang, Hongkun Yu, Jiawei Han
  • Large Language Models Can Self-Correct with Minimal Effort (2024): Zhenyu Wu, Qingkai Zeng, Zhihan Zhang, Zhaoxuan Tan, Chao Shen, Meng Jiang
  • Evaluating Mathematical Reasoning Beyond Accuracy (2024): Shijie Xia, Xuefeng Li, Yixin Liu, Tongshuang Wu, Pengfei Liu
  • RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought (2023): Tianci Xue, Ziqi Wang, Zhenhailong Wang, Chi Han, Pengfei Yu, Heng Ji
  • Learning from Committee: Reasoning Distillation from a Mixture of Teachers with Peer-Review (2024): Zhuochun Li, Yuelyu Ji, Rui Meng, Daqing He
  • S³c-Math: Spontaneous Step-level Self-correction Makes Large Language Models Better Mathematical Reasoners (2024): Yuchen Yan, Jin Jiang, Yang Liu, Yixin Cao, Xin Xu, Mengdi Zhang, Xunliang Cai, Jian Q. Shao
  • Learning From Mistakes Makes LLM Better Reasoner (2023): Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen
  • Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation (2024): Chengwei Dai, Kun Li, Wei Zhou, Songlin Hu
  • CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities (2024): Yujun Mao, Yoon Kim, Yilun Zhou
  • Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying (2024): Federico Castagna, Isabel Sassoon, Simon Parsons
  • LLMs Can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought (2024): Zhuoxuan Jiang, Haoyuan Peng, Shanshan Feng, Fan Li, Dongsheng Li

Works That Cite This (0)


Works Cited by This (0)
