An Evaluation of Sakana's AI Scientist for Autonomous Research: Wishful Thinking or an Emerging Reality Towards 'Artificial General Research Intelligence' (AGRI)?

Type: Preprint
Publication Date: 2025-02-20
Citations: 0
DOI: https://doi.org/10.48550/arxiv.2502.14297

Abstract

A major step toward Artificial General Intelligence (AGI) and Super Intelligence is AI's ability to autonomously conduct research - what we term Artificial General Research Intelligence (AGRI). If machines could generate hypotheses, conduct experiments, and write research papers without human intervention, it would transform science. Recently, Sakana.ai introduced the AI Scientist, a system claiming to automate the research lifecycle, generating both excitement and skepticism. We evaluated the AI Scientist and found it a milestone in AI-driven research. While it streamlines some aspects, it falls short of expectations. Literature reviews are weak, nearly half the experiments failed, and manuscripts sometimes contain hallucinated results. Most notably, users must provide an experimental pipeline, limiting the AI Scientist's autonomy in research design and execution. Despite its limitations, the AI Scientist advances research automation. Many reviewers or instructors who assess work superficially may not recognize its output as AI-generated. The system produces research papers with minimal human effort and low cost. Our analysis suggests a paper costs a few USD with a few hours of human involvement, making it significantly faster than human researchers. Compared to AI capabilities from a few years ago, this marks progress toward AGRI. The rise of AI-driven research systems requires urgent discussion within Information Retrieval (IR) and broader scientific communities. Enhancing literature retrieval, citation validation, and evaluation benchmarks could improve AI-generated research reliability. We propose concrete steps, including AGRI-specific benchmarks, refined peer review, and standardized attribution frameworks. Whether AGRI becomes a stepping stone to AGI depends on how the academic and AI communities shape its development.

Locations

  • arXiv (Cornell University)

Summary

This paper presents an in-depth evaluation of Sakana.ai’s AI Scientist, a system designed to automate the full research lifecycle. The work matters because interest and investment in Artificial General Research Intelligence (AGRI) are growing, and many see AGRI as a stepping stone toward Artificial General Intelligence (AGI) and eventually Super Intelligence.

The central claim examined in this paper is that the AI Scientist conducts research autonomously, from generating ideas to writing and reviewing papers.

However, the evaluation reveals significant shortcomings. The AI Scientist’s literature review is shallow: it relies on keyword searches rather than synthesis, which leads to inaccurate novelty assessments. Experiment execution is not robust, with nearly half of the experiments failing, many because of coding errors, and the experiments that did run often produced logically flawed results. The generated manuscripts were poorly substantiated, with outdated citations, structural errors, and hallucinated numerical results.
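
To make the keyword-search weakness concrete, here is a minimal sketch, assuming a novelty check that compares an idea against prior titles by keyword overlap. It is not the AI Scientist's actual code: the function names, the stop-word list, the threshold, and both example strings are hypothetical and chosen only for illustration.

```python
import re

# Illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {"a", "an", "the", "of", "for", "with", "in", "on", "and", "to"}


def keywords(text):
    """Return the set of lowercased words in text, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS}


def keyword_overlap(idea, title):
    """Jaccard overlap between the keyword sets of two strings."""
    a, b = keywords(idea), keywords(title)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def looks_novel(idea, prior_titles, threshold=0.5):
    """Flag an idea as novel when no prior title shares enough keywords."""
    return all(keyword_overlap(idea, t) < threshold for t in prior_titles)


# Hypothetical data: the idea paraphrases the prior title almost entirely.
prior = ["Mini-batch stochastic gradient descent for training neural networks"]
idea = "Micro-batch SGD for fitting deep models"

print(looks_novel(idea, prior))  # the paraphrase shares only the word "batch"
```

Because the two strings share only one keyword, the check reports the idea as novel even though a reader would recognize it as a restatement of existing work. A check based on synthesis would need to compare meaning rather than surface vocabulary.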

Despite these limitations, the paper acknowledges the AI Scientist as a significant advancement in research automation. It can produce complete research manuscripts with minimal human intervention at remarkable speed and cost efficiency. This highlights the rapid progress in AI’s ability to mimic academic writing and structure arguments.

The paper underscores the need for the Information Retrieval (IR) and scientific communities to engage in discussions about the development and governance of AGRI. It proposes concrete actions such as pilot projects, competitions, and standardized attribution frameworks to guide the evolution of AI in research.

Background needed to understand this paper:

  1. Large Language Models (LLMs): Familiarity with how LLMs are used for text generation and their potential in research tasks.
  2. Artificial General Intelligence (AGI): Understanding the concept of AGI and the role of AGRI as a potential precursor.
  3. Research Processes: Basic knowledge of the research lifecycle, including literature review, experiment design, data analysis, and manuscript writing.
  4. Machine Learning Concepts: Understanding of concepts like stochastic gradient descent (SGD), cross-validation, and recommendation algorithms.
  5. Information Retrieval (IR): Familiarity with IR tasks like literature retrieval, citation analysis, and relevance ranking.

Similar Works

Title | Date | Authors
AI empowering research: 10 ways how science can benefit from AI | 2023-01-01 | César França
Publishing fast and slow: A path toward generalizability in psychology and AI | 2022-01-01 | Andrew K. Lampinen, Stephanie Chan, Adam Santoro, Felix Hill
AI Expands Scientists' Impact but Contracts Science's Focus | 2024-12-10 | Qun Hao, Fengli Xu, Yong Li, James Evans
Responsible AI: Portraits with Intelligent Bibliometrics | 2024-05-05 | Yi Zhang, Mengjia Wu, Guangquan Zhang, Jie Lü
Autonomous LLM-driven research from data to human-verifiable research papers | 2024-04-24 | Tal Ifargan, Lukas Hafner, M. L. Kern, Ori Alcalay, Roy Kishony
Quantifying the Benefit of Artificial Intelligence for Scientific Research | 2023-01-01 | Jian Gao, Dashun Wang
The complementary contributions of academia and industry to AI research | 2024-01-01 | Lizhen Liang, Han Zhuang, James Zou, Daniel E. Acuña
Publishing fast and slow: A path toward generalizability in psychology and AI | 2021-03-23 | Andrew K. Lampinen, Stephanie C. Y. Chan, Adam Santoro, Felix Hill
The Great AI Witch Hunt: Reviewers Perception and (Mis)Conception of Generative AI in Research Writing | 2024-06-26 | Hilda Hadan, D. Wang, Reza Hadi Mogavi, Joseph Tu, Leah Zhang-Kennedy, Lennart E. Nacke
Investigating Responsible AI for Scientific Research: An Empirical Study | 2023-01-01 | Muneera Bano, Didar Zowghi, Pip Shea, Georgina Ibarra
AI Knowledge and Reasoning: Emulating Expert Creativity in Scientific Research | 2024-04-05 | Anirban Mukherjee, Hannah Hanwen Chang
The Impact of Responsible AI Research on Innovation and Development | 2024-07-22 | Ali Akbar Septiandri, Marios Constantinides, Daniele Quercia
The Unreasonable Effectiveness of Open Science in AI: A Replication Study | 2024-12-20 | Odd Erik Gundersen, Odd Cappelen, Martin Mølnå, Nicklas Grimstad Nilsen
Identifying the Development and Application of Artificial Intelligence in Scientific Text | 2020-01-01 | James Dunham, Jennifer Melot, Dewey Murdick
AIGS: Generating Science from AI-Powered Automated Falsification | 2024-11-17 | Zijun Liu, Kaiming Liu, Yong‐Guan Zhu, Xuanyu Lei, Zonghan Yang, Zhenhe Zhang, Peng Li, Yang Liu
Generative AI Uses and Risks for Knowledge Workers in a Science Organization | 2025-01-27 | Kelly B. Wagman, Matthew T. Dearing, Marshini Chetty
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape | 2023-01-01 | Timothy R. McIntosh, Teo Sušnjak, Tong Liu, Paul Watters, Malka N. Halgamuge
CycleResearcher: Improving Automated Research via Automated Review | 2024-10-28 | Yixuan Weng, Minjun Zhu, Guangsheng Bao, Hongbo Zhang, Jindong Wang, Yue Zhang, Linyi Yang
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation | 2025-02-07 | Steffen Eger, Cao Yong, Jennifer D’Souza, Andreas Geiger, Christian Greisinger, Stephanie Groß, Yufang Hou, Brigitte Krenn, Anne Lauscher, Yizhi Li
AI Usage Cards: Responsibly Reporting AI-Generated Content | 2023-06-01 | Jan Philip Wahle, Terry Ruas, Saif M. Mohammad, Norman Meuschke, Béla Gipp

Cited by (0)

Citing (0)
