A major step toward Artificial General Intelligence (AGI) and Super Intelligence is AI's ability to autonomously conduct research, a capability we term Artificial General Research Intelligence (AGRI). If machines could generate hypotheses, run experiments, and write research papers without human intervention, science would be transformed. Recently, Sakana.ai introduced the AI Scientist, a system claiming to automate the full research lifecycle, generating both excitement and skepticism. We evaluated the AI Scientist and found it a milestone in AI-driven research, yet one that falls short of its claims. Its literature reviews are shallow, nearly half of its experiments failed, and its manuscripts sometimes contain hallucinated results. Most notably, users must supply an experimental pipeline, which limits the AI Scientist's autonomy in research design and execution. Despite these limitations, the AI Scientist advances research automation: reviewers or instructors who assess work only superficially may not recognize its output as AI-generated, and it produces complete research papers with minimal human effort at low cost. Our analysis suggests a paper costs a few USD and a few hours of human involvement, making the system significantly faster than human researchers. Compared with AI capabilities from only a few years ago, this marks clear progress toward AGRI. The rise of AI-driven research systems requires urgent discussion within the Information Retrieval (IR) and broader scientific communities: enhanced literature retrieval, citation validation, and evaluation benchmarks could improve the reliability of AI-generated research. We propose concrete steps, including AGRI-specific benchmarks, refined peer review, and standardized attribution frameworks. Whether AGRI becomes a stepping stone to AGI depends on how the academic and AI communities shape its development.
This paper presents an in-depth evaluation of Sakana.ai's AI Scientist, a system designed to automate the full research lifecycle. The significance of this work stems from the increasing interest and investment in Artificial Research Intelligence (ARI), which many see as a stepping stone towards Artificial General Intelligence (AGI) and eventually Super Intelligence.
The key innovation examined in this paper is the AI Scientist's claim of autonomously conducting research, from generating ideas to writing and reviewing papers.
However, the evaluation reveals significant shortcomings. The AI Scientist's literature review process is shallow, relying on keyword searches rather than synthesis, which leads to inaccurate novelty assessments. The system also lacks robustness in experiment execution, with many experiments failing outright due to coding errors, and those that did run often produced logically flawed results. Additionally, the manuscripts generated by the AI Scientist were poorly substantiated, marred by outdated citations, structural errors, and hallucinated numerical results.
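Why keyword search is a weak basis for novelty assessment can be seen in a toy sketch (our own illustration, not code from the AI Scientist; the example titles and abstracts are invented): two papers can describe nearly the same idea while sharing no title keywords, so keyword matching alone reports no overlap where content-level comparison finds substantial similarity.

```python
# Illustrative sketch: keyword overlap vs. content-based similarity as
# novelty signals. All paper titles/abstracts below are made up.
import math
from collections import Counter

def keyword_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (a stand-in for keyword search)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a crude stand-in for semantic matching)."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0

title_a = "Grokking: generalization beyond overfitting"
title_b = "Delayed phase transitions in neural network training"
abstract_a = ("small transformer models suddenly generalize long after "
              "memorizing the training data when trained with weight decay")
abstract_b = ("networks trained with weight decay can memorize the training "
              "data first and suddenly generalize much later")

# Titles share zero keywords, yet the abstracts describe the same phenomenon.
print(f"title keyword overlap: {keyword_overlap(title_a, title_b):.2f}")
print(f"abstract similarity:   {cosine_similarity(abstract_a, abstract_b):.2f}")
```

A system that judges novelty from title-keyword hits would call these two papers unrelated; synthesizing abstract content, as the paper argues the IR community should enable, catches the duplication.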
Despite these limitations, the paper acknowledges the AI Scientist as a significant advancement in research automation. It can produce complete research manuscripts with minimal human intervention at remarkable speed and cost efficiency. This highlights the rapid progress in AI's ability to mimic academic writing and structure arguments.
The paper underscores the need for the Information Retrieval (IR) and scientific communities to engage in discussions about the development and governance of ARI. It proposes concrete actions such as pilot projects, competitions, and standardized attribution frameworks to guide the evolution of AI in research.