Scientific discovery is one of the most sophisticated human activities. First, scientists must understand existing knowledge and identify important gaps.
Next, they must formulate a research question and design and conduct experiments to pursue an answer. Then they must analyze and interpret the results of those experiments, which may give rise to yet more research questions.
Can such a complex process be automated? Last week, Sakana AI Labs announced the creation of “AI Scientist,” an artificial intelligence system that the company claims can make scientific discoveries in the field of machine learning in a fully automated way.
Using generative large language models (LLMs), like those behind ChatGPT and other AI chatbots, the system can brainstorm ideas, select the most promising ones, code new algorithms, plot the results, and write a paper summarizing the experiments and their findings, complete with a bibliography.
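To make that claim concrete, here is a schematic sketch in Python of the kind of fully automated pipeline Sakana describes. It is emphatically not Sakana's actual code: every function below is a stub with a hypothetical name, standing in for an LLM call or an experiment run.

```python
# A schematic sketch of a fully automated "AI scientist" pipeline.
# NOT Sakana's code: each function is a hypothetical stub standing in
# for an LLM call or an experiment run.

def brainstorm(topic: str) -> list[str]:
    # An LLM would generate many candidate research ideas here.
    return [f"a new optimizer variant for {topic}"]

def select_promising(ideas: list[str]) -> str:
    # An LLM would rank the ideas and pick the most promising one.
    return ideas[0]

def run_experiment(idea: str) -> dict:
    # Generated code would be executed here and metrics collected.
    return {"baseline": 0.71, "proposed": 0.74}  # placeholder numbers

def write_paper(idea: str, results: dict) -> str:
    # An LLM would draft the manuscript, plots, and bibliography.
    return f"Draft paper on '{idea}' reporting {results}"

idea = select_promising(brainstorm("machine learning"))
print(write_paper(idea, run_experiment(idea)))
```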
Sakana claims its AI tool can run the entire lifecycle of a scientific experiment at a cost of just US$15 per paper, less than the price of a scientist's lunch.
These are some pretty bold claims. Are they true? And even if they are, is an army of AI scientists churning out research papers at inhuman speed really good news for science?
How Computers Do “Science”
Much of science is carried out in the open: nearly all scientific knowledge is written down somewhere (otherwise we would have no way to “know” it), and millions of scientific papers are freely available online in repositories such as arXiv and PubMed.
LLMs trained on this data capture the language of science and its patterns, so it is perhaps not at all surprising that a generative LLM can produce something that looks like a good scientific paper: it has plenty of examples to imitate.
What is less clear is whether an AI system can produce an interesting scientific paper, because good science requires novelty.
But is it interesting?
Scientists do not want to be told about things that are already known; they want to learn new things, especially new things that differ significantly from what is already known. This requires judgment about the scope and value of a contribution.
The Sakana system tries to address interestingness in two ways. First, it “scores” the similarity of new paper ideas to existing research (indexed in the Semantic Scholar repository); anything too similar is discarded.
Second, Sakana's system introduces a “peer review” step, using another LLM to judge the quality and novelty of the generated paper. Here again, there are plenty of examples of peer reviews online, on sites such as openreview.net, to guide how a paper should be critiqued, and LLMs have ingested these too.
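As an illustration of the first idea, here is a minimal novelty filter in Python. This is an assumption-laden sketch, not Sakana's implementation: the bag-of-words similarity measure, the 0.8 threshold, and the example abstracts are all invented for illustration, and a real system would query Semantic Scholar and use stronger, embedding-based similarity.

```python
# A minimal, illustrative "novelty filter": discard an idea if it is
# too similar to any already-indexed abstract. All details here are
# assumptions for illustration, not Sakana's actual method.

from collections import Counter
from math import sqrt

def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between two texts, using simple word counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def is_novel(idea: str, indexed_abstracts: list[str], threshold: float = 0.8) -> bool:
    """Keep an idea only if no indexed paper is too similar to it."""
    return all(cosine_similarity(idea, abstract) < threshold
               for abstract in indexed_abstracts)

indexed = ["we propose a novel attention mechanism for transformers"]
print(is_novel("we propose a new attention mechanism for transformers", indexed))   # False: near-duplicate
print(is_novel("measuring reward hacking in reinforcement learning agents", indexed))  # True
```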
AI may not be able to correctly judge the output of AI
Feedback on Sakana AI's work has been mixed, with some describing it as producing “endless scientific garbage.”
The system's own reviews of its output also judge the papers to be weak at best. This will likely improve as the technology does, but whether automated scientific papers are actually valuable remains an open question.
The ability of LLMs to judge research quality is also an open question: my own research (forthcoming in Research Synthesis Methods) suggests that LLMs are not very good at judging the risk of bias in medical studies, although this too may improve over time.
Sakana's system automates discovery in computational research, which is much easier to automate than sciences requiring physical experiments. Sakana's experiments are carried out in code, which is also structured text that LLMs can be trained to generate.
AI tools should support scientists, not replace them
AI researchers have been developing systems to support science for decades. Given the sheer volume of published research, even finding publications relevant to a particular scientific question can be a challenge.
Specialized search tools leverage AI to help scientists find and synthesize existing research; these include the aforementioned Semantic Scholar, as well as newer systems such as Elicit, Research Rabbit, scite, and Consensus.
Text mining tools such as PubTator dig deeper into papers to identify key points of focus, such as specific gene mutations or diseases and their established relationships, which is particularly useful for curating and organizing scientific information.
Machine learning is also being used to support the synthesis and analysis of medical evidence in tools such as RobotReviewer, while Scholarcy's summaries comparing and contrasting the claims of papers help researchers conduct literature reviews.
None of these tools is meant to replace scientists; they are meant to help scientists do their jobs more effectively.
AI research could exacerbate existing problems
While Sakana AI says it doesn't see the role of human scientists diminishing, its vision of a “fully AI-driven scientific ecosystem” would have major implications for science.
One concern is that as AI-generated papers flood the scientific literature, future AI systems will be trained on AI output and suffer model collapse, meaning they may become increasingly ineffective at innovating.
But the implications for science go far beyond the AI science systems themselves.
The scientific community already has bad actors, such as “paper mills” that mass-produce fake papers, and this problem will only get worse when a scientific paper can be produced with US$15 and a vague initial prompt.
The need to check a sea of automatically generated research for errors could quickly overwhelm the capacity of actual scientists. The peer review system is already dysfunctional, and feeding it more research of questionable quality will not fix it.
Science is fundamentally based on trust: scientists value the integrity of the scientific process so that we can be confident our understanding of the world (and now of the world's machines) is valid and improving.
A scientific ecosystem in which AI systems are key players raises fundamental questions about the meaning and value of this process, and about how much trust we should place in AI scientists. Is this the scientific ecosystem we want?
Karin Verspoor, Head of the School of Computing Technology, RMIT University
This article is republished from The Conversation under a Creative Commons license. Read the original article.