Artificial intelligence (AI) is widely recognized for its ability to analyze and synthesize information from a vast range of textual and visual materials. Yet while its generative potential is enormous, AI-generated content has also been a source of frustration, often containing oddities and inaccuracies that are, in some cases, the stuff of comedy. Who hasn’t chuckled at an image with too many hands, or a face that is vaguely human yet unsettlingly wrong?
Now it seems that AI’s predictive capabilities are growing and perhaps surpassing human intuition, at least when it comes to predicting the outcomes of neuroscience research. That was the conclusion of a study published last week in Nature Human Behaviour, in which Ken Luo, PhD, and his team at University College London (UCL) investigated the predictive capabilities of large language models (LLMs). Their report is titled, “Large language models surpass human experts in predicting neuroscience results.”
The UCL team developed a benchmark called BrainBench to evaluate how well LLMs can predict neuroscience results. BrainBench consists of pairs of neuroscience abstracts, each including the background, methods, and results of a study. In each pair, one abstract was real, while the other presented plausible but ultimately false results. Using BrainBench, the team assessed 15 LLMs and 171 human neuroscientists, all of whom were tasked with identifying the correct abstract in each pair.
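The article does not say how the models registered their choices, but one common way to pose such a two-choice task to a language model is to check which of the two abstract versions the model finds less “surprising,” that is, which one it assigns lower perplexity. The sketch below, written with the Hugging Face transformers library, uses a placeholder model name and hypothetical abstract variables; it is an illustration of that general approach, not the authors’ code.

```python
# Illustrative sketch (not the study's code): score a two-choice, BrainBench-style
# item by comparing which abstract version a causal language model finds less surprising.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # placeholder; any causal LM could be used
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def mean_token_nll(text: str) -> float:
    """Mean per-token negative log-likelihood; lower means the model finds the text less surprising."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

def pick_abstract(abstract_a: str, abstract_b: str) -> str:
    """Return which of two abstract versions the model scores as more plausible."""
    return "A" if mean_token_nll(abstract_a) < mean_token_nll(abstract_b) else "B"

# Hypothetical usage: abstract_real and abstract_altered are the two versions of one item.
# choice = pick_abstract(abstract_real, abstract_altered)
```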
“Since the advent of generative AI like ChatGPT, much research has focused on LLMs’ question-answering capabilities, showcasing their remarkable skill in summarizing knowledge from extensive training data. However, rather than emphasizing their backward-looking ability to retrieve past information, we explored whether LLMs could synthesize knowledge to predict future outcomes,” Luo said.
“Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast outcomes of experiments,” Luo explained. “Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook critical insights from the literature.”
In the contest between AI and humans, AI was the winner. The LLMs averaged 81% accuracy, compared with 63% for the human experts. Even when the human group was restricted to those reporting the highest degree of expertise in a given domain, accuracy rose only to 66%. The LLMs also expressed more confidence in their decisions, and those decisions were more likely to be correct than the answers of the human participants.
“What is remarkable is how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel but conforms to existing patterns of results in the literature. We wonder whether scientists are being sufficiently innovative and exploratory,” commented the study’s senior author, Bradley Love, PhD, professor at UCL.
The researchers also adapted an open-source LLM, Mistral, by training it on the neuroscience literature; they named the resulting model BrainGPT. When evaluated on the same BrainBench test, BrainGPT identified the correct abstracts with 86% accuracy, compared with 83% for the version of Mistral without the additional neuroscience training.
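The article does not describe how that additional training was carried out. One common way to adapt an open model to a domain corpus is parameter-efficient fine-tuning on domain text; the sketch below uses Hugging Face transformers, datasets, and peft, with a placeholder base model and a hypothetical corpus file, and should be read as an assumption-laden illustration of that general technique rather than the BrainGPT training pipeline.

```python
# Illustrative sketch (not the authors' pipeline): adapt a base causal LM to a
# domain corpus with parameter-efficient LoRA fine-tuning.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

BASE_MODEL = "mistralai/Mistral-7B-v0.1"   # placeholder base model
CORPUS = "neuro_abstracts.txt"             # hypothetical file of domain text

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# Wrap the model so only small low-rank adapter weights are trained.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Tokenize the raw domain text for causal language modeling.
dataset = load_dataset("text", data_files=CORPUS)["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain-adapted-lm", num_train_epochs=1,
                           per_device_train_batch_size=1, learning_rate=2e-4),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```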
The future of research, and indeed of daily life, seems likely to involve an integration of AI. This study shows the utility of AI in predicting study outcomes, but AI is not infallible: it should be used as a tool to assist researchers, not as a replacement for them.
“We envision a future where researchers can input their proposed experiment designs and anticipated findings, with AI offering predictions on the likelihood of various outcomes,” Luo said. “This would enable faster iteration and more informed decision-making in experiment design.”