Abandoning hypothesis-driven research? The future of LLM-augmented research

Part 1: Abandoning hypothesis-driven research

In 2023, I came across two AI initiatives at the NIH that stood out to me: Bridge2AI and AIM-AHEAD. Led by Dr. Michael Chiang, Director of the NEI, the programs’ goal was to “abandon the traditional hypothesis-driven research approach to create data resources to address [gaps in data heterogeneity and lack of data collection standardization].” In other words, the aim was to build retrospective databases of a higher standard for AI-driven discovery.

Though many consider retrospective analyses to be sub-standard, I wholeheartedly agreed with Dr. Chiang’s mission. The trade-off is clear: at some cost to strength of evidence, we enable automated data-analysis pipelines that support research at scales and speeds that prospective studies cannot match.

[!An analogy]
Retrospective studies would effectively scan for ‘hits’, where getting a ‘hit’ warrants deeper evaluation via prospective studies.

Although automated data-analysis pipelines allow research at scale and speed, they have a limitation. In most clinical research groups, the bottleneck is the ‘writing’, not the data analysis.

Part 2: The future of LLM-augmented research

As the accuracy and reliability of LLMs continue to improve, researchers are starting to experiment with automated research pipelines.

August 2024: The AI Scientist (Sakana AI) was the first paper to demonstrate AI agents writing an entire manuscript, from start to finish, for just $15.
January 2025: Agent Laboratory (Johns Hopkins) was the second paper to demonstrate fully autonomous manuscript generation, at a cost of only $3. The code was released as open source.
February 2025: OpenAI launched Deep Research, its “deep research” feature for ChatGPT.

The first two papers focused specifically on computer science. They demonstrated that LLMs can generate novel ideas, perform keyword searches, and identify related literature autonomously.

To my knowledge, no published study has yet demonstrated similar automation of novel research in healthcare.

However, there is one notable contributor in the field of automated systematic reviews:

Dr. Irbaz Bin Riaz, an oncologist at the Mayo Clinic.

I believe he pioneered a new type of manuscript: the Living Systematic Review (LSR) and Network Meta-Analysis (NMA). LSR-NMAs are ‘living’ in the sense that he and his colleagues created an NLP pipeline (pre-LLMs) which I believe works as follows:

– After his LSR-NMA is published, the program monitors the internet for new manuscripts relevant to the topic.

– The manuscripts are automatically stored for human review.

– Every 1-2 years, the original LSR-NMA team reviews the flagged manuscripts and updates the publication, modifying figures, tables, and discussions as needed.
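
The three steps above can be sketched in a few lines of Python. This is only an illustration of the monitor, queue, and update loop as I understand it, not Dr. Riaz's actual pipeline: the class names, the `min_hits` threshold, and the plain keyword matching (a stand-in for whatever NLP his team uses) are all my own assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Manuscript:
    title: str
    abstract: str
    published: str  # ISO date string, e.g. "2024-05-01"


@dataclass
class LivingReview:
    """Hypothetical model of one published LSR-NMA and its review queue."""

    topic_terms: set[str]  # assumed search terms that define the topic
    last_update: str       # ISO date of the most recent published version
    review_queue: list[Manuscript] = field(default_factory=list)

    def is_relevant(self, paper: Manuscript, min_hits: int = 2) -> bool:
        # Keyword counting stands in for the real relevance model.
        text = f"{paper.title} {paper.abstract}".lower()
        hits = sum(term.lower() in text for term in self.topic_terms)
        return hits >= min_hits

    def monitor(self, new_papers: list[Manuscript]) -> None:
        # Steps 1 and 2: flag relevant manuscripts published since the
        # last update and store them for human review.
        for paper in new_papers:
            # ISO dates compare correctly as plain strings.
            if paper.published > self.last_update and self.is_relevant(paper):
                self.review_queue.append(paper)

    def drain_for_update(self, update_date: str) -> list[Manuscript]:
        # Step 3: hand the flagged manuscripts to the review team and
        # reset the queue for the next cycle.
        flagged, self.review_queue = self.review_queue, []
        self.last_update = update_date
        return flagged
```

In this sketch the humans stay in the loop: `monitor` only fills the queue, and nothing in the publication changes until the team calls `drain_for_update` and reviews what was flagged.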

Dr. Riaz established this workflow before LLMs existed. With their advent, the potential for automation and scalability has grown significantly.

In a recent email correspondence with Dr. Riaz, he mentioned that he’s building an ‘end to end multi-agent collaborative Human-AI system’ for Systematic Reviews. I’m very excited to see his progress.

Computers already accelerate data analysis. When fully autonomous manuscript agents arrive, they will accelerate the distribution of information as well.

We may soon see a research landscape where the time from data analysis to publication is cut by a factor of 10, or even 100.