Researchers used locally run LLMs to interpret chest X-ray findings

Advantages of using local LLMs

One advantage of locally run LLMs is that they can be applied to datasets available from the NIH in a privacy-preserving way, because confidential data never leaves the institution for an outside company. Cloud-based models such as ChatGPT are not acceptable for analyzing patients’ protected health information (PHI), and that is a major obstacle as we try to personalize LLM applications to individual patients while keeping the data inside a protected health network.

“Patient data can’t be uploaded to a company and that company can’t use that data to improve its AI model.”

Another advantage of models like Vicuna is that they do not need to be trained from scratch. Vicuna is derived from a parent LLM called LLaMA, which was made open source by Meta to allow LLMs to be run locally. In principle, this makes it possible to run PHI through the model without the risk of a security breach. Running these models does, however, require substantial computational power. The NIH team used Biowulf, a high-performance computing cluster. Biowulf does not currently allow PHI as input, so the team stripped out all PHI during the data processing stage. Even so, the underlying idea still holds: a local LLM could process PHI without exposing it to an outside party.

Key findings of leveraging local LLMs for radiology reports

Classically, a set of 13 findings that commonly appear in chest X-rays has been used in many papers to assess natural language processing (NLP) tools. These tools label each of the 13 findings as ‘present’, ‘not present’, ‘not mentioned’, or ‘not sure whether it’s present or not’. The resulting labels have been used to train many image-based models.
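To make the labeling scheme concrete, here is a minimal Python sketch of how one report's labels might be represented. It is an illustration only, not the team's code: the article does not list the 13 findings, so the finding names below are a hypothetical subset, and "uncertain" is used as shorthand for ‘not sure whether it’s present or not’.

```python
from enum import Enum


class FindingLabel(Enum):
    """The four label values described in the text."""
    PRESENT = "present"
    NOT_PRESENT = "not present"
    NOT_MENTIONED = "not mentioned"
    UNCERTAIN = "uncertain"  # shorthand for "not sure whether it's present or not"


# Hypothetical subset for illustration; the article does not name the 13 findings.
EXAMPLE_FINDINGS = ["pneumothorax", "cardiomegaly", "pleural effusion"]


def empty_report_labels(findings):
    """Start every finding as 'not mentioned' until a labeler says otherwise."""
    return {finding: FindingLabel.NOT_MENTIONED for finding in findings}


# One report's labels: a labeler (NLP tool or LLM) overwrites the defaults.
labels = empty_report_labels(EXAMPLE_FINDINGS)
labels["pneumothorax"] = FindingLabel.PRESENT
labels["cardiomegaly"] = FindingLabel.NOT_PRESENT
```

A table of such per-report dictionaries is the kind of output that could then serve as training labels for an image-based model.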

The NIH team ran chest X-ray reports through the local Vicuna model and found that, although the model had not been fine-tuned for reading radiology reports, it labeled them very well.

“Vicuna was a general purpose model. It wasn’t trained specifically for this task, but comparing it to state of the art models, there was pretty good agreement between them.”

The models, though not fine-tuned, were on par with state-of-the-art models at interpreting chest X-ray reports.
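For readers curious how agreement between two labelers can be quantified, the sketch below computes raw percent agreement and Cohen's kappa over two lists of labels. The article does not say which statistic the team used, so kappa is an assumption here, chosen as one common option; the function names and sample labels are made up for illustration.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which the two labelers give the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohens_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability both labelers pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both labelers always used one identical label
    return (observed - expected) / (1 - expected)


# Made-up labels for four reports on a single finding.
local_llm = ["present", "present", "not present", "not mentioned"]
reference = ["present", "not present", "not present", "not mentioned"]

print(percent_agreement(local_llm, reference))  # 0.75
print(round(cohens_kappa(local_llm, reference), 3))  # 0.636
```

Kappa is lower than raw agreement here because some matches would be expected by chance alone.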

What are its implications in the clinical context?

Some may wonder how well these models, tested here on findings in chest X-ray reports such as COVID and pneumothorax, can handle other diagnoses. LLMs like Vicuna come pre-trained, so you don’t have to retrain them for each particular task. Simply by refining the prompts given to the LLM, the team obtained results comparable to those of state-of-the-art models.
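As a rough illustration of what prompt-based labeling involves, the Python sketch below builds a prompt for one finding and maps a model's free-text answer back onto a fixed label set. The prompt wording, the function names, and the fallback to "uncertain" are all assumptions made for this example; the article does not give the team's actual prompts, and no model is called here.

```python
ALLOWED_LABELS = ("present", "not present", "not mentioned", "uncertain")


def build_prompt(report_text, finding):
    """Assemble a hypothetical prompt asking about a single finding."""
    return (
        "You are reading a chest X-ray report.\n"
        f"Report: {report_text}\n"
        f"Is the finding '{finding}' present? "
        f"Answer with exactly one of: {', '.join(ALLOWED_LABELS)}."
    )


def parse_label(model_output):
    """Map the model's raw answer onto one of the allowed labels."""
    text = model_output.strip().lower().rstrip(".")
    for label in ALLOWED_LABELS:
        if text.startswith(label):
            return label
    # Assumed design choice: anything unrecognized is treated as uncertain.
    return "uncertain"


prompt = build_prompt("No pneumothorax. Heart size is normal.", "pneumothorax")
# In a real setup the prompt would be sent to a locally hosted model;
# here a hand-written answer stands in for the model's reply.
stand_in_reply = " Not present.\n"
print(parse_label(stand_in_reply))  # not present
```

Switching to a different diagnosis then means changing the `finding` argument or the prompt wording, not retraining the model.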

The implication is that in-house systems could harness LLMs to read through reports, relieving a large administrative burden while protecting patient information.

The NIH team said that similar projects at a larger scale are under discussion.

Where are we headed with AI?

“Where are we with AI nowadays? Are we at the peak of the hype cycle? Are we realizing limitations?”

LLMs are a major step forward in analyzing text, from clinical notes to written radiology reports.

“There’s always a lot of hype around any new technology, but there’s a core value in these LLMs that I think will survive the hype cycle.”