Evaluating Large Language Models in Clinical Risk Prediction
A comparison of GPT-4 and clinalytix Medical AI in predicting delirium risk

Improving Risk Prediction: The Role of LLMs in Healthcare

In the fast-evolving field of healthcare, predictive analytics are becoming increasingly vital. Accurate clinical risk prediction tools allow healthcare professionals to make informed decisions and improve patient outcomes. Large Language Models (LLMs), such as GPT-4, have emerged as powerful tools in this domain, but their suitability for high-stakes clinical applications remains debated. This study examines the effectiveness of GPT-4 and clinalytix Medical AI in predicting the risk of delirium in clinical settings, shedding light on the limitations of LLMs compared to more specialized medical AI systems.

LLMs vs. Medical AI: A Performance Gap

The study compared the ability of GPT-4 and clinalytix Medical AI to predict the likelihood of delirium in patients. While clinalytix demonstrated high precision and accuracy, GPT-4 struggled to identify positive delirium cases, failing to flag nearly 38% of the patients who actually developed the condition. GPT-4's lower recall of 61.96% highlights its limitations in identifying at-risk patients. Clinalytix Medical AI, on the other hand, maintained much more reliable performance across all key metrics, including precision, recall, and specificity.
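To make these figures concrete, here is a minimal sketch of how recall, precision, and specificity are derived from a confusion matrix. The counts below are hypothetical, chosen only so that the recall matches the 61.96% reported for GPT-4; the study does not publish the underlying case counts.

```python
# Hypothetical confusion-matrix counts: only the resulting recall (61.96%)
# comes from the study; the raw counts themselves are illustrative.
tp = 57   # delirium cases correctly flagged (true positives)
fn = 35   # delirium cases missed (false negatives, ~38% of actual cases)
fp = 20   # patients flagged who did not develop delirium (false positives)
tn = 300  # patients correctly cleared (true negatives)

recall = tp / (tp + fn)        # sensitivity: share of actual cases caught
precision = tp / (tp + fp)     # share of positive calls that were correct
specificity = tn / (tn + fp)   # share of non-cases correctly cleared

print(f"Recall:      {recall:.2%}")   # 61.96% -- matches the study
print(f"Precision:   {precision:.2%}")
print(f"Specificity: {specificity:.2%}")
```

A recall of 61.96% means that for every 100 patients who actually develop delirium, roughly 38 would go unflagged, which is why recall is the critical metric for a screening tool of this kind.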

Challenges with LLMs in Clinical Contexts

The discrepancies in GPT-4's performance stem from several factors. One key limitation is its fixed context window of 8,000 tokens, which can force critical patient data to be excluded from the prompt. Additionally, LLMs like GPT-4 tend to prioritize free-text notes over structured information such as laboratory results. The generative nature of LLMs also raises concerns about the explanations they produce: there is no guarantee that these explanations are factually correct, which can lead to misinterpretations in clinical practice.
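To illustrate the context-window constraint, the sketch below shows how a long patient record must be truncated before it fits an 8,000-token window. It assumes the open-source tiktoken tokenizer; the function name and the sample record are illustrative, not part of the study.

```python
import tiktoken  # OpenAI's open-source tokenizer (pip install tiktoken)

MAX_CONTEXT = 8_000  # the fixed GPT-4 window cited above

def truncate_record(record: str, budget: int = MAX_CONTEXT) -> str:
    """Trim a patient record to the token budget, dropping the tail."""
    enc = tiktoken.encoding_for_model("gpt-4")
    tokens = enc.encode(record)
    if len(tokens) <= budget:
        return record
    # Everything past the budget -- possibly lab results, medications,
    # or the most recent notes -- is lost before the model sees it.
    return enc.decode(tokens[:budget])

# A multi-year record can easily exceed the window; whatever is cut
# may contain exactly the structured data most predictive of delirium.
long_record = "Admission note: ... " * 5_000
trimmed = truncate_record(long_record)
```

In practice this truncation happens silently, so neither the clinician nor the model knows which portion of the chart was dropped.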

Conclusion: The Need for Human Oversight

Although LLMs show great promise in healthcare, this study underscores their current limitations in clinical decision-making. While they may serve as useful tools for augmenting human expertise, LLMs are not yet capable of making accurate clinical predictions independently. Systems like clinalytix Medical AI, which are purpose-built for healthcare applications, outperform LLMs and provide more reliable support for clinicians. Continued human oversight remains essential to ensure patient safety and optimal care.


Keywords: Clinical Risk Prediction, Large Language Models (LLMs), GPT-4, clinalytix Medical AI, Delirium Risk, Healthcare AI
