  • Hongjian Zhou

Newsletter from The Neural Medwork: Issue 9



Welcome back to the 9th newsletter of The Neural Medwork! In this issue, we dive into gradient boosting, a popular method for analyzing tabular and time-series data, with applications in biometrics, electronic medical records (EMR), and various other healthcare settings. Next, we introduce a paper that evaluates a commercial model for predicting hospital-acquired acute kidney injury (HA-AKI). Lastly, we present an advanced tip, prompt chaining, to help you leverage the analytical power of LLMs.


Core Concept: Gradient Boosting

What is Gradient Boosting?

Gradient Boosting is an extension of the decision tree model, a concept familiar to many of you. It begins with a simple decision tree to make initial predictions. Recognizing the limitations of individual trees in capturing complex relationships within data, Gradient Boosting evolves this approach by iteratively improving upon these initial models. Each new model focuses on correcting the errors made by its predecessors, creating a series of increasingly accurate predictions. This process, known as "boosting," merges these models into a powerful ensemble, minimizing errors and maximizing predictive accuracy.

A Healthcare Application: Predicting Dialysis Need

To illustrate, let's consider predicting which patients admitted to a hospital will require dialysis, a scenario that demands accurate prediction tools. A single, simple decision tree can analyze patient data to predict dialysis need, but its predictions will contain errors. Gradient Boosting refines the model by correcting those errors step by step.

  1. Initial Model (Decision Tree): Evaluates patient data—such as age, underlying conditions, and blood work—to predict dialysis need. However, it may miss complex patterns that influence outcomes.

  2. Error Evaluation and Model Boosting: The algorithm identifies where the initial model went wrong and develops a new model specifically designed to correct these mistakes. This new model is more adept at recognizing the nuances missed by the first.

  3. Ensemble Creation: This process repeats, with each new model focusing on the errors of the entire ensemble before it. The collective insight from all these models is then combined into a single, comprehensive model.

  4. Final Ensemble Model: The resulting ensemble is a robust predictive tool that captures the intricate relationships within the data, far surpassing the capability of any single decision tree. This model can now predict with greater accuracy which patients will require dialysis during their hospital stay.

Gradient Boosting allows us to overcome a key limitation of individual decision trees, namely their struggle with complex data relationships. By building on each previous model, Gradient Boosting creates an ensemble that can capture the intricacies of patient data and typically predicts more accurately than any single tree.


Relevant Research Paper

External Validation of a Commercial Acute Kidney Injury Predictive Model (NEJM AI)


Hospital-acquired acute kidney injury (HA-AKI) is a prevalent complication in hospitalized patients, leading to increased morbidity and mortality. Predicting HA-AKI is challenging due to its multifactorial etiology. The study focuses on evaluating the performance of a commercial machine learning model developed by Epic Systems Corporation in predicting the risk of developing HA-AKI.


The primary objective was to externally validate Epic's Risk of HA-AKI predictive model, a gradient-boosted forest ensemble, in adult emergency department and hospitalized patients within a large healthcare system.


  • Model Description: Gradient-boosted forest ensemble evaluating demographics, comorbidities, medication administration, and clinical variables.

  • Model Operation: Hourly predictions were generated automatically without clinician input.

  • Study Design: Prospective external validation study, including patients with at least two serum creatinine measurements.

  • Evaluation Metrics: Encounter-level and prediction-level performance were assessed using AUROC (Area Under the Receiver Operating Characteristic curve) and AUPRC (Area Under the Precision-Recall Curve) metrics, along with net benefit and lead time warning.
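For readers less familiar with the two headline metrics, here is a small sketch of how AUROC and AUPRC can be computed by hand. The labels and risk scores are invented for illustration and are unrelated to the study's data, and the function names are hypothetical. AUPRC is estimated here with average precision, one common estimator among several.

```python
def auroc(labels, scores):
    """Probability that a random positive case gets a higher score than a random negative case."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

def average_precision(labels, scores):
    """Mean of the precision values measured at the rank of each positive case."""
    ranked = sorted(zip(scores, labels), reverse=True)  # highest risk first
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits

# Invented example: 1 = developed the outcome, 0 = did not; no tied scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

print(f"AUROC: {auroc(labels, scores):.2f}")                          # 0.75
print(f"AUPRC (average precision): {average_precision(labels, scores):.2f}")  # 0.83
print(f"Prevalence: {sum(labels) / len(labels):.2f}")                 # 0.50
```

A useful rule of thumb when reading such results: a model that guesses at random scores about 0.5 on AUROC regardless of how common the outcome is, whereas its AUPRC is roughly equal to the outcome's prevalence. AUPRC values are therefore best interpreted against the incidence in the study population.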


  • Patient Encounters: 39,891 encounters evaluated over 5 months.

  • Incidence of HA-AKI: 24.5% for KDIGO stage 1.

  • Performance Metrics:

      • Encounter-level AUROC: 0.77 (95% CI 0.76 to 0.78)

      • Prediction-level AUROC for 48 hours: 0.76 (95% CI 0.76 to 0.76)

      • AUPRC: 0.49 (95% CI 0.48 to 0.50) at the encounter level, 0.19 (95% CI 0.19 to 0.19) at the 48-hour prediction level

  • Specific Findings: The model performed less well in external validation than in the internal validation reported by Epic, emphasizing the need for real-world testing of these tools.


The Epic Risk of HA-AKI predictive model demonstrated moderate performance in external validation, falling short of the performance reported in its internal validation. The study highlights the need to test predictive models externally, in real clinical settings, before widespread adoption, so that their reliability and effectiveness across different settings can be confirmed.

Dutta, Sayon, et al. "External Validation of a Commercial Acute Kidney Injury Predictive Model." NEJM AI, vol. 1, no. 3, 2024, doi:10.1056/AIoa2300099.


Tips and Tricks: Prompt Chaining

Prompt chaining is a useful technique for getting more out of Large Language Models (LLMs), including in healthcare. It breaks a complex task into manageable subtasks and works through them as a sequence of smaller, focused prompts, using the output of one as the input for the next. This approach can make the LLM's responses more reliable and easier to interpret, because each intermediate step can be inspected.

What is Prompt Chaining: At its core, prompt chaining is about simplifying the LLM's workload by breaking down a broad or intricate inquiry into smaller, more digestible pieces. For healthcare professionals, this means transforming a multifaceted patient case into a series of questions that gradually build upon each other. This technique enhances the LLM's accuracy and reliability by guiding it through a structured reasoning path, similar to the step-by-step analysis a clinician might perform when diagnosing a patient or deciding on a treatment plan.

Practical Example:

Imagine you're utilizing an LLM to assess a patient's risk of developing a postoperative complication. A prompt chaining approach might start with:

Initial Prompt: "Identify the patient's primary risk factors based on their medical history."

Following Prompt: Taking the identified risk factors, "Evaluate the potential for postoperative complications specific to these risk factors."

Final Prompt: Based on the evaluation, "Suggest a comprehensive postoperative management plan tailored to minimize these risks."
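In code, the chain above might look like the following sketch. It assumes nothing about any particular LLM provider: `ask` is a hypothetical function you would supply to send a prompt to your model and return its text reply, and `placeholder_llm` is a stand-in that only echoes so the example runs on its own. The patient summary is invented.

```python
from typing import Callable, List

def run_prompt_chain(steps: List[str], context: str,
                     ask: Callable[[str], str]) -> List[str]:
    """Run prompts in order, feeding each reply into the next prompt as its input."""
    outputs = []
    previous = context
    for step in steps:
        prompt = f"{step}\n\nInput:\n{previous}"
        previous = ask(prompt)   # the reply becomes the input to the next step
        outputs.append(previous)
    return outputs

# The three prompts from the example above.
steps = [
    "Identify the patient's primary risk factors based on their medical history.",
    "Evaluate the potential for postoperative complications specific to these risk factors.",
    "Suggest a comprehensive postoperative management plan tailored to minimize these risks.",
]

def placeholder_llm(prompt: str) -> str:
    """Stand-in for a real model call (hypothetical); it just echoes the instruction."""
    return f"[model reply to: {prompt.splitlines()[0]}]"

# Invented, de-identified patient summary used only for illustration.
history = "68-year-old with type 2 diabetes and chronic kidney disease, scheduled for hip replacement."

for number, reply in enumerate(run_prompt_chain(steps, history, placeholder_llm), start=1):
    print(f"Step {number}: {reply}")
```

Keeping every intermediate reply, rather than only the final one, lets a clinician review each step of the reasoning and catch an error before it propagates down the chain.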

Thanks for tuning in,

Sameer & Michael

