Hongjian Zhou

Newsletter from The Neural Medwork: Issue 17


 

Abstract:

Welcome back to the 17th edition of The Neural Medwork! We hope you enjoyed our fun exploration of how to use ChatGPT during your travels in the last edition. Today, we continue our deep dive into Natural Language Processing (NLP), focusing on the fascinating concept of Named Entity Recognition (NER). Next, we present the research paper GPT-4 for Information Retrieval and Comparison of Medical Oncology Guidelines. Lastly, we introduce a tip & trick: ReAct, a framework that combines reasoning traces with task-specific actions to improve LLM generation.


 

Core Concept: Named Entity Recognition (NER)


Named Entity Recognition (NER) is a crucial NLP technique that identifies and classifies key information (entities) in text into predefined categories such as names of people, organizations, locations, expressions of time, quantities, monetary values, percentages, and more. In the context of healthcare, NER can be particularly powerful, helping to extract critical medical terms and patient information from unstructured text data, such as clinical notes or research articles.


NER operates similarly to how a healthcare provider might sift through a patient's medical records to find pertinent information. Imagine you're reviewing a patient's history to find mentions of specific medications, diseases, or procedures. NER performs this task automatically and at scale.


Here's a step-by-step analogy to help illustrate NER, followed by a short code sketch:

  1. Input (Reading Medical Records): Just like reading through a stack of medical records, NER processes the text data. This could be any unstructured data source, such as doctor’s notes, discharge summaries, or research papers.

  2. Tokenization (Breaking Down Text): NER breaks down the text into smaller parts (tokens), much like how you might underline or highlight important sections in a document.

  3. Tagging Entities (Identifying Key Information): The tool then tags these tokens with labels that indicate their role. For example, it identifies "aspirin" as a medication, "2023" as a date, and "pneumonia" as a disease. This is akin to a clinician marking out key terms in a patient’s history.

  4. Classification (Categorizing Information): Finally, these tagged entities are classified into predefined categories. In healthcare, these categories might include medications, dosages, symptoms, diagnoses, and more.
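
To make these four steps concrete, here is a minimal sketch using spaCy's general-purpose English model (our illustration; on real clinical text you would swap in a biomedical model such as scispaCy's en_core_sci_sm):

    # Minimal NER sketch: spaCy's small English model performs steps 1-4 above.
    # A biomedical model (e.g., scispaCy) would be swapped in for clinical text.
    import spacy

    nlp = spacy.load("en_core_web_sm")   # tokenizer, tagger, and NER in one

    note = "Patient started aspirin in 2023 after presenting with pneumonia."
    doc = nlp(note)                      # steps 1-2: read and tokenize the text

    for ent in doc.ents:                 # steps 3-4: tag and classify entities
        print(ent.text, ent.label_)
    # The general model tags "2023" as DATE; a clinical model would also label
    # "aspirin" as a drug and "pneumonia" as a disease.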


Several tools and models have been developed to enhance the effectiveness of NER in the medical field. These include:

  • BioBERT: A version of BERT (Bidirectional Encoder Representations from Transformers) pre-trained on large-scale biomedical corpora, enhancing its ability to understand and tag medical entities. 

  • ClinicalBERT: A BERT model specifically trained on clinical notes, making it particularly adept at recognizing medical terms in electronic health records. A short sketch of how such a checkpoint is loaded follows below.
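
As an illustration, the snippet below loads a BioBERT checkpoint for token classification with the Hugging Face transformers library; this is a sketch under the assumption that the model is fine-tuned on a labelled NER corpus before use:

    # Sketch only: load BioBERT for token classification.
    # "dmis-lab/biobert-v1.1" is the published base checkpoint; until it is
    # fine-tuned on a labelled NER corpus, the classification head is randomly
    # initialized and the printed labels are placeholders.
    from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                              pipeline)

    name = "dmis-lab/biobert-v1.1"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForTokenClassification.from_pretrained(name)

    ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
                   aggregation_strategy="simple")
    print(ner("Start aspirin 81 mg daily for suspected pneumonia."))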





Most medical data is unstructured, consisting of clinical notes, research articles, and patient histories. This unstructured nature makes it challenging to extract, summarize, and utilize critical information efficiently. NER stands as a cornerstone of AI applications in healthcare, enabling the transformation of unstructured text into structured, actionable insights. By harnessing the information from unstructured data, NER facilitates better summarization and utilization of medical information, enhancing clinical decision-making and patient care.


 

Relevant Research Paper: GPT-4 for Information Retrieval and Comparison of Medical Oncology Guidelines


Purpose

The study aimed to evaluate the use of GPT-4, a large language model, in interpreting and comparing oncology guidelines from the American Society of Clinical Oncology (ASCO) and the European Society for Medical Oncology (ESMO). The focus was on assessing GPT-4's ability to answer clinically relevant questions regarding the management of patients with pancreatic cancer, metastatic colorectal cancer, and hepatocellular carcinoma.


Methodology

The researchers compared GPT-4's performance in answering oncology-related questions with and without the use of Retrieval-Augmented Generation (RAG). RAG enhances the model's knowledge by incorporating relevant external information. The process involved the following steps (a short code sketch of the retrieval loop appears after the list):

  1. Data Preparation: Oncology guidelines were extracted from ASCO and ESMO documents. Text from these guidelines was split into manageable chunks and converted into numeric vectors using OpenAI’s embedding model.

  2. Model Testing: GPT-4 was tested on 30 clinically relevant questions across three types of cancer. The questions were designed to cover various aspects of treatment and management. The model's responses were generated with and without RAG.

  3. Evaluation: The responses were manually evaluated by trained clinicians for accuracy. The evaluation criteria included faithfulness (alignment with original guidelines) and relevance. Correct, inaccurate, and wrong categorizations were assigned based on the comparison with guideline documents.
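
The retrieval loop can be sketched as follows (our illustration, not the authors' code; the embedding model name is an assumption):

    # RAG sketch: chunk the guidelines, embed them, retrieve the most similar
    # chunks for a question, and pass them to GPT-4 as context.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small",
                                        input=texts)
        return np.array([d.embedding for d in resp.data])

    chunks = ["...ASCO guideline chunk...", "...ESMO guideline chunk..."]
    vecs = embed(chunks)

    question = "What is first-line therapy for metastatic colorectal cancer?"
    q = embed([question])[0]

    # cosine similarity -> top-k most relevant guideline chunks
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))
    context = "\n".join(chunks[i] for i in sims.argsort()[-3:])

    answer = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Guidelines:\n{context}\n\nQuestion: {question}"}])
    print(answer.choices[0].message.content)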


Main Results

  • Performance with RAG: GPT-4 with RAG gave correct statements in 84% of cases (184 of 218 statements), a significant improvement in accuracy. Only 30 statements were inaccurate, and 4 were wrong.

  • Performance without RAG: GPT-4 without RAG achieved a lower accuracy of 57% (93 of 163 statements). In this setting, 29 statements were inaccurate, and 41 were wrong.

Conclusion

The study concluded that GPT-4, when enhanced with RAG, significantly improves the accuracy of responses to oncology-related questions by integrating up-to-date external information. This enhancement can aid oncologists in making more informed clinical decisions and adhering to the most recent guidelines. The use of RAG reduces model hallucinations and increases the reliability of LLMs in clinical settings. However, challenges remain, particularly in multi-document comparisons and handling conflicting statements within guidelines.


Ferber, D., Wiest, I. C., Wölflein, G., Ebert, M. P., Beutel, G., Eckardt, J. N., Truhn, D., Springfeld, C., Jäger, D., & Kather, J. N. (2024). GPT-4 for Information Retrieval and Comparison of Medical Oncology Guidelines. NEJM AI, 1(6). https://doi.org/10.1056/AIcs2300235



 

Tips and Tricks: Enhancing AI Decision-Making with ReAct in Healthcare


The ReAct framework, introduced by Yao et al. (2022), interleaves reasoning traces with task-specific actions, enabling the AI not only to generate detailed reasoning paths but also to interact with external tools and knowledge bases to gather additional information.


What is ReAct: ReAct combines the strengths of reasoning and action to create a more dynamic and reliable AI. By generating reasoning traces, the LLM can track and update its action plans, handle exceptions, and ensure a logical progression of thought. The action steps allow the model to access external information, such as medical databases or patient records, enhancing the accuracy and factual consistency of its responses. This framework significantly improves the interpretability and trustworthiness of AI outputs, making it an invaluable tool for healthcare professionals.


Practical Example:


Consider a scenario where a clinician needs to develop a comprehensive treatment plan for a patient with a rare disease. Using the ReAct framework, the process would involve:


Initial Reasoning Trace: The LLM generates a reasoning path by analyzing the patient's symptoms, history, and preliminary test results.

Action Step: The AI interfaces with medical databases to retrieve the latest research and treatment guidelines for rare diseases.

Updated Reasoning Trace: The model integrates the new information and adjusts the treatment plan accordingly, addressing any exceptions or unique patient factors.


For instance, the prompt could be: "Generate a treatment plan for a patient diagnosed with Amyloidosis, considering recent advances in treatment protocols." The ReAct framework enables the AI to pull relevant data from recent studies and synthesize it with the patient's specific conditions, resulting in a well-informed, evidence-based treatment strategy.
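
A minimal sketch of what such a loop might look like in code (the lookup_guidelines tool is hypothetical, standing in for a real medical knowledge base):

    # Illustrative ReAct loop (a sketch, not a production agent): the model
    # alternates Thought / Action / Observation until it emits a final answer.
    from openai import OpenAI

    client = OpenAI()

    def lookup_guidelines(query):
        # Hypothetical tool: a real system would query a medical database.
        return "Guideline excerpt relevant to: " + query

    prompt = ("Answer with interleaved Thought/Action/Observation steps.\n"
              "Available action: Search[query] -> returns guideline text.\n"
              "Question: Generate a treatment plan for a patient diagnosed "
              "with amyloidosis, considering recent advances in treatment "
              "protocols.\n")

    for _ in range(5):                        # cap the reasoning/action cycles
        step = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            stop=["Observation:"]).choices[0].message.content
        prompt += step
        if "Search[" in step:                 # run the model's requested action
            query = step.split("Search[", 1)[1].split("]", 1)[0]
            prompt += "\nObservation: " + lookup_guidelines(query) + "\n"
        else:
            break                             # final answer reached

    print(prompt)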


Thanks for tuning in,


Sameer & Michael
