Following up on our previous episode on how to build your custom GPT, we've created an easy-to-follow tutorial on evaluating your clinical GPT – with a proven framework! The framework, TEHAI (Translational Evaluation of Healthcare AI), is shared in this post with everyone!
🎥 Episode Breakdown:
0:00 - 1:40: Introducing the TEHAI framework
1:40 - 7:00: Capability evaluation
7:00 - 13:00: Utility evaluation
13:00 - 16:35: Adoption evaluation
Evaluation Framework: TEHAI Assessment for GPT in Clinical Practice
This is an easy-to-follow framework to evaluate your clinical GPT. Under each section, we provide sample questions and prompts that show how to apply the framework in practice. Our example questions relate to the Pneumonia GPT we created in the first episode.
1. Capability
Intrinsic Capability: Evaluate the model's ability to enhance decision-making, reliability, and knowledge provision.
Performance Metrics & Use Case: Has the GPT been tested against any relevant performance metrics?
2. Utility
Generalizability and Contextualization: Is the output of the GPT accurate in various clinical contexts?
Safety and Quality: Is there ongoing monitoring and governance of the GPT?
Transparency: Can the GPT explain how it produces an answer?
Privacy: Always ensure no personal health information is shared with the GPT.
Time Efficiency: Does using the GPT save time in your clinical practice? How does it compare to what you are using right now?
3. Adoption
Clinical Context Use: Assess the practicality of utilizing the GPT in your clinical setting. Does it integrate within your current clinical workflow or do you need to change how you practice?
Technical Integration and Operational Dependability: Can the GPT be integrated within your EMR? Do you have access to ChatGPT in your clinical setting?
Alignment with Domain: Ensure that the GPT aligns with healthcare regulations and standards.
Summary: How to Assess a GPT
Step 1: Identify a specific GPT tool you are considering for clinical practice.
Step 2: Allocate 10-15 minutes for a thorough yet rapid assessment.
Step 3: Go through each component of the TEHAI framework, focusing on the key areas outlined above.
Step 4: Score each component based on how well the GPT meets the requirements (e.g., High, Medium, Low). For a more technical scoring system, see the references on the TEHAI framework below.
Step 5: Make an informed decision based on your aggregate assessment and your clinical needs.
(Reddy S. Evaluating large language models for use in healthcare: A framework for translational value assessment. Informatics in Medicine Unlocked. 2023)
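If you like to keep a record of your assessment, Steps 3 to 5 can be sketched in a few lines of Python. This is only an illustration: the numeric mapping (Low = 1, Medium = 2, High = 3), the function names, and the example ratings are our own assumptions and are not part of the published TEHAI scoring system.

```python
# Assumed mapping from the High/Medium/Low ratings to numbers (not from TEHAI).
RATINGS = {"Low": 1, "Medium": 2, "High": 3}


def summarize(scores: dict[str, dict[str, str]]) -> dict[str, float]:
    """Average the numeric rating of the components within each domain."""
    summary = {}
    for domain, components in scores.items():
        values = [RATINGS[rating] for rating in components.values()]
        summary[domain] = sum(values) / len(values)
    return summary


def flag_low(scores: dict[str, dict[str, str]]) -> list[str]:
    """List every component rated Low so it can be reviewed before adoption."""
    return [
        f"{domain}: {component}"
        for domain, components in scores.items()
        for component, rating in components.items()
        if rating == "Low"
    ]


# Hypothetical ratings for illustration only; replace with your own assessment.
example = {
    "Capability": {
        "Intrinsic Capability": "High",
        "Performance Metrics & Use Case": "Medium",
    },
    "Utility": {
        "Generalizability and Contextualization": "High",
        "Safety and Quality": "Low",
        "Transparency": "Medium",
        "Privacy": "High",
        "Time Efficiency": "High",
    },
    "Adoption": {
        "Clinical Context Use": "Medium",
        "Technical Integration and Operational Dependability": "Low",
        "Alignment with Domain": "Medium",
    },
}

if __name__ == "__main__":
    for domain, average in summarize(example).items():
        print(f"{domain}: {average:.1f} / 3")
    for item in flag_low(example):
        print("Needs review:", item)
```

A simple average hides weak spots, which is why the sketch also lists every component rated Low. A single Low in Safety and Quality or Privacy may matter more to your decision than a good overall average.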
Techniques for improvement
Narrow down your use case for each GPT.
Provide as much context as possible: Organize your prompt in a structured way and be specific about the output you are looking for.
Iterate on your prompts to see how the GPT reacts to various scenarios.
Provide few-shot samples for your unique cases. Few-shot samples are template question-and-answer pairs that you want the model to follow. For example, you may provide examples of how you would like the GPT to answer every question, in a style that is helpful for you.
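To make the structured-prompt and few-shot ideas concrete, here is a minimal sketch of assembling such a prompt as plain text. The function name, the section labels, and the bracketed placeholder examples are our own assumptions; fill in the placeholders with your own vetted content before using anything like this.

```python
def build_few_shot_prompt(
    context: str,
    examples: list[tuple[str, str]],
    question: str,
) -> str:
    """Combine context, template Q&A pairs, and the new question into one prompt."""
    parts = [f"Context: {context}", ""]
    for sample_question, sample_answer in examples:
        parts.append(f"Q: {sample_question}")
        parts.append(f"A: {sample_answer}")
        parts.append("")
    # End with an open answer slot so the model continues in the same style.
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


# Placeholder examples: the bracketed text marks where your own content goes.
examples = [
    (
        "[A typical question you ask]",
        "Summary: [one sentence]. Key points: [three short bullets]. Source: [guideline name].",
    ),
    (
        "[A second typical question]",
        "Summary: [one sentence]. Key points: [three short bullets]. Source: [guideline name].",
    ),
]

prompt = build_few_shot_prompt(
    context="You are assisting a clinician. Answer concisely and name your source.",
    examples=examples,
    question="[Your new question]",
)

if __name__ == "__main__":
    print(prompt)
```

Keeping every sample answer in the same layout (summary, key points, source) is what nudges the GPT to answer your new question in that layout too.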
To read more about the TEHAI Framework, consider the following references:
Reddy S, Rogers W, Makinen VP, Coiera E, Brown P, Wenzel M, Weicken E, Ansari S, Mathur P, Casey A, Kelly B. Evaluation framework to guide implementation of AI systems into healthcare settings. BMJ Health Care Inform. 2021 Oct;28(1):e100444. doi: 10.1136/bmjhci-2021-100444. PMID: 34642177; PMCID: PMC8513218.
Reddy S. Evaluating large language models for use in healthcare: A framework for translational value assessment. Informatics in Medicine Unlocked. 2023; Volume 41. https://doi.org/10.1016/j.imu.2023.101304