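ChatGPT can score at or near the approximately 60% passing threshold for the United States Medical Licensing Exam (USMLE), according to a study published Feb. 9, 2023 in the open-access journal PLOS Digital Health by Tiffany Kung, Victor Tseng, and colleagues at AnsibleHealth.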
ChatGPT is a new artificial intelligence (AI) system, also known as a large language model (LLM), designed to generate human-like writing by predicting upcoming word sequences. Unlike most chatbots, ChatGPT cannot search the web. Instead, it generates text using word relationships predicted by its internal processes.
Kung and colleagues tested ChatGPT’s performance on the USMLE, a highly standardized and regulated series of three exams (Step 1, Step 2CK, and Step 3) required for medical licensure in the United States. Taken by medical students and physicians-in-training, the USMLE assesses knowledge spanning most medical disciplines, ranging from biochemistry to diagnostic reasoning to bioethics.
After screening to remove image-based questions, the authors tested the software on 350 of the 376 public questions available as of the June 2022 USMLE release.
After indeterminate responses were removed, ChatGPT scored between 52.4% and 75.0% across the three USMLE exams. The passing threshold each year is approximately 60%. ChatGPT also showed 94.6% concordance across all of its responses and produced at least one significant insight (something that was new, non-obvious, and clinically valid) for 88.9% of its responses. Notably, ChatGPT outperformed PubMedGPT, a counterpart model trained exclusively on biomedical domain literature, which scored 50.8% on an older dataset of USMLE-style questions.
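As a rough illustration of how a score "with indeterminate responses removed" can be computed and compared with the roughly 60% threshold, here is a minimal Python sketch. The grade labels, the helper name `accuracy_excluding_indeterminate`, and the tallies in `grades` are hypothetical and are not data or methods taken from the study; only the 350-of-376 question count and the approximate threshold come from the article.

```python
from collections import Counter

PASS_THRESHOLD = 0.60  # approximate passing threshold cited in the article


def accuracy_excluding_indeterminate(grades):
    """Return the share of correct answers among determinate responses."""
    counts = Counter(grades)
    determinate = counts["correct"] + counts["incorrect"]
    if determinate == 0:
        raise ValueError("no determinate responses to score")
    return counts["correct"] / determinate


# Hypothetical grades for one exam; NOT data from the study.
grades = ["correct"] * 60 + ["incorrect"] * 30 + ["indeterminate"] * 10

accuracy = accuracy_excluding_indeterminate(grades)
print(f"accuracy: {accuracy:.1%}")                      # accuracy: 66.7%
print("meets threshold:", accuracy >= PASS_THRESHOLD)   # meets threshold: True

# Share of public questions kept after screening (figures from the article).
print(f"questions retained: {350 / 376:.1%}")           # questions retained: 93.1%
```

Dropping indeterminate responses shrinks the denominator, so the same number of correct answers yields a higher percentage than it would if every question were counted.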
While the relatively small input size limited the depth and scope of analyses, the authors note that their findings provide a glimpse into ChatGPT’s potential to improve medical education and, ultimately, clinical practice. For example, they add that clinicians at AnsibleHealth already use ChatGPT to rewrite jargon-heavy reports so that patients can understand them more easily.
“Achieving the pass mark on this notoriously difficult expert exam, and doing so without any human reinforcement, marks a remarkable milestone in clinical AI maturation,” say the authors.
Author Dr. Tiffany Kung added that ChatGPT’s role in this research went beyond being the subject of the study: “ChatGPT contributed substantially to the writing of [our] manuscript… We interacted with ChatGPT as a colleague, asking it to provide synthesis, simplification, and counterpoints to drafts in progress… All co-authors appreciated ChatGPT’s input.”
- Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. (2023) Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models. PLOS Digit Health 2(2): e0000198. DOI: 10.1371/journal.pdig.0000198