Chat Generative Pre-trained Transformer (ChatGPT) is a natural language processing model that generates human-like text. The tool is a large language model (LLM) trained to predict word sequences from surrounding context. ChatGPT has been put through a number of tests and has even passed the US Medical Licensing Exam.

The purpose of this new study by the researchers at the Feinstein Institutes was to test whether ChatGPT (versions 3 and 4) could pass the ACG assessment, which is intended to measure performance on the ABIM Gastroenterology Board exam.

ChatGPT-3 and ChatGPT-4 were used to answer the 2021 and 2022 American College of Gastroenterology (ACG) self-assessment tests. The exact questions were entered into both versions of ChatGPT. A score of 70% or higher was required to pass.

Each ACG test consists of 300 multiple-choice questions with immediate feedback. Each question and its answer choices were copied and pasted into ChatGPT versions 3 and 4. ChatGPT responded to 455 questions (145 were excluded because they required an image). Across the two exams, ChatGPT-3 answered 296 of 455 questions correctly (65.1%) and ChatGPT-4 answered 284 correctly (62.4%).
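The scoring arithmetic above can be checked with a minimal sketch (the function name and structure are illustrative, not from the study):

```python
# Percentage correct out of the 455 answerable questions,
# compared against the exam's 70% passing threshold.
PASS_THRESHOLD = 70.0

def score(correct: int, answered: int = 455) -> tuple[float, bool]:
    """Return (percentage correct, whether it meets the passing threshold)."""
    pct = round(100 * correct / answered, 1)
    return pct, pct >= PASS_THRESHOLD

print(score(296))  # ChatGPT-3 -> (65.1, False)
print(score(284))  # ChatGPT-4 -> (62.4, False)
```

Both versions fall several points short of the 70% cutoff, which is the study's central result.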

Andrew C. Yacht, MD, senior vice president of academic affairs and chief academic officer at Northwell Health, said: “ChatGPT has sparked enthusiasm, but with that enthusiasm comes skepticism about the appropriateness and validity of AI’s current role in healthcare and education.”

Even if ChatGPT is seen as a potential educational tool, the study suggests it will not be earning its medical specialty certification anytime soon.

Arvind Trindade, MD, associate professor at the Feinstein Institutes’ Institute of Health System Science and senior author of the paper, said: “There has recently been a lot of attention on ChatGPT and the use of AI across various industries. When it comes to medical education, there is a lack of research surrounding this potentially groundbreaking tool. Based on our research, ChatGPT should not currently be used for medical education in gastroenterology, and it has a long way to go before it can be implemented in healthcare.”

ChatGPT lacks any inherent understanding of a topic or issue. Possible explanations for its failing score include a lack of access to paywalled medical journals, or its reliance on questionable, outdated, or non-medical sources, and more research is needed before it can be used reliably.

Journal reference:

  1. Suchman K, Garg S, Trindade AJ. ChatGPT Fails the Multiple-Choice American College of Gastroenterology Self-Assessment Test. The American Journal of Gastroenterology. DOI: 10.14309/ajg.0000000000002320