ChatGPT bombs test on diagnosing kids’ medical cases with 83% error rate

Real Hacker Staff, January 3, 2024

Dr. Greg House has a better rate of accurately diagnosing patients than ChatGPT.

ChatGPT is still no House, MD.

While the chatty AI bot has previously underwhelmed with its attempts to diagnose challenging medical cases—with an accuracy rate of 39 percent in an analysis last year—a study out this week in JAMA Pediatrics suggests the fourth version of the large language model is especially bad with kids. It had an accuracy rate of just 17 percent when diagnosing pediatric medical cases.

The low success rate suggests human pediatricians won’t be out of jobs any time soon, in case that was a concern. As the authors put it: “[T]his study underscores the invaluable role that clinical experience holds.” But it also identifies the critical weaknesses that led to ChatGPT’s high error rate and ways to transform it into a useful tool in clinical care. With so much interest in and experimentation with AI chatbots, many pediatricians and other doctors see their integration into clinical care as inevitable.

The medical field has generally been an early adopter of AI-powered technologies, resulting in some notable failures, such as creating algorithmic racial bias, as well as successes, such as automating administrative tasks and helping to interpret chest scans and retinal images. There’s also a lot in between. But AI’s potential for problem-solving has raised considerable interest in developing it into a helpful tool for complex diagnostics, no eccentric, prickly, pill-popping medical genius required.

In the new study conducted by researchers at Cohen Children’s Medical Center in New York, ChatGPT-4 showed it isn’t ready for pediatric diagnoses yet. Compared to general cases, pediatric ones require more consideration of the patient’s age, the researchers note. And as any parent knows, diagnosing conditions in infants and small children is especially hard when they can’t pinpoint or articulate all the symptoms they’re experiencing.

For the study, the researchers put the chatbot up against 100 pediatric case challenges published in JAMA Pediatrics and NEJM between 2013 and 2023. These are medical cases published as challenges or quizzes. Physicians reading along are invited to try to come up with the correct diagnosis of a complex or unusual case based on the information that attending doctors had at the time. Sometimes, the publications also explain how attending doctors got to the correct diagnosis.

Missed connections

For ChatGPT’s test, the researchers pasted the relevant text of the medical cases into the prompt, and then two qualified physician-researchers scored the AI-generated answers as correct, incorrect, or “did not fully capture the diagnosis.” In the last category, ChatGPT came up with a clinically related condition that was too broad or nonspecific to be considered the correct diagnosis. For instance, ChatGPT diagnosed one child’s case as caused by a branchial cleft cyst (a lump in the neck or below the collarbone) when the correct diagnosis was branchio-oto-renal syndrome, a genetic condition that causes the abnormal development of tissue in the neck and malformations in the ears and kidneys. One of the signs of the condition is the formation of branchial cleft cysts.

Overall, ChatGPT got the right answer in just 17 of the 100 cases. It was plainly wrong in 72 cases and did not fully capture the diagnosis in the remaining 11. Of those 83 missed diagnoses, 47 (57 percent) were at least in the same organ system as the correct diagnosis.
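
For readers who want to check the arithmetic, here is a minimal Python sketch that recomputes the headline percentages from the counts reported above. The function and variable names are illustrative assumptions and do not come from the study.

```python
# Counts as reported in the article for the 100 pediatric case challenges.
TOTAL_CASES = 100
CORRECT = 17
INCORRECT = 72
PARTIAL = 11             # scored "did not fully capture the diagnosis"
SAME_ORGAN_SYSTEM = 47   # missed diagnoses that were in the same organ system


def summarize(total, correct, incorrect, partial, same_system):
    """Return rounded percentages derived from the raw scoring counts."""
    if correct + incorrect + partial != total:
        raise ValueError("scoring categories do not add up to the case total")
    missed = incorrect + partial  # the article counts both as errors
    return {
        "accuracy_pct": round(100 * correct / total),
        "error_pct": round(100 * missed / total),
        "missed": missed,
        "same_system_pct": round(100 * same_system / missed),
    }


summary = summarize(TOTAL_CASES, CORRECT, INCORRECT, PARTIAL, SAME_ORGAN_SYSTEM)
print(summary)
```

Note that the 83 percent error rate in the headline treats the 11 partially captured diagnoses as misses; counting only the 72 plainly wrong answers would give 72 percent.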

Among the failures, researchers noted that ChatGPT appeared to struggle with spotting known relationships between conditions that an experienced physician would hopefully pick up on. For example, it didn’t make the connection between autism and scurvy (Vitamin C deficiency) in one medical case. Neuropsychiatric conditions, such as autism, can lead to restricted diets, and that in turn can lead to vitamin deficiencies. As such, neuropsychiatric conditions are notable risk factors for the development of vitamin deficiencies in kids living in high-income countries, and clinicians should be on the lookout for them. ChatGPT, meanwhile, came up with the diagnosis of a rare autoimmune condition.

Though the chatbot struggled in this test, the researchers suggest it could improve by being specifically and selectively trained on accurate and trustworthy medical literature rather than material from the Internet, which can include inaccurate information and misinformation. They also suggest chatbots could improve with more real-time access to medical data, allowing the models to refine their accuracy, a process described as “tuning.”

“This presents an opportunity for researchers to investigate if specific medical data training and tuning can improve the diagnostic accuracy of LLM-based chatbots,” the authors conclude.
