Even when the contestants got three chances to figure out what ailed a hypothetical patient, the diagnostic software lagged far behind actual doctors. Indeed, the apps and websites suggested the right diagnosis only slightly more than half of the time, the report says.
The research team -- from Harvard Medical School, Brigham & Women's Hospital in Boston and the Human Diagnosis Project in Washington, D.C. -- asked 234 physicians to read through a selection of 45 "clinical vignettes" to see how they would handle these hypothetical patients. Each vignette included the medical history of the "patient" but no results from a physical exam, blood test or other kind of lab work.
Most of the doctors were trained in internal medicine, though the group included some pediatricians and family practice physicians too. About half of them were in residency or fellowship, so their training was not yet complete.
Even so, of the 1,105 vignettes they considered, they listed the correct diagnosis first 72% of the time, according to the study.
The 23 symptom checkers evaluated a total of 770 vignettes in an earlier study by some of the same researchers. The apps and websites (including several from professional medical organizations, such as the American Academy of Physicians, the American Academy of Pediatrics and the Dutch College of General Practitioners) listed the correct diagnosis first just 34% of the time.
Both the doctors and the computer programs were able to include more than one ailment in their differential diagnosis. So the researchers also compared how often the correct diagnosis was among the top three responses.
For the doctors, that happened 84% of the time. For the symptom checkers, it was 51% of the time.
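The top-one versus top-three comparison described above is a rank-based accuracy check: a case counts as a hit if the correct diagnosis appears anywhere in the first k entries of the ranked differential. A minimal sketch of that scoring, using entirely hypothetical vignette data (not the study's actual cases), looks like:

```python
# Minimal sketch (hypothetical data): scoring ranked differential
# diagnoses by top-1 and top-3 accuracy, mirroring the study's comparison.

def accuracy_at_k(cases, k):
    """Fraction of cases whose correct diagnosis appears in the top k guesses."""
    hits = sum(1 for correct, ranked in cases if correct in ranked[:k])
    return hits / len(cases)

# Hypothetical vignette results: (correct diagnosis, ranked differential).
cases = [
    ("appendicitis", ["appendicitis", "gastroenteritis", "ovarian cyst"]),
    ("migraine", ["tension headache", "migraine", "sinusitis"]),
    ("pneumonia", ["bronchitis", "influenza", "asthma"]),
    ("gout", ["cellulitis", "septic arthritis", "gout"]),
]

top1 = accuracy_at_k(cases, 1)  # correct diagnosis listed first
top3 = accuracy_at_k(cases, 3)  # correct diagnosis anywhere in top three
print(f"top-1: {top1:.0%}, top-3: {top3:.0%}")  # prints "top-1: 25%, top-3: 75%"
```

As in the study, top-3 accuracy is always at least as high as top-1, since every first-place hit is also a top-three hit.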
Though the humans trounced the computers across the board, there were situations in which the doctors did a particularly good job of naming the correct diagnosis first. For instance, their margin in cases with common conditions was 70% to 38%. In cases with uncommon conditions, it grew to 76% to 28%.
The seriousness of the malady made a difference too. In cases with low acuity, doctors bested software by 65% to 41%. But in cases with high acuity, that gap widened to 79% to 24%.
"Physicians vastly outperformed computer algorithms in diagnostic accuracy," the researchers concluded. Full disclosure: Three of the study authors are doctors, and none are apps.