
AI has plenty of messy use cases, but emergency medicine may be one place where it can do some real good. A Harvard study comparing AI performance against doctors using patient data from emergency-room cases revealed that OpenAI’s o1 reasoning model outperformed human doctors in emergency triage diagnosis, especially in cases where decisions had to be made quickly with limited information.
The Harvard trial included 76 patients who arrived at the emergency room of a Boston hospital. The AI model and two human doctors were given the same electronic health record, including basic details like vital signs, demographic information, and a short nurse-written note explaining why the patient had come in.
The AI identified the exact or near-exact diagnosis in 67% of cases, while the human doctors scored between 50% and 55%. In a second test with more detailed information, the AI's accuracy rose to 82%, against 70% to 79% for the humans; it is worth noting that this second gap was not statistically significant.

The study tested text-based medical reasoning, not the full reality of emergency care. Researchers note that the AI could not assess a patient's distress, appearance, tone, body language, or the other real-world signals doctors rely on in an actual ER.
Dr Adam Rodman, a lead author of the study and a doctor at Boston's Beth Israel Deaconess Medical Center, said AI could become part of a "triadic care model" involving the doctor, patient, and AI system.
While the results are impressive, the technology isn't ready to be dropped into emergency rooms just yet. Experts raised concerns about accountability, patient safety, AI errors, and whether doctors might start deferring too quickly to AI recommendations. For now, its clearest role is offering a fast second opinion when doctors need one.