Discussion about this post

Dr Mark Chern

Wow, this is very important. No AI can replace that careful listening.

YOUR DOCTOR KLOVER

This is a really important (and frankly overdue) reality check.

What I appreciated most is your distinction between intelligence and judgment. In clinic, the “work” isn’t naming the diagnosis from a complete vignette; it’s building the vignette: extracting the discriminating details from messy narratives, iteratively probing uncertainty, and then doing the hardest part of all, risk calibration (what can wait vs what cannot).

Your point that performance jumps when the full vignette is fed directly into the model is telling. It’s not that the model “can’t think”; it’s that it can’t reliably do what clinicians do all day: compensate for missingness, ambiguity, and misframing, and then escalate appropriately when uncertainty is dangerous.

Two implications feel especially high-yield:

1. If we’re going to deploy AI in patient-facing contexts, the interface has to be built for imperfect storytelling: active questioning, symptom timelines, red-flag extraction, and explicit “stop rules” that default to care escalation when the cost of being wrong is high.

2. We should stop reassuring ourselves with exam benchmarks. Passing tests is pattern recognition under controlled inputs; medicine is decision-making under incomplete information with asymmetric risk.

AI can be helpful, but until it consistently handles the uncertainty space, it’s not a substitute for clinical triage and judgment!
