At some point in the last few years, the healthcare AI conversation became focused almost entirely on performance metrics such as accuracy rates, benchmark scores, and head-to-head comparisons against physicians on licensing exams. These are not meaningless numbers, but they have quietly crowded out the question that actually determines whether clinical AI gets used: do the people whose patients depend on it actually trust it?
I've been building in this space long enough to have watched the pattern repeat. A new tool arrives with impressive validation data. It gets piloted, clinicians use it a few times, encounter something they can't verify, and quietly stop. The technology was sound but the trust wasn't there. And without trust, there is no adoption, and without adoption, there is no impact.
When I talk about trust in clinical AI, I'm referring to something structural: can the person acting on AI-generated information verify where it came from?
In healthcare, this matters far more than in most other domains. If a recommendation turns out to be wrong, someone could be harmed. If a data element can't be traced back to its source, it can't be validated. And if it can't be validated, a clinician operating in a high-stakes environment has no rational basis for relying on it, regardless of what the aggregate accuracy statistics say.
This is the fundamental design problem that most clinical AI is still trying to work around. The outputs look good and the models perform well in controlled settings. Yet, the moment a clinician wants to know why the system produced a particular finding, or which record a specific data point came from, too many platforms either can't answer or send them on a manual records chase that defeats the purpose of using AI in the first place.
We built xCures from the beginning around a principle that every extracted clinical data element must be traceable to its source document. Not summarized or inferred from it, but traceable, so a clinician, compliance team, or quality reviewer could follow the thread from conclusion back to evidence. It was harder to build that way but ultimately the right call.
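To make that concrete, here is a minimal sketch of what traceability can look like at the data-model level. This is an illustration, not xCures' actual schema; the field names and example values are hypothetical, and a real system would carry richer provenance (document version, extraction model, reviewer sign-off).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExtractedElement:
    """A clinical data point that carries its own provenance."""
    name: str                # what was extracted, e.g. "hemoglobin"
    value: str               # the extracted value, kept as text
    source_document_id: str  # identifier of the originating record
    page: int                # page within the source document
    char_start: int          # character offsets of the supporting span
    char_end: int

    def citation(self) -> str:
        """Human-readable pointer from conclusion back to evidence."""
        return (f"{self.name}={self.value} "
                f"(doc {self.source_document_id}, p.{self.page}, "
                f"chars {self.char_start}-{self.char_end})")

# A reviewer can follow this thread without calling the vendor.
hgb = ExtractedElement("hemoglobin", "9.2 g/dL", "doc-4471", 3, 1204, 1232)
print(hgb.citation())
```

The design choice that matters is that provenance is a required field, not optional metadata: an element with no source span simply cannot be constructed.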
The Benchmark Problem
Part of why trust gets underweighted in clinical AI is that it's genuinely harder to measure than accuracy. You can run an extraction model against a gold-standard dataset and report a precision score. You can't easily quantify whether a clinician felt confident enough in an output to act on it, or whether they spent twenty minutes second-guessing it before giving up.
The benchmarks we use to evaluate clinical AI were largely borrowed from fields where trust is less operationally critical: consumer recommendation engines, search ranking, fraud detection. In those contexts, aggregate accuracy at scale is the right thing to optimize for. In clinical care, aggregate accuracy is necessary but not sufficient, because clinicians are applying recommendations to the person in front of them, right now, with incomplete information and real stakes.
What the industry needs are frameworks for evaluating trustworthiness alongside performance. When a clinician has a question about an output, can they answer it without calling a vendor? Is the data that feeds the model clean enough to deserve the confidence the model then projects? And when errors slip through, can they be caught and corrected?
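One way to make that measurable, sketched below: report a verifiability rate next to precision, i.e., the fraction of outputs whose cited source span actually supports the extracted value. This is an illustrative metric, not an established benchmark, and the field names (source_document_id, char_start, char_end) are the same hypothetical ones used above.

```python
def verifiability_rate(outputs, documents):
    """Share of extracted outputs whose cited source span actually
    contains the extracted value. Reported alongside accuracy,
    not instead of it."""
    if not outputs:
        return 0.0
    supported = 0
    for out in outputs:
        doc_text = documents.get(out["source_document_id"], "")
        cited_span = doc_text[out["char_start"]:out["char_end"]]
        if out["value"] in cited_span:
            supported += 1
    return supported / len(outputs)

# Toy check: one output's citation supports it, the other's does not.
docs = {"doc-1": "Labs drawn 3/14: hemoglobin 9.2 g/dL, stable."}
outs = [
    {"source_document_id": "doc-1", "value": "9.2 g/dL",
     "char_start": 17, "char_end": 44},
    {"source_document_id": "doc-1", "value": "11.0 g/dL",
     "char_start": 17, "char_end": 44},
]
print(verifiability_rate(outs, docs))  # 0.5
```

A model can score well on precision against a gold standard and still fail this check, which is exactly the gap between performing well and being verifiable.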
What Buyers Should Actually Be Asking
If you're a health system, a diagnostics company, or a digital health organization evaluating clinical AI right now, the performance slide in the vendor deck is table stakes. What matters more is what happens when the output is questioned.
Ask to see the provenance; ask whether every data element can be linked to its source document; ask what the validation methodology is, and whether it's published. Ask what the quality control process looks like after deployment, because model drift is real and most platforms don't tell you when their performance has degraded.
These are the questions that determine whether clinical AI becomes a durable part of care delivery, or an expensive pilot that clinicians quietly walk away from.
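On the model-drift point in particular, the post-deployment check does not have to be elaborate. Here is a minimal sketch, assuming the vendor's validated accuracy as a baseline and ongoing spot review of outputs by clinicians; the class name and thresholds are hypothetical.

```python
from collections import deque

class DriftMonitor:
    """Rolling check that post-deployment agreement with human review
    hasn't fallen materially below the accuracy validated pre-launch."""

    def __init__(self, baseline: float, window: int = 500,
                 tolerance: float = 0.05):
        self.baseline = baseline    # accuracy from validation
        self.tolerance = tolerance  # acceptable drop before alerting
        self.reviews = deque(maxlen=window)

    def record(self, reviewer_agreed: bool) -> None:
        """Log one spot-checked output (True = reviewer agreed)."""
        self.reviews.append(1.0 if reviewer_agreed else 0.0)

    def degraded(self) -> bool:
        """True once a full window of reviews runs below baseline."""
        if len(self.reviews) < self.reviews.maxlen:
            return False            # not enough live data yet
        rolling = sum(self.reviews) / len(self.reviews)
        return rolling < self.baseline - self.tolerance
```

The point of asking the vendor about this is not the specific mechanism; it's whether any mechanism exists, and who gets told when degraded() would fire.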
Intelligence is the easy part of this problem. The hard part is building systems that clinicians can trust when the answer really matters, and then continuously proving that the trust is warranted.
Kenny Wong is the Chief Product Officer at xCures Inc., an AI-powered healthcare data platform that transforms complex medical records into structured, decision-ready clinical intelligence.
About Digital Health & AI Innovation Summit 2026
Join World BI for the Digital Health & AI Innovation Summit – a unique opportunity to connect with thought leaders from the pharma, healthcare, and medtech industries, while learning about the newest trends in digital health technologies.
This new conference provides you with the industry recommendations, case studies, and actionable insights you need to transform your own projects and strategically take them to the next level through scientific discovery, collaborative research, medtech entrepreneurship, and pharma-tech partnerships.
Join us to learn from the best in the digital health industry, including clinicians, scientists, entrepreneurs, biomedical engineers, patient advocates, and top technology providers from across the globe. And if that's not enough, the conference will be held at the Boston Marriott Cambridge in Cambridge, Massachusetts.