Artificial intelligence (AI) is rapidly moving into the clinic. From diagnosing complex diseases to writing clinical notes during patient visits, AI systems are beginning to shape how medicine is practised.
The promise is enormous.
The global medical AI market is already estimated at around $70 billion, and the technology is being positioned as a solution for everything from physician burnout to diagnostic accuracy.
But according to the inaugural State of Clinical AI Report 2026 by the ARISE Network, the industry is advancing far faster than the evidence supporting it.
95% of AI medical devices cleared by the U.S. Food and Drug Administration have never reported a single patient health outcome.
In other words, while AI is increasingly embedded in healthcare systems, much of it still lacks proof that it actually improves patient care.
The report breaks down where medical AI is truly delivering results today—and where the industry still has significant gaps to close.
AI can ace the exam. That doesn’t mean it’s ready for the ward.
Recent advances in AI models have produced impressive results in medical testing environments.
Models like OpenAI’s o1 and Google’s AMIE have shown strong diagnostic capabilities in controlled settings.
In a benchmark study published in the New England Journal of Medicine, the o1 model achieved 78% diagnostic accuracy, outperforming physicians on structured reasoning tasks.
Similarly, Google’s AMIE chatbot managed patients whose conditions required multiple visits, producing precise, guideline-based care plans across different diseases.
But the ARISE report found these systems can be fragile when the context changes.
In one experiment, researchers added a simple “None of the above” option to a set of 100 medical questions. The model’s accuracy dropped from 78% to 38%.
AI systems are exceptionally good at pattern recognition, but they still struggle with the kind of flexible reasoning that experienced physicians rely on every day.
In the Script Concordance Test, which evaluates clinical decision-making in uncertain scenarios, AI systems performed roughly at the level of medical students but fell short of experienced clinicians.
The regulation problem is bigger than most people realise
One of the most concerning findings in the ARISE report is the gap between regulatory clearance and clinical evidence.
Today, the market includes more than 1,200 AI-enabled medical devices cleared by the FDA and an estimated 350,000 consumer health apps.
Yet, most have never faced rigorous peer-reviewed evaluation.
- Over 95% of FDA-cleared AI medical devices were approved through the 510(k) pathway, which allows devices to be cleared based on similarity to existing devices rather than new clinical evidence.
- Less than 1% of these cleared devices reported real patient outcome data.
- 95% of FDA device summaries had no demographic data.
- 91% included no assessment of bias across race or gender.
As AI benchmarks become saturated, the report argues, test performance alone is no longer a reliable signal of clinical readiness.
Instead, models need to be evaluated in simulated electronic health record environments and real clinical workflows, where patient complexity and uncertainty are much harder to predict.
At the same time, regulators have begun easing requirements. The FDA, for instance, has relaxed its framework for wearable health devices, allowing AI-powered products to reach the market without thorough evaluation.
Clinical reality: Reducing burnout, increasing risk
Despite the evidence gap, AI is already becoming part of everyday clinical workflows.
One of the fastest-growing categories is ambient AI scribes: systems that listen to doctor-patient conversations and automatically generate clinical notes.
These tools have shown promise in reducing physician documentation burden, with some studies suggesting burnout reductions of over 20%.
But the impact on clinical workflows is more nuanced.
Some studies cited in the report suggest that AI scribes reduce consultation time by only about 20 seconds per visit.
There’s also evidence that clinicians may become overly reliant on AI recommendations.
In experiments where doctors were shown flawed AI suggestions, many still followed them, even when the errors were obvious.
In another study, experienced endoscopists who regularly used AI to detect polyps saw their unassisted detection rate drop by 6%.
Patients’ trust in AI is sometimes dangerous.
For patients managing chronic conditions, AI is genuinely opening doors. AI-led programmes for diabetes prevention and sleep coaching have delivered outcomes comparable to human-led programmes, often with higher engagement.
AI is also making healthcare communication more accessible.
In pediatric care, AI translations of patient instructions have reached near human-level quality, in some cases producing fewer errors than professional translators.
But the report warns that patient trust in AI can sometimes become a vulnerability.
In one study, participants couldn’t distinguish between high-accuracy and low-accuracy AI medical devices. Many said they followed AI recommendations simply because the system communicated confidently.
Where AI is genuinely saving lives right now
Not everything in the report is a warning.
The ARISE Network highlights several areas where AI is already delivering clear clinical impact, particularly in medical imaging.
The most striking example comes from England, where AI-assisted stroke imaging was deployed across 26 hospitals, with significant results:
- Thrombectomy rates doubled
- Patient transfer times dropped significantly
For stroke patients, where every minute of delay can lead to irreversible brain damage, these improvements directly translate into lives saved and disabilities prevented.
Other successful applications highlighted in the report include:
- Expert-level accuracy in diagnosing celiac disease from imaging.
- Reliable detection of malignant tumour transformations on CT scans.
- AI matching or exceeding physician performance on specific, well-defined diagnostic tasks.
The 2026 mandate: From better models to better evidence
The central argument of ARISE Network’s State of Clinical AI 2026 report is simple:
Building powerful AI models is no longer the hardest problem in healthcare AI. Proving that they actually improve patient outcomes is.
The ARISE Network estimates that nearly 80% of clinical AI’s potential remains unrealised, largely because the industry still prioritises benchmark performance over real-world evidence.
To close that gap, the report calls for a shift toward prospective clinical trials and outcome-based validation.
As medical AI advances, so must the standards used to evaluate it. Whether AI measurably improves patient care inside hospitals and clinics must become the central question.
By Dr Rohini Devi and the AHT Team