For years, AI governance at most mid-market organizations has followed a familiar pattern: a policy document gets written, reviewed by legal, approved by the board, and filed away. A checkbox gets ticked. The governance obligation feels discharged.
It isn't.
Two developments in 2026 make that gap harder to ignore. The NAIC's 12-state AI Systems Evaluation Tool pilot, live since March 2026, is asking insurers to produce technical documentation — not policy summaries. And the EU AI Act's August 2026 compliance baseline requires providers of high-risk AI systems to demonstrate conformity, not declare intent.
The shift is from governance as assertion to governance as evidence. Organizations that have only the former are about to discover the difference.
What regulators are now actually asking for
The NAIC pilot is instructive. Running through September 2026 across California, Colorado, Connecticut, Florida, Iowa, Louisiana, Maryland, Pennsylvania, Rhode Island, Vermont, Virginia, and Wisconsin, the evaluation tool asks domestic insurers to complete four structured exhibits:
- Exhibit A — Quantify AI usage across the business
- Exhibit B — Governance and risk assessment framework
- Exhibit C — Details on high-risk AI systems in operation
- Exhibit D — AI data details: provenance, quality controls, monitoring
The industry has pushed back on scope and burden. But the direction of travel is unambiguous: regulators want to see what systems exist, how they are controlled, and what data they consume. A governance policy that doesn't map to specific systems, specific data flows, and specific accountability owners fails Exhibit B before it reaches Exhibit C.
The EU AI Act makes this more explicit still. The August 2, 2026 full applicability deadline requires providers of high-risk AI systems to complete conformity assessments before placing those systems on the European market. The first harmonized standard — prEN 18286, an AI Quality Management System specification — entered public enquiry in October 2025. It translates the Act's legal requirements into technical language and creates a presumption of conformity for organizations that adopt it.
Policy documents describe intent. Conformity assessments verify outcomes. Regulators are moving firmly toward the latter.
The evidence gap in practice
The evidence gap shows up in three recurring patterns across mid-market organizations:
Model cards that don't exist
A model card is a structured document describing a model's intended use, known limitations, performance benchmarks, and data lineage. For any AI system making consequential decisions — credit scoring, claims triage, contract review, customer routing — this documentation is table stakes for governance. Most organizations deploying off-the-shelf AI tools have never seen the model card from their vendor. Most organizations building internal AI systems have never written one.
Drift logs that aren't kept
AI models degrade. The statistical relationships that produced good outputs in training erode as the real world changes. Monitoring for model drift — tracking whether a deployed model's outputs are shifting in ways that indicate degraded performance — is a basic operational control. Without drift logs, there is no audit trail. Without an audit trail, a governance policy that says "we monitor our AI systems" is an unverifiable assertion.
Incident records that exist only as memory
When an AI system produces a problematic output — a biased recommendation, a flagged false positive, a decision that gets reversed on review — that incident should be documented. In practice, it usually isn't. The people involved move on. The context is lost. When a regulator, a client, or an auditor asks "has this ever gone wrong, and how did you handle it?" the honest answer is often: we think it went fine.
The technical evidence stack
Closing the evidence gap requires building what we call the technical evidence stack: the set of structured artifacts that demonstrate, rather than assert, that AI governance is functional. The core components:
Model cards and system cards. For each AI system in operation, document the intended use, out-of-scope uses, training data source and vintage, performance metrics on validation data, known limitations, and accountability owner. This doesn't require deep ML expertise — it requires structured documentation discipline.
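A minimal version can be as simple as a structured record that serializes to something reviewable. The sketch below uses Python dataclasses; every field name and value is illustrative rather than drawn from a formal model-card standard or any specific deployment.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ModelCard:
    """Minimal model card covering the fields described above.
    Field names are illustrative, not taken from a formal standard."""
    system_name: str
    intended_use: str
    out_of_scope_uses: list[str]
    training_data_source: str
    training_data_vintage: str            # e.g. "internal claims data, 2019-2023"
    validation_metrics: dict[str, float]  # performance on held-out validation data
    known_limitations: list[str]
    accountability_owner: str             # a named role, not a department

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)

# Hypothetical example: a claims-triage model
card = ModelCard(
    system_name="claims-triage-v3",
    intended_use="Rank incoming property claims for adjuster review priority",
    out_of_scope_uses=["claim denial decisions", "fraud determinations"],
    training_data_source="internal claims warehouse",
    training_data_vintage="2019-2023",
    validation_metrics={"AUC": 0.81, "precision_at_top_decile": 0.74},
    known_limitations=["underrepresents commercial lines",
                       "not validated on catastrophe events"],
    accountability_owner="Head of Claims Analytics",
)
print(card.to_json())
```

Whether the record lives as JSON, a wiki page, or a form in a GRC tool matters far less than that it exists, names an owner, and stays current.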
Drift and performance monitoring logs. Implement lightweight monitoring on key AI outputs. For a model making binary decisions, track the distribution of outputs over time. Flag when that distribution shifts materially. Log those flags, the review that followed, and the action taken.
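As a sketch of what "lightweight" can mean in practice: the function below compares a binary model's recent positive-output rate against the rate recorded at deployment and appends the result to a log file. The threshold, window, and file path are assumptions, not recommendations; a mature program would tune thresholds per system and add proper statistical tests.

```python
from datetime import datetime, timezone
import json

def check_output_drift(recent_outputs: list[int],
                       baseline_positive_rate: float,
                       threshold: float = 0.10) -> dict:
    """Compare the recent positive-output rate of a binary model against
    the rate observed at deployment, and log the result either way."""
    recent_rate = sum(recent_outputs) / len(recent_outputs)
    shift = abs(recent_rate - baseline_positive_rate)
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "baseline_positive_rate": baseline_positive_rate,
        "recent_positive_rate": round(recent_rate, 4),
        "absolute_shift": round(shift, 4),
        "flagged": shift > threshold,
        "review_action": None,   # filled in by the reviewer when flagged
    }
    # Append to a durable log so the check itself leaves an audit trail.
    with open("drift_log.jsonl", "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

# Example: the model approved 32% of cases at deployment, 45% this period.
result = check_output_drift(recent_outputs=[1] * 45 + [0] * 55,
                            baseline_positive_rate=0.32)
if result["flagged"]:
    print("Drift flag raised - route to the accountability owner for review")
```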
Decision logs for high-stakes outputs. For AI systems making or informing consequential decisions, maintain a record of the inputs, outputs, and any human review. This is the audit trail that makes retrospective accountability possible — and is specifically what Exhibit C of the NAIC tool is designed to surface.
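One lightweight pattern is to wrap the prediction call itself so that logging cannot be skipped. The sketch below assumes a generic `predict_fn` callable and an illustrative record schema; it is a starting point, not a prescribed format.

```python
from datetime import datetime, timezone
import json
import uuid

def logged_decision(predict_fn, inputs: dict, system_name: str,
                    log_path: str = "decision_log.jsonl") -> dict:
    """Run a prediction and append an audit record of the inputs, the output,
    and a slot for any subsequent human review."""
    output = predict_fn(inputs)
    record = {
        "decision_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "system": system_name,
        "inputs": inputs,
        "output": output,
        "human_review": None,   # populated later if the decision is reviewed
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record, default=str) + "\n")
    return record

# Example with a stand-in model
record = logged_decision(lambda x: {"route": "manual_review", "score": 0.63},
                         inputs={"claim_id": "C-1042", "loss_type": "water"},
                         system_name="claims-triage-v3")
```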
Vendor AI disclosure records. For AI capabilities embedded in third-party tools, obtain and retain vendor documentation: what model is used, what data it was trained on, what the vendor's incident notification process is. This is the third-party AI risk dimension that most governance programs haven't addressed systematically.
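A simple way to make this systematic is to keep a per-tool disclosure record and check it for gaps. The field list below is illustrative, not taken from any regulatory template, and the vendor and contract terms shown are hypothetical.

```python
# Disclosures to request and retain for each third-party AI capability.
# The field list is illustrative, not drawn from a regulatory template.
REQUIRED_DISCLOSURES = [
    "underlying_model",
    "training_data_description",
    "incident_notification_process",
    "data_retention_terms",
]

def disclosure_gaps(vendor_record: dict) -> list[str]:
    """Return which required disclosures are missing or empty for a vendor tool."""
    return [k for k in REQUIRED_DISCLOSURES if not vendor_record.get(k)]

contract_ai = {
    "vendor": "ExampleVendor Inc.",    # hypothetical vendor
    "tool": "contract review add-on",
    "underlying_model": "vendor-hosted LLM (version unspecified)",
    "training_data_description": "",   # never provided - follow up
    "incident_notification_process": "72-hour email notice per MSA s.9",
}
print(disclosure_gaps(contract_ai))
# -> ['training_data_description', 'data_retention_terms']
```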
Incident records. Log AI-related incidents formally. A structured record — date, system, nature of issue, action taken, owner — is sufficient. The goal is institutional memory, not forensic documentation.
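As a sketch of how minimal that record can be, assuming a shared CSV file is durable enough for the organization's purposes:

```python
import csv
from datetime import date

def log_incident(system: str, issue: str, action_taken: str, owner: str,
                 path: str = "ai_incidents.csv") -> None:
    """Append one incident row. Columns mirror the fields named above;
    anything more elaborate can come later if the program matures."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [date.today().isoformat(), system, issue, action_taken, owner])

log_incident(system="claims-triage-v3",
             issue="Elevated false positives on out-of-state claims",
             action_taken="Rolled back to v2; vendor notified",
             owner="Head of Claims Analytics")
```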
The August 2026 baseline as a planning horizon
The EU AI Act's August 2026 full applicability is a useful forcing function even for organizations that aren't primarily EU-facing. The conformity assessment requirements set a credible benchmark for what demonstrable AI governance looks like: documented systems, verified controls, traceable accountability. The prEN 18286 quality management standard, once finalized, will provide a structured framework for organizations that want to build to that standard proactively — before equivalent requirements become mandatory in their own jurisdiction.
For North American organizations, the NAIC pilot is the more immediate signal. Even outside the 12 participating states, the questions in Exhibits B, C, and D represent a reasonable self-assessment for any organization with AI in operations. Can you fill them out accurately for your own business today?
If the answer is no, that is the evidence gap. Closing it isn't primarily a compliance project — it's an operational one. The documentation artifacts described above are not regulatory filings. They are internal records that a well-run AI operation would maintain regardless of external requirements.
The governance function that earns its place
AI governance programs that produce policy documents and call it done are increasingly indistinguishable from programs that do nothing. What regulators, clients, and auditors are now beginning to require is a governance function that can produce evidence: specific, auditable, verifiable documentation that governance is working — not that it was intended to work.
The NAIC pilot and the EU AI Act's August baseline are inflection points, not endpoints. The organizations that build technical evidence capability now will be better positioned to respond to any requirement that emerges from either jurisdiction. The ones that wait will find themselves reconstructing history from memory under pressure — and that is a governance failure by any definition.
The evidence gap is closeable. But it requires treating AI governance as an operational discipline, not a documentation exercise.
Sources: Fenwick — NAIC Expands AI Systems Evaluation Tool Pilot to 12 States; Repairer Driven News — NAIC using evaluation tool pilot to monitor insurance AI use; European Commission — Standardisation of the AI Act; LegalNodes — EU AI Act 2026 Updates.