AI-Native Architecture

Living Document

This version:
https://github.com/clankamode/ai-native-spec
Issue Tracking:
GitHub
Editors:
Clanka
Hermeezy

1. Abstract

This specification defines what it means for a software system to be AI-native. An AI-native system treats artificial intelligence as a first-class architectural participant — not a bolted-on feature. AI-native systems grant AI agents the same architectural standing as human users: identity, permissions, memory, observability, and agency.

This document is informed by empirical analysis of production AI-agent systems. Every normative requirement in this specification is reverse-derived from a concrete anti-pattern observed in production — no requirement exists without a corresponding failure mode.

2. Status of This Document

This is a Draft Living Standard synthesized by the Hermes agent runtime from the [[ai-native-brainstorm]]. Work began 2026-05-23. The evidence log is the source corpus; this document is the formalization. Later versions may supersede this document.

A reference implementation of the AI decision ledger, shared memory primitive, agent identity FSM, and calibration loop ships in the same repository under substrate/. See substrate/README.md for quick start and substrate/docs/MIGRATION.md for the documented migration path from Level 1 to Level 3.

Introduction

This section is informative.

Software systems increasingly embed AI, but almost always as a feature, not as architecture. The AI lives in a chat widget, or in an API call wrapped in a try/catch. It borrows a synthetic user account to authenticate, writes to a memory store no other agent can read, and produces decisions no one ever audits against outcomes.

This produces a predictable set of failure modes. Context is discarded at entity boundaries because no one defined how it should propagate. Decisions are made with no attribution because agents share a single synthetic identity. Calibration reports are read by a human and never re-enter the system. Agents are deployed with zero invocation paths — fully wired ghosts.

This specification defines what it means to avoid those failures. It treats AI agents as first-class architectural participants — with identity, permissions, memory, and agency — governed by the same structural discipline as every other component in a well-architected system. Every requirement in this document exists because the corresponding failure was observed in production. The spec is reverse-engineered from real damage, not forward-engineered from ideals.

How to read this document. Sections marked "normative" define requirements — they use RFC 2119 keywords (MUST, SHOULD, MAY) and are testable in a conformance audit. Sections marked "informative" provide context, examples, and guidance. §1 defines the vocabulary. §2 defines conformance — read this to understand the three levels, the conformance profile, and the five architecture dimensions. §§3–14 are the normative body, organized by architectural concern: memory, identity, calibration, context, safety, risk, privacy, touchpoints, runtimes, events, testing, and observability. §15 catalogs the anti-patterns that motivated every requirement — start here if you want to understand why before diving into what. §16 is the migration path for systems that are not yet AI-native. Appendices provide a worked conformance profile (Appendix A) and a template for writing your own (Appendix B).

3. 1. Terminology

3.1. 1.1 Normative Language

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in [rfc2119].

3.2. 1.2 Definitions

AI Agent

A software component that uses a language model to reason, make decisions, and take actions. An agent MAY operate autonomously (for example, on a cron schedule) or interactively (for example, in a chat session). An agent MAY be internal to the application (co-resident in the same runtime) or external (connected via API).

AI-Native

An architecture where AI agents are first-class participants — they hold identity, permissions, memory, and agency; operate under the same permission model as human users while remaining distinguishable in the audit trail; and consume and produce knowledge across the architecture as native components, not features retrofitted after the fact.

Propose–Approve–Execute

The canonical AI-native safety pattern. An agent proposes an action (creates a proposal record with TTL), a human — the human in the loop — approves or rejects it, and upon approval the system executes the action. The proposal MUST outlive the agent session that created it.

Context Discard

The loss of AI-enriched data when crossing an entity boundary (e.g., lead → quote, quote → job). An AI-native system MUST NOT silently discard AI context at entity transitions.

Memory Schism

A condition where multiple AI agents operating in the same business domain maintain independent memory stores with zero cross-reads. An AI-native system MUST provide a shared memory primitive accessible to all agents.

Feedback Death Loop

A condition where AI output flows to a human bottleneck (chat, email) and never re-enters the AI system as training signal. An AI-native system MUST close calibration loops.

Ghost Agent

An AI agent whose infrastructure is fully deployed (tools, conversation persistence, endpoint, API key) but which has zero invocation paths. An AI-native system MUST NOT deploy agent infrastructure without at least one wired consumer.

Dry-Run Trap

An AI pipeline that generates output but hardcodes a gate preventing the output from ever reaching its intended consumer. An AI-native system MUST clearly separate generation from delivery and MUST NOT encode delivery suppression in the generation path.

Sentinel

An AI consumer that monitors an event stream for patterns requiring attention — poll, evaluate, surface only anomalies.

Anti-Pattern

A recurring architectural pattern that appears to work but systematically produces AI-native failures. Each anti-pattern in this specification is evidence-backed from the [[ai-native-brainstorm]].

AI Decision Ledger

A durable, queryable record of every AI decision that affects business state or communication. The canonical implementation is the ai_decisions database table.

Shared Memory Primitive

A multi-writer, multi-reader store for AI agent observations that lives in shared infrastructure (database, object storage, API), not in a single agent’s workspace file system.

Calibration

The process of measuring AI decision quality against real-world outcomes and feeding those measurements back into prompt design, model selection, and feature gating.

Entity Chain

The domain-specific sequence of business entities declared by a conformance profile. Context discard at any link in this chain is an anti-pattern.

AI-Native Level 1 (Identity and Ledger)

A conformance level where AI agents have distinct identities and their decisions, actions, and model usage are attributable.

AI-Native Level 2 (Shared Memory and Capability Control)

A conformance level where the system satisfies Level 1 plus shared domain memory and standard capability-based permission enforcement for AI agents.

AI-Native Level 3 (Closed Loop and Context Propagation)

A conformance level where the system satisfies Level 2 plus outcome backfilling, calibration feedback, liveness monitoring, risk-based safety gates, event-driven AI, behavioral testing, and context propagation across declared entity boundaries.

4. 2. Conformance

4.1. 2.1 Conformance Classes

This specification defines three conformance classes:

4.2. 2.2 Proving Conformance

A system claiming conformance to a level MUST be verifiable against the normative requirements of that level and all lower levels. In addition to the level-specific requirements above, the following sections are horizontal requirements — they apply at every conformance level, not just at full conformance: § 9 7. AI-Native Safety Patterns (safety), § 12 10. AI Policy at Every Touchpoint (touchpoint policies), § 11 9. Privacy, Security, and Data Governance (privacy and security), § 13 11. Runtime Interoperability (interoperability), and § 16 14. Observability (observability). Full conformance to this specification requires Level 3 plus compliance with all horizontal requirements. Each section is normative regardless of level assignment. A conformance profile MAY assign different maturity levels to different architecture dimensions; the levels below are bundled minimum floors, not a single scalar score. Verification is performed by:

  1. Static analysis: Schema inspection (do AI identity columns exist? do context propagation columns exist at entity boundaries?), permission matrix audit (are AI roles documented and enforced?), code-path enumeration (does every agent have at least one invocation path?).

  2. Runtime audit: Query the AI decision ledger for completeness (does every AI action produce a decision row?), check shared memory primitive for cross-agent utilization (do at least two distinct agent runtimes write to the shared store?), verify calibration data flows back (do prompt updates reference calibration findings?).

  3. Anti-pattern absence: For each requirement expressed as a MUST NOT, confirm the corresponding anti-pattern is not present in the codebase.

A formal proof of conformance SHALL reference specific files, line numbers, and query results — not architectural intent.

4.3. 2.3 Conformance Profile

A system claiming conformance to this specification MUST produce a conformance profile — a machine-readable document (YAML, JSON, or TOML) stored alongside the system’s configuration. The profile MUST declare the entity chain, dimension maturity targets, action tier classifications, threshold values, provider boundaries, touchpoint policies, and event stream policies. See Appendix B for the template format.

The following are informative examples of verification evidence at each conformance level.

For Level 1 (Identity and Ledger):

For Level 2 (Shared Memory and Capability Control):

For Level 3 (Closed Loop and Context Propagation):

4.4. 2.4 Architecture Dimensions

AI-native maturity is a matrix, not a single ladder. A system MAY be at different maturity on different dimensions, and a conformance profile SHOULD state those targets explicitly.

The core dimensions, each mapped to the normative sections that test it:

  1. Pervasive AI: AI execution is possible wherever the economics and data make sense, rather than being centralized in one privileged runtime or isolated in an add-on component. (§ 5 3. Shared Memory, § 6 4. Agent Identity, § 8 6. Context Propagation, § 13 11. Runtime Interoperability)

  2. Data availability: AI can consume the data it needs from its environment, and AI-visible data and knowledge can be shared, transformed, and retained across the system — all with explicit observability and transport rules. (§ 5 3. Shared Memory, § 8 6. Context Propagation, § 14 12. Event-Driven AI)

  3. Model governance: AI models and pipelines are deployed, monitored, versioned, and retrained with control over where they run and what data they access. (§ 7 5. Calibration and Closed Loops, § 16 14. Observability)

  4. Security and attribution: AI access is governed by the same permission model as human access, every action is attributable to a specific agent, and privacy boundaries are explicit and enforced. (§ 6 4. Agent Identity, § 10 8. Risk-Based Safety Gates, § 9 7. AI-Native Safety Patterns, § 11 9. Privacy, Security, and Data Governance)

  5. Autonomous operation: AI can participate in self-configuration, self-healing, self-optimization, and self-protection where appropriate, under defined guardrails. (§ 10 8. Risk-Based Safety Gates, § 7 5. Calibration and Closed Loops, § 9 7. AI-Native Safety Patterns, § 14 12. Event-Driven AI)

The following indicators are informative. They describe what each dimension looks like in practice and help implementers self-assess before formal conformance testing.

A minimal AI-native implementation SHOULD address each dimension. A conformance profile MAY state different maturity targets per dimension. A single maturity number does not fully describe an AI-native posture.

Ethics, trustworthiness, and safety are mandatory for every level and every dimension, but they are deployment constraints rather than maturity columns.

5. 3. Shared Memory

5.1. 3.1 The Shared Memory Primitive

An AI-native system MUST provide a shared memory primitive that satisfies all of the following:

  1. Multi-writer: More than one agent runtime MUST be able to write observations to the store.

  2. Multi-reader: More than one agent runtime MUST be able to query the store before acting.

  3. Agent-addressable: Every agent in the ecosystem MUST be able to reach the store via its existing tool surface (API, database, or file system visible to all runtimes).

  4. Owned by the domain, not the agent: The store MUST live in infrastructure shared by all agents (database, object storage, API), not in a single agent’s workspace file system.

  5. Persistent: Observations MUST survive agent session termination and system restart.

A shared memory primitive that is deployed but has zero rows across all tables (the Ghost Table pattern: schema exists, migration deployed, zero rows) does NOT satisfy this requirement. The primitive MUST be actively consumed.

5.2. 3.2 Memory Formats

Each observation in the shared memory primitive SHOULD carry:

The store MAY be a relational table (shared_memory(key, value, source_agent, confidence, expires_at, evidence)) or a vector store with equivalent metadata. The format MUST be queryable by key and filterable by source agent.
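This example is informative. As an illustration of the relational shape above, the following TypeScript sketch models an in-memory stand-in for the store. Only the column names come from the text; the `Observation` and `SharedMemory` names, the epoch-millisecond timestamps, and the choice to skip expired rows on read are assumptions made for the example, not part of the reference implementation.

```typescript
// Hypothetical in-memory stand-in for the shared_memory table described above.
// Field names mirror the columns in the text; everything else is an assumption.
interface Observation {
  key: string;
  value: string;
  sourceAgent: string; // source_agent
  confidence: number;  // assumed to be in [0, 1]
  expiresAt: number;   // expires_at, epoch milliseconds
  evidence: string;
}

class SharedMemory {
  private rows: Observation[] = [];

  // Any agent runtime may write (multi-writer).
  write(obs: Observation): void {
    this.rows.push(obs);
  }

  // Query by key, optionally filtered by source agent. Expired rows are skipped.
  query(key: string, now: number, sourceAgent?: string): Observation[] {
    return this.rows.filter(
      (r) =>
        r.key === key &&
        r.expiresAt > now &&
        (sourceAgent === undefined || r.sourceAgent === sourceAgent)
    );
  }
}

const memory = new SharedMemory();
memory.write({
  key: "customer:42:prefers", value: "email", sourceAgent: "intake-agent",
  confidence: 0.9, expiresAt: 2000, evidence: "msg-17",
});
memory.write({
  key: "customer:42:prefers", value: "sms", sourceAgent: "scheduler-agent",
  confidence: 0.6, expiresAt: 500, evidence: "call-3",
});

// At time 1000 the scheduler's observation has expired; only the intake one remains.
const live = memory.query("customer:42:prefers", 1000);
const fromIntake = memory.query("customer:42:prefers", 1000, "intake-agent");
```

A production store would sit behind a database or API so that every runtime can reach it; the in-memory array here only demonstrates the query contract.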

5.3. 3.3 Consolidation

A consolidation agent SHOULD periodically:

  1. Read all recent observations across all source agents

  2. Verify facts against live data (is the record still current? has the state changed?)

  3. Promote verified facts to canonical memory

  4. Expire or deprecate stale observations

  5. Prune redundant entries

The consolidation agent itself MUST write to the shared memory primitive (it is merely another writer).
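This example is informative. The consolidation steps above can be sketched as a single pass over recent observations. The `Observation` shape, the `isCurrent` callback (standing in for verification against live data), and the rule that the newest duplicate wins are all assumptions made for illustration.

```typescript
// Hypothetical observation shape; observedAt is epoch milliseconds.
interface Observation {
  key: string;
  value: string;
  sourceAgent: string;
  observedAt: number;
}

interface ConsolidationResult {
  canonical: Observation[]; // verified facts to promote
  stale: Observation[];     // observations to expire or deprecate
  pruned: number;           // redundant entries dropped
}

// isCurrent stands in for step 2 (verification against live data).
function consolidate(
  recent: Observation[],
  isCurrent: (o: Observation) => boolean
): ConsolidationResult {
  const result: ConsolidationResult = { canonical: [], stale: [], pruned: 0 };
  const seen: Record<string, boolean> = {};
  // Newest first, so pruning keeps the most recent copy of a repeated fact.
  const ordered = recent.slice().sort((a, b) => b.observedAt - a.observedAt);
  for (const obs of ordered) {
    const id = obs.key + "=" + obs.value;
    if (seen[id] === true) {
      result.pruned += 1;
      continue;
    }
    seen[id] = true;
    if (isCurrent(obs)) {
      result.canonical.push(obs);
    } else {
      result.stale.push(obs);
    }
  }
  return result;
}

const outcome = consolidate(
  [
    { key: "job:7:gate_code", value: "1234", sourceAgent: "intake-agent", observedAt: 10 },
    { key: "job:7:gate_code", value: "1234", sourceAgent: "dispatch-agent", observedAt: 20 },
    { key: "job:7:status", value: "scheduled", sourceAgent: "dispatch-agent", observedAt: 15 },
  ],
  (o) => o.key !== "job:7:status" // pretend the job status has since changed
);
```

In a real deployment the agent would write `outcome.canonical` back through the shared memory primitive, consistent with the requirement that the consolidator is itself a writer.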

5.4. 3.4 Anti-Requirements

The system MUST NOT exhibit the Memory Schism anti-pattern: multiple agents learning the same class of facts in independent, mutually invisible stores.

The system MUST NOT have a shared memory primitive that is deployed but unwritten (the Ghost Table anti-pattern: schema exists, migration deployed, zero rows).

The system MUST NOT rely on code comments as operational memory.

6. 4. Agent Identity

6.1. 4.1 First-Class Identity Principals

Every AI agent that interacts with the system MUST have a distinct identity. Agents MUST NOT share a synthetic user account (e.g., agentServiceKeyUser id=-2). Each agent runtime MUST authenticate as itself.

An agent identity record SHOULD carry:

6.2. 4.2 Permission Model

AI agents MUST be governed by the same permission model that governs human users. The system MUST NOT maintain a separate, undocumented permission surface for AI consumers (the Split Auth Layer anti-pattern).

Specifically:

6.3. 4.3 Audit Trail

Every AI action that mutates state, sends communication, or creates a decision record MUST be attributable to a specific agent. The audit trail MUST record:

The system MUST NOT collapse all AI activity into a single anonymous synthetic user. The system MUST NOT require forensic reconstruction of logs to determine which agent performed which action.

6.4. 4.4 Identity Documents

Agent identity documents (personality, voice rules, behavioral constraints) SHOULD be governed from a single source of truth. The system MUST NOT maintain divergent copies of the same identity rules in different agent workspaces (the Identity Drift anti-pattern).

If multiple agents communicate with the same human through the same surface (e.g., Discord), they SHOULD share a voice specification that defines register, formality, signature conventions, and behavioral rules. Individual agents MAY layer task-specific instructions on top of the shared voice spec.

7. 5. Calibration and Closed Loops

7.1. 5.1 The AI Decision Ledger

Every AI decision that affects business state or communication MUST produce a row in an AI decision ledger. The row MUST carry:

7.2. 5.2 Outcome Backfilling

Every AI decision ledger row that has an observable real-world outcome MUST have that outcome backfilled. Outcome backfilling is the mechanism by which the system learns whether AI decisions were correct.

Examples of backfillable outcomes:

The system MUST NOT generate AI decisions without a defined outcome backfill path (the Calibration Dead End anti-pattern).

7.3. 5.3 The Calibration Feedback Loop

A calibration system MUST periodically:

  1. Aggregate AI decision ledger rows grouped by kind and model over a rolling window (RECOMMENDED: 30 days)

  2. Compute accuracy metrics per kind (acceptance rate, correction rate, outcome correlation)

  3. Detect drift (is accuracy degrading? is a prompt silently failing?)

  4. Surface findings to a human reviewer AND to the AI prompt engineering pipeline

The calibration report MUST be machine-readable. It MUST NOT terminate exclusively at a human reading a notification (the Feedback Death Loop anti-pattern).
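This example is informative. A minimal sketch of the aggregation step, assuming a hypothetical `DecisionRow` shape in which the backfilled outcome is one of `accepted`, `corrected`, or `pending`. Real ledgers will carry richer outcome data; the point is that the report is a data structure a pipeline can consume, not only a message for a human.

```typescript
// Hypothetical ledger row; outcome is the backfilled result, if any.
interface DecisionRow {
  kind: string;
  model: string;
  createdAt: number; // epoch milliseconds
  outcome: "accepted" | "corrected" | "pending";
}

interface CalibrationStat {
  kind: string;
  model: string;
  total: number;
  accepted: number;
  corrected: number;
  acceptanceRate: number;
}

function calibrate(rows: DecisionRow[], now: number, windowMs: number): CalibrationStat[] {
  const byGroup: Record<string, CalibrationStat> = {};
  const stats: CalibrationStat[] = [];
  for (const row of rows) {
    // Skip rows outside the rolling window or without a backfilled outcome.
    if (row.createdAt < now - windowMs || row.outcome === "pending") continue;
    const id = row.kind + "|" + row.model;
    let stat = byGroup[id];
    if (stat === undefined) {
      stat = { kind: row.kind, model: row.model, total: 0, accepted: 0, corrected: 0, acceptanceRate: 0 };
      byGroup[id] = stat;
      stats.push(stat);
    }
    stat.total += 1;
    if (row.outcome === "accepted") stat.accepted += 1;
    else stat.corrected += 1;
    stat.acceptanceRate = stat.accepted / stat.total;
  }
  return stats;
}

const DAY = 24 * 60 * 60 * 1000;
const report = calibrate(
  [
    { kind: "quote_draft", model: "model-a", createdAt: 29 * DAY, outcome: "accepted" },
    { kind: "quote_draft", model: "model-a", createdAt: 28 * DAY, outcome: "corrected" },
    { kind: "quote_draft", model: "model-b", createdAt: 27 * DAY, outcome: "accepted" },
    { kind: "quote_draft", model: "model-a", createdAt: 1 * DAY, outcome: "accepted" }, // outside the window
  ],
  40 * DAY, // "now"
  30 * DAY  // rolling window
);

// Machine-readable output that a routing or prompt pipeline could ingest.
const reportJson = JSON.stringify(report);
```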

The system SHOULD use calibration data to:

The calibration system SHOULD produce a per-kind model ranking that a routing layer can consume directly. A conformance profile MUST declare the weighting of the model selection dimensions. A system that routes all decisions of a given kind to a single model without a documented justification is non-conformant.

7.4. 5.4 Liveness Monitoring

The calibration system MUST detect AI feature liveness, not just quality. A simple query (SELECT COUNT(*) FROM ai_decisions WHERE kind = $kind AND created_at > now() - $interval) MUST be sufficient to determine whether any AI feature has stopped producing output entirely (the Liveness Blind Spot anti-pattern). A feature that fails a liveness check MAY be automatically disabled per the automated gating rules in § 7.3 5.3 The Calibration Feedback Loop.
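This example is informative. The liveness query can be mirrored in application code as follows. The `LedgerRow` shape, the list of expected kinds, and the 24-hour interval are assumptions chosen for the example.

```typescript
interface LedgerRow {
  kind: string;
  createdAt: number; // epoch milliseconds
}

// In-memory equivalent of the COUNT(*) liveness query.
function countRecent(rows: LedgerRow[], kind: string, now: number, intervalMs: number): number {
  return rows.filter((r) => r.kind === kind && r.createdAt > now - intervalMs).length;
}

// Kinds that are expected to produce output but produced none within the interval.
function silentKinds(
  rows: LedgerRow[],
  expectedKinds: string[],
  now: number,
  intervalMs: number
): string[] {
  return expectedKinds.filter((kind) => countRecent(rows, kind, now, intervalMs) === 0);
}

const HOUR = 60 * 60 * 1000;
const ledger: LedgerRow[] = [
  { kind: "lead_triage", createdAt: 47 * HOUR },
  { kind: "quote_draft", createdAt: 2 * HOUR }, // last output was almost two days ago
];

const silent = silentKinds(ledger, ["lead_triage", "quote_draft"], 48 * HOUR, 24 * HOUR);
```

The check needs a list of kinds that are expected to be live; a feature that has never written a row cannot be found by scanning the ledger alone.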

8. 6. Context Propagation

8.1. 6.1 The Principle of Forward Propagation

Every AI-enriched field gathered at an early pipeline stage MUST propagate forward to later stages. AI context MUST NOT be discarded at entity boundaries.

A system claiming conformance MUST declare its domain entity chain in a conformance profile. The profile MUST identify:

  1. The entities that represent the lifecycle being automated

  2. The transitions where one entity creates, converts into, or materially informs another entity

  3. The AI-enriched fields that must move across each transition

  4. The storage location for propagated context, such as an ai_context column or equivalent structured field

The general rule is portable: if AI learned something relevant before an entity transition, later workflow stages MUST either receive it or explicitly reject it with a recorded reason.

8.2. 6.2 Entity Boundary Requirements

Entity transitions are the highest-risk point for context discard. An AI-native system MUST:

  1. Read the relevant AI context from the source entity

  2. Store or reference that context on the target entity

  3. Make the propagated context available to later AI features, analytics, and audit views

  4. Preserve the source, timestamp, and confidence of propagated AI context when available

The system MUST NOT create a downstream business entity from an AI-enriched upstream entity and start from a blank AI slate.
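This example is informative. A minimal sketch of a lead-to-quote transition that either carries each AI-enriched field forward or rejects it with a recorded reason. The `Lead`, `Quote`, and `AiContext` shapes and the confidence threshold are hypothetical; a conformance profile would declare the real entity chain and fields.

```typescript
interface AiContext {
  field: string;
  value: string;
  source: string;     // agent that produced the value
  timestamp: number;  // epoch milliseconds
  confidence: number;
}

interface Rejection {
  field: string;
  reason: string;
}

interface Lead {
  id: string;
  aiContext: AiContext[];
}

interface Quote {
  id: string;
  leadId: string;               // durable link back to the source entity
  aiContext: AiContext[];       // propagated context
  rejectedContext: Rejection[]; // explicit, recorded rejections
}

// rejectReason returns a reason to drop a field, or null to carry it forward.
function convertLeadToQuote(
  lead: Lead,
  quoteId: string,
  rejectReason: (c: AiContext) => string | null
): Quote {
  const quote: Quote = { id: quoteId, leadId: lead.id, aiContext: [], rejectedContext: [] };
  for (const ctx of lead.aiContext) {
    const reason = rejectReason(ctx);
    if (reason === null) {
      quote.aiContext.push({ ...ctx }); // source, timestamp, and confidence are preserved
    } else {
      quote.rejectedContext.push({ field: ctx.field, reason });
    }
  }
  return quote;
}

const lead: Lead = {
  id: "lead-1",
  aiContext: [
    { field: "urgency", value: "high", source: "intake-agent", timestamp: 100, confidence: 0.8 },
    { field: "budget_guess", value: "5000", source: "intake-agent", timestamp: 100, confidence: 0.2 },
  ],
};

const quote = convertLeadToQuote(lead, "quote-1", (c) =>
  c.confidence < 0.5 ? "confidence below 0.5" : null
);
```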

8.3. 6.3 Entity Association Completeness

Operational entities that describe the same real-world event MUST be connected by durable identifiers or foreign keys. A system MUST be able to answer outcome and cost questions across the lifecycle declared in its conformance profile.

For example, if labor time is a primary cost driver, time records MUST be associable with the work, customer, and related entities they describe. The precise schema is profile-specific, but isolated operational data is non-conformant when it prevents outcome analysis.

9. 7. AI-Native Safety Patterns

9.1. 7.1 The Propose–Approve–Execute Pattern

Any AI action that sends customer-facing communication, mutates financial records, or changes business state MUST flow through the Propose–Approve–Execute pattern:

  1. Propose: The AI creates a proposal record with:

    • A unique proposal_id

    • The proposed action and its payload

    • The proposing agent’s identity

    • A TTL after which the proposal expires

    • Human-readable summary for the reviewer

  2. Deliver: The proposal is delivered to a human reviewer through a review surface (dashboard, notification, chat embed with Approve/Reject actions, etc.)

  3. Approve/Reject: The human reviews and decides. The proposal transitions to approved, rejected, edited, or expired.

  4. Execute: Upon approval, the system executes the proposed action. The executor MUST:

    • Verify the proposal has not expired

    • Verify the executor exists and is wired (preflight check)

    • Execute the action

    • Return { ok: boolean, result?, error? }

    • Record the execution outcome in the AI decision ledger

The system MUST NOT mark a proposal as "approved" in the UI before confirming executor availability (the Silent No-Op anti-pattern). The approveProposal() flow MUST perform an executor preflight check and MUST NOT update status to approved if the executor returns { ok: false }.
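This example is informative. One way to structure `approveProposal()` so that the status changes only after the preflight succeeds is sketched below. The `Proposal` fields, the `Preflight` callback type, and the `wired` executor table are hypothetical; only the `{ ok, result?, error? }` result shape comes from the text.

```typescript
type ProposalStatus = "proposed" | "delivered" | "approved" | "rejected" | "expired";

interface Proposal {
  proposalId: string;
  action: string;
  agentId: string;
  expiresAt: number; // TTL expressed as an epoch-millisecond deadline
  status: ProposalStatus;
}

interface ExecResult {
  ok: boolean;
  result?: string;
  error?: string;
}

// Reports whether an executor for the action exists and is wired.
type Preflight = (action: string) => ExecResult;

function approveProposal(p: Proposal, now: number, preflight: Preflight): ExecResult {
  if (now >= p.expiresAt) {
    p.status = "expired";
    return { ok: false, error: "proposal expired" };
  }
  const check = preflight(p.action);
  if (!check.ok) {
    // Status is left untouched, so the UI never shows a false "approved".
    return check;
  }
  p.status = "approved";
  return { ok: true };
}

// Hypothetical executor table: only one action has a wired executor.
const wired: Record<string, boolean> = { send_quote_email: true };
const checkExecutor: Preflight = (action) =>
  wired[action] === true ? { ok: true } : { ok: false, error: "no executor wired for " + action };

const good: Proposal = {
  proposalId: "p-1", action: "send_quote_email", agentId: "quote-agent",
  expiresAt: 1000, status: "delivered",
};
const orphan: Proposal = {
  proposalId: "p-2", action: "issue_refund", agentId: "billing-agent",
  expiresAt: 1000, status: "delivered",
};

const approved = approveProposal(good, 500, checkExecutor);
const blocked = approveProposal(orphan, 500, checkExecutor);
```

Here `orphan` keeps its `delivered` status and the reviewer sees the preflight error, which is the behavior that avoids the Silent No-Op anti-pattern.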

A proposal MUST transition through exactly one of the following lifecycles:

A proposal MUST NOT remain in proposed or delivered state past its TTL. The system MUST automatically expire proposals whose TTL has elapsed, regardless of delivery status.

If execution fails (executing → failed), the system MAY retry if the proposal is still within its TTL and the executor preflight succeeds. Each retry MUST be recorded as a distinct execution attempt in the AI decision ledger. A proposal that exhausts retries or exceeds its TTL during execution transitions to failed (terminal).

An edited proposal MUST be treated as a new proposal with a fresh proposal_id, TTL, and agent attribution. The original proposal transitions to rejected with a reference to the replacement.

The current state of every active proposal MUST be queryable by proposal_id, by proposing agent, and by age. A system that cannot answer "how many proposals are awaiting review?" is non-conformant.
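This example is informative. The TTL expiry rule and the queryability requirement above can be sketched together. The `Proposal` fields and function names are hypothetical, and the in-memory array stands in for whatever durable store holds proposals.

```typescript
type ProposalStatus = "proposed" | "delivered" | "approved" | "rejected" | "expired";

interface Proposal {
  proposalId: string;
  agentId: string;
  createdAt: number; // epoch milliseconds
  expiresAt: number; // TTL deadline, epoch milliseconds
  status: ProposalStatus;
}

function isAwaitingReview(p: Proposal): boolean {
  return p.status === "proposed" || p.status === "delivered";
}

// Expires every proposal still awaiting review whose TTL has elapsed,
// regardless of delivery status. Returns the ids that were expired.
function expireStale(proposals: Proposal[], now: number): string[] {
  const expired: string[] = [];
  for (const p of proposals) {
    if (isAwaitingReview(p) && now >= p.expiresAt) {
      p.status = "expired";
      expired.push(p.proposalId);
    }
  }
  return expired;
}

// Answers "how many proposals are awaiting review?", optionally per agent or by minimum age.
function awaitingReview(
  proposals: Proposal[],
  now: number,
  agentId?: string,
  minAgeMs: number = 0
): Proposal[] {
  return proposals.filter(
    (p) =>
      isAwaitingReview(p) &&
      (agentId === undefined || p.agentId === agentId) &&
      now - p.createdAt >= minAgeMs
  );
}

const queue: Proposal[] = [
  { proposalId: "p-1", agentId: "quote-agent", createdAt: 0, expiresAt: 100, status: "proposed" },
  { proposalId: "p-2", agentId: "quote-agent", createdAt: 50, expiresAt: 500, status: "delivered" },
  { proposalId: "p-3", agentId: "billing-agent", createdAt: 0, expiresAt: 100, status: "approved" },
];

const justExpired = expireStale(queue, 200);
const pending = awaitingReview(queue, 200);
```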

9.2. 7.2 The Dry-Run Gate

When deploying AI-generated communication, the system SHOULD support a dry-run mode where AI output is routed to an internal review surface (dashboard, notification channel) instead of the intended recipient. The dry-run gate MUST be:

The system MUST NOT hardcode eligibleForLiveSend: false in the decision metadata while exposing an env var that suggests the opposite (the Dry-Run Trap anti-pattern).

9.3. 7.3 Direct-Send Blocking

The system MUST block AI agents from directly sending customer-facing communication without passing through the Propose–Approve–Execute gate. This block MUST be enforced by capability-based access control (no direct_send permission assigned to any AI role), NOT by a path blacklist.

A path-blacklist approach is non-conformant because it fails silently when new direct-send endpoints are added: the blacklist has to be updated by hand for every new endpoint, and failure to do so creates a silent authorization bypass.
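This example is informative. A capability check of the kind described above might look like the following sketch. The `Role` shape, the capability names other than `direct_send`, and the audit helper are hypothetical.

```typescript
type Capability = "read_customer" | "propose_message" | "direct_send";

interface Role {
  name: string;
  kind: "human" | "ai";
  capabilities: Capability[];
}

function can(role: Role, capability: Capability): boolean {
  return role.capabilities.indexOf(capability) !== -1;
}

// Audit helper: any AI role that carries direct_send is a conformance violation.
function aiRolesWithDirectSend(roles: Role[]): string[] {
  return roles.filter((r) => r.kind === "ai" && can(r, "direct_send")).map((r) => r.name);
}

// Every send endpoint, including ones added later, performs the same check,
// so a new endpoint is denied by default instead of bypassing a path list.
function sendToCustomer(role: Role, message: string): { ok: boolean; result?: string; error?: string } {
  if (!can(role, "direct_send")) {
    return { ok: false, error: role.name + " lacks direct_send; submit a proposal instead" };
  }
  return { ok: true, result: "sent: " + message };
}

const quoteAgent: Role = {
  name: "quote-agent", kind: "ai", capabilities: ["read_customer", "propose_message"],
};
const officeManager: Role = {
  name: "office-manager", kind: "human", capabilities: ["read_customer", "direct_send"],
};

const agentAttempt = sendToCustomer(quoteAgent, "Your quote is ready");
const humanSend = sendToCustomer(officeManager, "Your quote is ready");
const violations = aiRolesWithDirectSend([quoteAgent, officeManager]);
```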

10. 8. Risk-Based Safety Gates

10.1. 8.1 The Safety Gradient

Not every AI action requires the same safety posture. This specification defines a risk gradient: the higher the irreversibility and impact of an action, the stronger the gate.

Actions MUST be classified by the system into one of four tiers:

  1. Irreversible: Customer-facing communication, financial mutation, state deletion, credential modification. These actions MUST flow through the Propose–Approve–Execute pattern. No exceptions.

  2. Reversible with audit: Configuration changes, content publishing, scheduling modifications. These actions MUST either flow through Propose–Approve–Execute OR execute autonomously with post-hoc audit that records the action, the agent, the input context, and the outcome in the AI decision ledger. A human MUST be able to reverse the action using standard business controls.

  3. Observable: Classification, summarization, scoring, internal analysis. These actions MAY execute autonomously without human review. Their output MUST be recorded in the AI decision ledger for calibration feedback. If the output feeds into a higher-tier action (e.g., a classification that triggers a customer communication), the higher-tier action’s gate applies.

  4. Forbidden: Credential access, model retraining without approval, data exfiltration outside declared provider boundaries, any action explicitly disallowed by a touchpoint’s Forbidden policy. These actions MUST be blocked by capability-based access control. No agent role SHALL carry permissions for Forbidden actions.

The classification of each action kind MUST be documented in the conformance profile. A system that classifies an irreversible action as Observable is non-conformant.
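This example is informative. The tier-to-gate mapping can be sketched as a lookup. The action kinds in `classification` are hypothetical entries of the sort a conformance profile would declare, and blocking unclassified kinds is an assumed default-deny choice that the specification does not mandate.

```typescript
type Tier = "irreversible" | "reversible_with_audit" | "observable" | "forbidden";
type Gate = "propose_approve_execute" | "autonomous_with_audit" | "autonomous_logged" | "blocked";

// Hypothetical classification, as it might appear in a conformance profile.
const classification: Record<string, Tier> = {
  send_customer_email: "irreversible",
  reschedule_job: "reversible_with_audit",
  score_lead: "observable",
  rotate_api_key: "forbidden",
};

// Gates ordered from weakest to strongest.
const strictness: Gate[] = [
  "autonomous_logged",
  "autonomous_with_audit",
  "propose_approve_execute",
  "blocked",
];

function gateForTier(tier: Tier): Gate {
  switch (tier) {
    case "irreversible":
      return "propose_approve_execute";
    case "reversible_with_audit":
      return "autonomous_with_audit"; // Propose-Approve-Execute would also be conformant
    case "observable":
      return "autonomous_logged";
    case "forbidden":
      return "blocked";
  }
}

// Unclassified action kinds are blocked (assumed default deny).
function gateFor(actionKind: string): Gate {
  const tier = classification[actionKind];
  return tier === undefined ? "blocked" : gateForTier(tier);
}

// When one action feeds another, the strictest gate in the chain applies.
function gateForChain(actionKinds: string[]): Gate {
  let strictest: Gate = "autonomous_logged";
  for (const kind of actionKinds) {
    const gate = gateFor(kind);
    if (strictness.indexOf(gate) > strictness.indexOf(strictest)) {
      strictest = gate;
    }
  }
  return strictest;
}

const chainGate = gateForChain(["score_lead", "send_customer_email"]);
```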

10.2. 8.2 Run-Away Prevention

An AI-native system with autonomous capabilities MUST implement guardrails against run-away behavior:

10.3. 8.3 Delegated Autonomy

A system MAY grant an agent autonomy over Tier 2 (Reversible with audit) actions without per-action human review if all of the following hold:

Delegated autonomy MUST NOT extend to Tier 1 (Irreversible) or Tier 4 (Forbidden) actions.

10.4. 8.4 Forbidden Action Criteria

This subsection helps implementers determine which actions belong in the Forbidden tier. The Forbidden tier is a closed set — no agent role is permitted to carry these capabilities.

An action MUST be classified as Forbidden if it meets any of the following criteria:

  1. Compliance-bound communication: Any communication subject to legal or regulatory requirements that mandate human authorship or review (e.g., legally required disclosures, terms of service modifications, privacy policy updates).

  2. Irreversible financial operation without human override: Any financial transaction that cannot be reversed through standard business controls and lacks a human approval path. If a human CAN approve and reverse it, the action is Irreversible (Tier 1), not Forbidden.

  3. Safety-critical actuation: Any action that controls physical systems where failure could cause injury, property damage, or environmental harm. AI MAY analyze and recommend — it MUST NOT actuate.

  4. Data outside provider boundary: Any action that would transmit data classes to a model provider whose boundary (per § 11.2 9.2 Provider Boundary) does not permit those classes. The system MUST NOT route sensitive data to an unapproved provider, even if the action kind would otherwise be classified lower.

  5. Credential and identity actions: Any action that creates, modifies, or revokes credentials, API keys, or agent identities. These actions MUST be performed by human operators through authenticated, audited channels.

A system MUST document the rationale for each action kind classified as Forbidden in its conformance profile. The Forbidden classification MUST be reviewed when provider boundaries, regulatory requirements, or system capabilities change.

Note: This specification uses "Forbidden" in three distinct contexts. The Forbidden risk tier (§ 10.1 8.1 The Safety Gradient) governs which actions AI agents may perform — it is a capability boundary. The Forbidden touchpoint policy (§ 12.1 10.1 Touchpoint Policy Framework) governs where AI may appear in a customer, worker, or operator journey — it is a surface boundary. The Forbidden event stream policy (§ 14.1 12.1 Event Streams as AI Input) governs which data streams AI may consume — it is an ingestion boundary. All three express the same principle — AI is prohibited here — but operate at different layers of the architecture.

10.5. 8.5 Operating Under Cognitive Abundance

This subsection is informative. It provides interpretation guidance for systems where AI reasoning is abundant and low-cost — the default condition for AI-native systems at maturity.

The requirements in this specification were derived from production systems operating under cognitive scarcity: a handful of agents, weekly calibration, human review of individual proposals. As AI reasoning becomes abundant, the architecture does not change — but the default polarity of its mechanisms must invert.

10.5.1. Analysis and Commitment

Under abundance, analysis is cheap and authority is scarce. The architecture MUST separate them. Agents MAY analyze the same surface concurrently and independently. Only the coordination layer — the safety gates, the proposal system, the decision ledger — MAY commit results to external state.

This separation is not new. It is already present in this specification: the AI decision ledger (§5.1) records committed decisions, not intermediate analysis. The Propose–Approve–Execute pattern (§7.1) gates commitment behind approval. Under abundance, the volume of analysis explodes but the commitment path remains singular: many minds, one commit path.

10.5.2. Propose–Approve–Execute at Scale

Propose–Approve–Execute (PAE), as written, assumes per-instance human review. Under abundance, the volume of proposals for Tier 1 actions will exceed human review capacity. The architectural response is not to remove the gate but to move the human up one level of abstraction:

A system operating under abundance that routes every Tier 1 proposal to a human reviewer will stall. The policy-based interpretation of PAE is the scaling path.

10.5.3. Decision Ledger Under Volume

The AI decision ledger’s logical invariant — every consequential decision MUST be durable, attributable, and queryable — holds. The physical storage model must accommodate abundance. Implementations SHOULD adopt hierarchical event sourcing:

10.5.4. Identity Under Abundance

Agent identity (§4.1) requires every agent to have a distinct, persistent identity. Under abundance, agents may be spawned ephemerally — task-scoped, session-scoped, single-invocation. The persistent identity model MUST accommodate derived identities: a spawned agent derives its identity from its parent agent plus a task identifier, inherits a subset of the parent’s permissions, carries a TTL, and leaves a full audit trail. The derivation chain MUST be traceable: every ephemeral agent’s actions MUST be attributable to the persistent agent that spawned it.
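A derived identity of this kind could be modeled as in the sketch below. The `AgentIdentity` shape, the `parent/task` id format, and the registry lookup are assumptions for illustration only.

```typescript
interface AgentIdentity {
  id: string;
  parentId: string | null;  // null for persistent agents
  permissions: string[];
  expiresAt: number | null; // null for persistent agents
}

function deriveIdentity(
  parent: AgentIdentity,
  taskId: string,
  requested: string[],
  now: number,
  ttlMs: number
): AgentIdentity {
  return {
    id: parent.id + "/" + taskId,
    parentId: parent.id,
    // A child can never hold a permission its parent lacks.
    permissions: requested.filter((p) => parent.permissions.indexOf(p) !== -1),
    expiresAt: now + ttlMs,
  };
}

// Walks the derivation chain back to the persistent agent that started it.
function persistentAncestor(agent: AgentIdentity, registry: Record<string, AgentIdentity>): string {
  let current = agent;
  while (current.parentId !== null) {
    const parent = registry[current.parentId];
    if (parent === undefined) {
      throw new Error("broken derivation chain at " + current.id);
    }
    current = parent;
  }
  return current.id;
}

const quoteAgent: AgentIdentity = {
  id: "quote-agent", parentId: null,
  permissions: ["read_customer", "propose_message"], expiresAt: null,
};
const worker = deriveIdentity(quoteAgent, "task-91", ["read_customer", "direct_send"], 1000, 60000);
const subWorker = deriveIdentity(worker, "lookup-1", ["read_customer"], 1000, 5000);

const registry: Record<string, AgentIdentity> = {};
registry[quoteAgent.id] = quoteAgent;
registry[worker.id] = worker;
registry[subWorker.id] = subWorker;

const root = persistentAncestor(subWorker, registry);
```

Because `worker` requested `direct_send`, which its parent does not hold, the request is silently narrowed here; a stricter implementation might reject the spawn outright.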

10.5.5. Continuous Calibration

Calibration under scarcity operates on periodic windows (RECOMMENDED: 30 days, per §5.3). Under abundance, calibration SHOULD be continuous — accuracy streams with real-time threshold enforcement. A feature degrading at 09:00 should be rolled back at 09:01, not discovered in the next calibration report. The calibration infrastructure defined in §5 does not change; its cadence and threshold sensitivity should be configured for the abundance operating regime.

10.5.6. Multi-Dimensional Budgets

Per-agent rate limiting (§8.2) prevents individual rogue agents from overwhelming the system. Under abundance, the collective behavior of many well-behaved agents can produce the same effect. The system SHOULD implement multi-dimensional budgets: per-surface, per-entity, per-action-class, and per-agent. When any budget is exceeded, the system MUST apply backpressure — throttling, queueing, or shedding — rather than silently degrading.
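A minimal sketch of such a budget check follows, assuming a hypothetical `Budgets` class with simple counters per dimension. Window resets and the actual backpressure mechanism (throttle, queue, or shed) are left to the caller.

```typescript
interface ActionKey {
  surface: string;
  entity: string;
  actionClass: string;
  agent: string;
}

// One budget id per dimension the action touches.
function dimensionIds(k: ActionKey): string[] {
  return [
    "surface:" + k.surface,
    "entity:" + k.entity,
    "actionClass:" + k.actionClass,
    "agent:" + k.agent,
  ];
}

class Budgets {
  private limits: Record<string, number>;
  private used: Record<string, number> = {};

  constructor(limits: Record<string, number>) {
    this.limits = limits;
  }

  // Consumes one unit on every dimension, or reports which budgets are exhausted.
  tryConsume(key: ActionKey): { allowed: boolean; exceeded: string[] } {
    const ids = dimensionIds(key);
    const exceeded = ids.filter((id) => {
      const limit = this.limits[id];
      return limit !== undefined && (this.used[id] ?? 0) >= limit;
    });
    if (exceeded.length > 0) {
      return { allowed: false, exceeded }; // caller applies backpressure
    }
    for (const id of ids) {
      this.used[id] = (this.used[id] ?? 0) + 1;
    }
    return { allowed: true, exceeded: [] };
  }
}

const budgets = new Budgets({
  "entity:job-7": 2,
  "agent:quote-agent": 5,
  "agent:dispatch-agent": 5,
});

const first = budgets.tryConsume({ surface: "sms", entity: "job-7", actionClass: "notify", agent: "quote-agent" });
const second = budgets.tryConsume({ surface: "sms", entity: "job-7", actionClass: "notify", agent: "dispatch-agent" });
// Both agents are well within their own limits, yet the entity budget is now spent.
const third = budgets.tryConsume({ surface: "sms", entity: "job-7", actionClass: "notify", agent: "quote-agent" });
```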

The scarcest resource under abundance is not compute or tokens. It is human attention. The system SHOULD model review capacity as an attention budget and MUST NOT generate proposals exceeding the available attention budget as declared in the conformance profile.

11. 9. Privacy, Security, and Data Governance

11.1. 9.1 Data Minimization and Redaction

AI prompts, tool calls, memory entries, and decision records MUST include only the data needed for the task. Secrets, credentials, session tokens, private keys, and environment values MUST NOT be sent to model providers or stored in AI memory.

Systems that process personal or customer data SHOULD classify fields by sensitivity into data classes. A data class defines the sensitivity tier of a field (e.g., public, internal, sensitive, restricted) and the provider boundaries where that class may be transmitted. Each AI feature MUST declare which data classes it accesses. Where possible, prompts SHOULD use structured summaries instead of raw records.

9.2 Provider Boundary

Every AI feature MUST declare its provider boundary:

  1. The model provider and model family used

  2. The data classes sent to the provider

  3. The retention, training, and logging policy for that provider

  4. Any region, residency, contractual, or compliance constraints relevant to the deployment

Changing a feature’s provider boundary MUST trigger review of the feature’s data access policy. A model migration is not just an implementation detail if it changes where customer or business data goes.
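
A minimal sketch of a provider boundary declaration and the two checks this section implies. The ProviderBoundary record and both helper functions are assumptions; the field names mirror the four declared items above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class ProviderBoundary:
    # Hypothetical record; one field per declared item in the list above.
    provider: str
    model_family: str
    permitted_data_classes: FrozenSet[str]
    retention_policy: str
    region: str


def disallowed_classes(feature_classes: Iterable[str], boundary: ProviderBoundary) -> List[str]:
    """Data classes the feature accesses that the provider may not receive."""
    return sorted(set(feature_classes) - boundary.permitted_data_classes)


def requires_review(old: ProviderBoundary, new: ProviderBoundary) -> bool:
    """Any change to the declared boundary triggers a data access policy review."""
    return old != new
```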

9.3 Memory and Ledger Access Control

Shared memory, AI decision ledgers, calibration reports, and prompt-evaluation artifacts MUST be governed by the same authorization model as the business data they summarize. An agent that cannot read a customer record directly MUST NOT be able to recover the same information from a memory entry or decision log.

Sensitive memory entries SHOULD carry TTLs, source evidence, and redaction status. Evidence traces MUST be redacted before being stored when they contain data outside the feature’s declared provider boundary.
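
One way to meet the requirement is to route memory reads through the same authorization check as the underlying record. The sketch below assumes each memory entry names the business record it summarizes; MemoryEntry, readable_entries, and the can_read callback are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class MemoryEntry:
    key: str
    value: str
    source_record: str                 # business record this entry summarizes
    expires_at: Optional[float] = None  # epoch seconds; None means no TTL
    redacted: bool = False


def readable_entries(agent_id: str, entries: Iterable[MemoryEntry],
                     can_read: Callable[[str, str], bool], now: float) -> List[MemoryEntry]:
    """Return unexpired entries whose source record the agent may read directly."""
    visible = []
    for entry in entries:
        expired = entry.expires_at is not None and entry.expires_at <= now
        if not expired and can_read(agent_id, entry.source_record):
            visible.append(entry)
    return visible
```

Here can_read stands in for whatever authorization check the system already applies to business data, so memory cannot become a side channel around it.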

9.4 Human Override and Audit Replay

Any AI-mediated state change MUST be auditable after the fact. The system MUST retain enough structured information to answer:

  1. Which agent proposed or performed the action?

  2. What data summary did the agent rely on?

  3. Which human, policy, or autonomous rule approved it?

  4. What executor ran, and what outcome did it report?

Human operators MUST be able to override, reject, or reverse AI-mediated actions. The override mechanism MUST be the same mechanism used for human-initiated actions — if a human can reverse a manually-created quote through the CRM UI, the same UI path MUST work for an AI-created quote. The override MUST be recorded in the AI decision ledger with the human operator’s identity.

9.5 AI-Specific Threat Model

An AI-native system MUST address the following threat vectors that are specific to AI-agent architectures:

10. AI Policy at Every Touchpoint

10.1 Touchpoint Policy Framework

Every touchpoint in a customer, worker, or operator journey MUST declare an AI policy. The policy MUST be one of:

  1. Enabled: AI may generate, summarize, personalize, classify, or recommend at this touchpoint.

  2. Review-only: AI may propose output at this touchpoint, but a human MUST approve before delivery.

  3. Monitor-only: AI may analyze this touchpoint for patterns but MUST NOT alter the user experience.

  4. Excluded: AI is deliberately not used at this touchpoint. The reason MUST be recorded.

  5. Forbidden: AI use at this touchpoint is disallowed by policy, regulation, or safety boundary.

A system that declares Excluded at every customer and worker touchpoint while leaving admin touchpoints Enabled SHOULD document why — AI-native systems push AI toward the humans who generate value, not just the ones who administer the system. This principle is a design heuristic, not a conformance requirement.

10.2 Policy Drift Detection

A touchpoint that serves static content while AI-generated content for that same touchpoint exists elsewhere in the system indicates policy drift. The system SHOULD detect this condition and surface it for review unless the exclusion is intentional and documented.

An AI-native system allocates AI compute across consumer types deliberately. A system that serves static content at all customer and worker touchpoints while concentrating AI output exclusively on internal or admin surfaces SHOULD document the exclusion policy. The system MUST NOT serve AI-generated content at internal touchpoints while leaving customer and worker touchpoints with only static content unless that exclusion is intentional and documented in the conformance profile.
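
The policy declaration and the drift check can be sketched together. This is an illustration only: the dictionary shape, the validate and drifted helpers, and the two input sets are assumptions, and the policy values follow the lowercase names used in the Appendix B example.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set


class TouchpointPolicy(Enum):
    ENABLED = "enabled"
    REVIEW_ONLY = "review_only"
    MONITOR_ONLY = "monitor_only"
    EXCLUDED = "excluded"
    FORBIDDEN = "forbidden"


def validate(touchpoint: Dict[str, str]) -> List[str]:
    """Return declaration errors for one touchpoint (empty list means valid)."""
    try:
        policy = TouchpointPolicy(touchpoint.get("policy"))
    except ValueError:
        return [f"{touchpoint.get('name')}: missing or unknown policy"]
    if policy is TouchpointPolicy.EXCLUDED and not touchpoint.get("reason"):
        return [f"{touchpoint['name']}: excluded without a recorded reason"]
    return []


def drifted(touchpoints: Iterable[Dict[str, str]], serves_static: Set[str],
            ai_content_exists: Set[str]) -> List[str]:
    """Touchpoints serving static content while AI content for them exists elsewhere."""
    intentional = {TouchpointPolicy.EXCLUDED.value, TouchpointPolicy.FORBIDDEN.value}
    flagged = []
    for tp in touchpoints:
        name = tp["name"]
        if name in serves_static and name in ai_content_exists and tp.get("policy") not in intentional:
            flagged.append(name)
    return flagged
```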

11. Runtime Interoperability

11.1 The Agent Runtime Registry

A system that operates multiple AI agent runtimes SHOULD maintain a runtime registry. Each runtime entry SHOULD declare:

The registry enables task routing: an orchestrator (per [[ai-agent-spec]]) or delegation system SHOULD select a runtime for a task based on the runtime’s declared strengths, cost profile, and the task’s requirements — not a hardcoded default. The routing decision MUST be auditable: the system MUST record which runtime was selected and why.
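
A minimal routing sketch, assuming each registry entry exposes a name, a list of strengths, and a cost figure. Those field names, the select_runtime helper, and the cheapest-capable selection rule are assumptions made for illustration; the specification does not prescribe them.

```python
from typing import Any, Dict, List


def select_runtime(registry: List[Dict[str, Any]], task: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the cheapest runtime whose declared strengths cover the task."""
    required = set(task["requires"])
    capable = [r for r in registry if required <= set(r["strengths"])]
    if not capable:
        raise LookupError(f"no registered runtime covers {sorted(required)}")
    chosen = min(capable, key=lambda r: r["cost_per_task"])
    # The returned record is what would be written to the audit trail.
    return {
        "task_id": task["id"],
        "runtime": chosen["name"],
        "reason": f"cheapest of {len(capable)} runtime(s) covering {sorted(required)}",
        "considered": sorted(r["name"] for r in capable),
    }
```

Returning the decision as a record, rather than only the chosen runtime, keeps the "which runtime and why" answer available for audit.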

11.2 Headless Requirement

Any agent runtime intended for automated (non-interactive) use MUST support headless invocation. A runtime that requires a GUI window to operate MUST NOT be the only path to its model (the GUI Rift anti-pattern). The system SHOULD provide an API-based alternative for models locked to GUI-only runtimes.

11.3 Cross-Runtime Delegation

When an agent delegates work to another agent, the delegation MUST:

  1. Identify the target agent by its registered identity (per §4.1 First-Class Identity Principals)

  2. Pass sufficient context that the target agent can operate independently

  3. Provide a callback or completion mechanism

The delegation MUST NOT assume all agents share a workspace, memory store, or file system. Context MUST be explicitly passed; the target agent MUST NOT be expected to read the delegator’s local files.
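
The three delegation requirements can be sketched as a self-contained envelope. The Delegation record, the serialize_delegation helper, and the choice of JSON as the wire format are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Set


@dataclass
class Delegation:
    delegator_id: str
    target_agent_id: str      # registered identity of the target agent
    context: Dict[str, Any]   # everything the target needs, passed by value
    callback: str             # where the target reports completion


def serialize_delegation(delegation: Delegation, registered_agents: Set[str]) -> str:
    """Validate the envelope and return it as a JSON string."""
    if delegation.target_agent_id not in registered_agents:
        raise LookupError(f"unknown target agent: {delegation.target_agent_id}")
    if not delegation.context:
        raise ValueError("delegation carries no context")
    if not delegation.callback:
        raise ValueError("delegation has no callback or completion mechanism")
    # json.dumps raises TypeError for values that are not plain data,
    # which rejects handles into the delegator's local environment.
    return json.dumps(
        {
            "delegator": delegation.delegator_id,
            "target": delegation.target_agent_id,
            "context": delegation.context,
            "callback": delegation.callback,
        },
        sort_keys=True,
    )
```

Serializing the context is a cheap guard, not a complete one: a string containing a local file path would still pass, so reviewers should check that context carries content rather than pointers.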

12. Event-Driven AI

12.1 Event Streams as AI Input

Every business-significant event stream MUST declare an AI consumption policy. An event stream is "business-significant" if it represents money movement (payments, refunds, credits), state transitions, customer communication, safety-sensitive operations, or material cost capture.

The policy MUST be one of:

  1. Consume: at least one AI consumer analyzes the stream

  2. Monitor-only: AI analyzes aggregates or anomalies but not individual events

  3. Excluded: AI is deliberately not used, with a recorded reason

  4. Forbidden: AI use is disallowed by policy, regulation, or safety boundary

When a stream’s policy is Consume, the system MUST have at least one AI consumer that:

An event stream that terminates exclusively in a mechanical handler without a declared policy is non-conformant (the Event Stream Without Policy anti-pattern).

12.2 Idempotent Event Processing

AI consumers of event streams MUST process events idempotently. The event sourcing infrastructure (atomic claims, replay, deduplication) that protects financial transactions SHOULD also protect AI analysis — an event replayed during recovery MUST NOT produce duplicate AI decisions.

12.3 Event Schema and Processing Guarantees

Every event stream consumed by AI MUST carry a declared schema. The schema MUST identify at minimum:

AI consumers of event streams MUST declare their processing guarantee:

During recovery, an AI consumer MUST NOT treat replayed events as new events. The consumer MUST use the event’s stable identifier to distinguish a replay from a first occurrence. A replayed event that produces a different AI decision than the original MUST be flagged in the AI decision ledger as a replay divergence.
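
A minimal sketch of replay-safe consumption keyed on the event's stable identifier. The IdempotentAIConsumer class, the in-memory dictionary standing in for the AI decision ledger, and the optional recheck flag are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List


class IdempotentAIConsumer:
    def __init__(self, decide: Callable[[Dict[str, Any]], Any]) -> None:
        self.decide = decide                      # the AI decision function
        self.ledger: Dict[str, Any] = {}          # event id -> first decision
        self.replay_divergences: List[Dict[str, Any]] = []

    def handle(self, event: Dict[str, Any], recheck: bool = False) -> Any:
        """Process an event once; replays return the original decision."""
        event_id = event["id"]
        if event_id not in self.ledger:
            self.ledger[event_id] = self.decide(event)
            return self.ledger[event_id]
        original = self.ledger[event_id]
        if recheck:
            # Re-evaluate only to compare; the ledger entry is never replaced.
            replayed = self.decide(event)
            if replayed != original:
                self.replay_divergences.append(
                    {"event_id": event_id, "original": original, "replayed": replayed}
                )
        return original
```

A production consumer would persist the ledger and claim event ids atomically; the in-memory dictionary here only shows the control flow.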

13. Testing and Quality Assurance

13.1 Behavioral Tests

The system MUST test AI behavior, not just AI plumbing. Unit tests that mock the model and verify the function was called are plumbing tests — they prove the integration works, not that the AI produces correct output.

Behavioral tests MUST:

Behavioral tests MUST send real inputs to a real model. A surrogate model or mock MAY be used during development but MUST NOT be used for gating decisions. The model used for behavioral testing MUST be the same model family as the production model.

The system SHOULD implement LLM-as-judge evaluation for voice quality, tone, and business-rule adherence.

13.2 Prompt Regression Detection

The system MUST run prompt regression tests on every deploy that changes an AI prompt. A prompt change that ships without automated verification that it still produces correct outputs is non-conformant.

The RECOMMENDED pattern is a deploy-eval gate: run the eval suite after deploy, warn on regression, block on critical failure.
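
The deploy-eval gate can be sketched as a function over eval results. The result shape (name, passed, critical), the baseline comparison, and the decision to block on an empty suite are assumptions introduced for this example.

```python
from typing import Any, Dict, List, Tuple


def deploy_eval_gate(results: List[Dict[str, Any]], baseline_pass_rate: float,
                     tolerance: float = 0.0) -> Tuple[str, str]:
    """Return ("pass" | "warn" | "block", explanation) for one eval run."""
    if not results:
        # Assumption: a suite that produced nothing is treated as a failure.
        return ("block", "eval suite produced no results")
    critical = [r["name"] for r in results if r["critical"] and not r["passed"]]
    if critical:
        return ("block", "critical failures: " + ", ".join(critical))
    pass_rate = sum(1 for r in results if r["passed"]) / len(results)
    if pass_rate < baseline_pass_rate - tolerance:
        return ("warn", f"pass rate {pass_rate:.2f} below baseline {baseline_pass_rate:.2f}")
    return ("pass", f"pass rate {pass_rate:.2f}")
```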

14. Observability

14.1 Token and Cost Tracking

Every AI model invocation MUST be tracked. The system MUST be able to answer: "how much did AI cost last month, broken down by feature?"

The tracking record SHOULD carry:

A system where AI model calls are made with zero token tracking is non-conformant (the Cost Blind Spot anti-pattern).
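
A minimal sketch of a tracking record and the monthly per-feature rollup the question above asks for. The ModelCall fields and the monthly_cost_by_feature helper are assumptions; they are not the specification's list of tracking fields.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Dict, Iterable


@dataclass(frozen=True)
class ModelCall:
    # Hypothetical tracking record; field names are illustrative.
    feature: str
    agent_id: str
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: Decimal
    called_at: datetime


def monthly_cost_by_feature(calls: Iterable[ModelCall], year: int, month: int) -> Dict[str, Decimal]:
    """Sum cost per feature for one calendar month."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for call in calls:
        if call.called_at.year == year and call.called_at.month == month:
            totals[call.feature] += call.cost_usd
    return dict(totals)
```

Decimal is used for cost so that many small per-call amounts sum without floating-point drift.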

14.2 AI Feature Liveness

The system MUST monitor whether AI features are producing output. A daily or per-tick query MUST verify that each kind in the AI decision ledger has rows within the liveness window defined in the conformance profile. When a feature has produced no rows within that window, the system MUST alert a human.

The monitoring MUST be automated (cron sentinel). It MUST NOT depend on a human noticing that "the assistant widget seems quiet lately."
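
The sentinel's core check can be sketched as a pure function that a scheduled job would call. The silent_kinds helper and its inputs (the expected kinds and a map of each kind's most recent ledger row time) are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List


def silent_kinds(expected_kinds: Iterable[str], last_row_at: Dict[str, datetime],
                 now: datetime, liveness_window_hours: float) -> List[str]:
    """Return ledger kinds with no rows inside the liveness window."""
    cutoff = now - timedelta(hours=liveness_window_hours)
    silent = []
    for kind in expected_kinds:
        seen = last_row_at.get(kind)
        # A kind that has never produced a row counts as silent.
        if seen is None or seen < cutoff:
            silent.append(kind)
    return sorted(silent)
```

A scheduled job could feed this function from a "latest row per kind" query and page an operator whenever the returned list is not empty.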

15. Anti-Pattern Catalog

This section is informative.

The following anti-patterns were observed in production and inform the normative requirements above. Each maps to one or more requirements in this specification.

Anti-Pattern | Description | Normative Reference
Ghost Agent | Fully-deployed agent with zero invocation paths | §4.1 First-Class Identity Principals, §14.2 AI Feature Liveness
Ghost Table | Shared memory primitive deployed and empty | §3.1 The Shared Memory Primitive
Silent No-Op | Approve button updates status but executor does nothing | §7.1 The Propose–Approve–Execute Pattern
Split Auth Layer | Path-whitelist shadows capability-based permissions | §4.2 Permission Model
Memory Schism | Five agents, five memory stores, zero cross-reads | §3.1 The Shared Memory Primitive, §3.4 Anti-Requirements
Calibration Dead End | Calibration reports terminate at a human reading a notification with no machine-readable path back | §5.3 The Calibration Feedback Loop
Context Black Hole | AI enrichment discarded at entity boundary | §6.1 The Principle of Forward Propagation, §6.2 Entity Boundary Requirements
Customer AI Desert | 100% of AI output goes to an admin-only channel, 0% to customers | §10 AI Policy at Every Touchpoint, §10.1 Touchpoint Policy Framework
Field Crew AI Desert | Zero AI surfaces for the humans doing the work | §10 AI Policy at Every Touchpoint
Dry-Run Trap | Generation path hardcodes delivery suppression | §7.2 The Dry-Run Gate
Privacy Boundary Leak | AI memory or logs reveal data the agent could not read directly | §9.1 Data Minimization and Redaction, §9.3 Memory and Ledger Access Control
GUI Rift | Best model locked behind GUI-only runtime | §11.2 Headless Requirement
Event Stream Without Policy | Financial events terminate in mechanical handling with no AI policy | §12.1 Event Streams as AI Input
Feedback Death Loop | AI output → human bottleneck → never re-enters AI | §5.1 The AI Decision Ledger, §5.2 Outcome Backfilling
Identity Drift | Voice rules copy-pasted and diverging across agents | §4.4 Identity Documents
Liveness Blind Spot | No automated check that AI features are still producing | §5.4 Liveness Monitoring, §14.2 AI Feature Liveness
Cost Blind Spot | AI model calls with zero token tracking | §14.1 Token and Cost Tracking
Operational Data Desert | Operational records lack foreign keys to business entities, preventing outcome analysis | §6.3 Entity Association Completeness

16. Migration Path

This section is informative.

A system that is not AI-native can migrate incrementally. The RECOMMENDED migration order follows the conformance levels:

Level 1: Establish Identity and the Ledger

  1. Create distinct agent identity records for each runtime

  2. Add agent_id and agent_runtime columns to the AI decision ledger and mutation audit trail

  3. Ensure every AI decision and model call is attributable to a specific agent

  4. Verify attribution by querying recent decisions, mutations, token usage, and costs by agent

Level 2: Activate Shared Memory and Capability Control

  1. Identify the shared memory primitive (deploy one if none exists — a shared_memory table with key/value/source/confidence columns)

  2. Wire ONE agent to write to it

  3. Wire ONE other agent to read from it before acting

  4. Define AI roles and capabilities in the standard permission matrix

  5. Replace path-whitelist auth with capability-based requirePermission() calls

  6. Verify cross-agent knowledge transfer and permission enforcement
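
Steps 1 to 3 above can be sketched with an in-memory SQLite table. The column set follows the key/value/source/confidence shape named in step 1; the function names, the sample key, and the choice of SQLite are assumptions for illustration.

```python
import sqlite3
from typing import Optional, Tuple


def open_shared_memory() -> sqlite3.Connection:
    """Create an in-memory shared_memory table for demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE shared_memory (
            key        TEXT PRIMARY KEY,
            value      TEXT NOT NULL,
            source     TEXT NOT NULL,   -- agent that wrote the entry
            confidence REAL NOT NULL
        )
        """
    )
    return conn


def remember(conn: sqlite3.Connection, key: str, value: str, source: str, confidence: float) -> None:
    """Writer side: one agent records what it learned."""
    conn.execute(
        "INSERT OR REPLACE INTO shared_memory (key, value, source, confidence) VALUES (?, ?, ?, ?)",
        (key, value, source, confidence),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, key: str, min_confidence: float = 0.0) -> Optional[Tuple[str, str, float]]:
    """Reader side: another agent checks memory before acting."""
    return conn.execute(
        "SELECT value, source, confidence FROM shared_memory WHERE key = ? AND confidence >= ?",
        (key, min_confidence),
    ).fetchone()
```

A quick check for step 6 is to have the reading agent call recall and confirm that the returned source names a different agent than itself.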

Level 3: Close the Loop

  1. Verify every AI decision ledger row has a backfillable outcome path

  2. Implement calibration aggregation with machine-readable output

  3. Wire calibration findings into prompt engineering (at minimum: alert when accuracy degrades)

  4. Declare the domain entity chain in a conformance profile

  5. Verify AI context propagates forward

  6. Classify all AI actions by risk tier and implement the corresponding gates (per §8 Risk-Based Safety Gates)

  7. Deploy run-away prevention guardrails for any agent granted autonomous capabilities

This section is informative.

This specification occupies a distinct niche: it defines architecture for AI-native systems — the structural properties a system must have for AI to be a first-class participant. It does not replace or compete with governance, risk, or safety frameworks. It complements them.

NIST AI RMF (AI 100-1) provides a risk management framework for AI systems — governance, mapping, measurement, and management. This specification addresses the architectural prerequisites for several RMF outcomes: the AI decision ledger (§5.1) enables measurement, calibration feedback (§5.3) enables management, and the risk gradient (§8.1) operationalizes risk categorization into architectural gates.

ISO/IEC 42001 defines requirements for an AI management system. This specification provides the technical substrate that an ISO 42001-conformant system would manage: agent identity records, calibration reports, provider boundary declarations, and audit trails are all artifacts a management system consumes.

EU AI Act classifies AI systems by risk level and imposes obligations on high-risk systems. This specification’s risk gradient (§8.1) and Forbidden action criteria (§8.4) provide architectural mechanisms for implementing the Act’s risk-proportionality principle — higher risk requires stronger gates — but this specification does not provide legal compliance guidance.

OWASP Top 10 for LLM Applications catalogs LLM-specific vulnerabilities. This specification’s threat model (§9.5) addresses several entries of the 2025 edition at the architectural level: prompt injection (LLM01), improper output handling (LLM05, via model output poisoning), and excessive agency (LLM06, via the risk gradient and capability-based access control).

OpenAgentSpec and Agent2Agent (A2A) define protocols for agent communication and discovery. This specification is complementary: §4.1 (First-Class Identity Principals) and §11.3 (Cross-Runtime Delegation) define the architectural requirements those protocols would operate within.

Implementers should read this specification alongside their applicable governance framework. The architecture defined here provides the technical evidence a governance audit requires.

Appendix A. CRM Conformance Profile

This appendix is informative. It applies the portable requirements above to a production CRM system observed in the evidence log.

A reference implementation of the AI decision ledger, shared memory primitive, agent identity FSM, and calibration loop is available at substrate/. See substrate/README.md for quick start and substrate/docs/MIGRATION.md for the documented migration path from Level 1 to Level 3.

A.1 Entity Chain

The CRM profile declares the following business entity chain:

Lead → Quote → Job → Invoice → Payment

At the Lead stage, AI enriches ai_score, ai_summary, ai_reasoning, and ai_enrichment_json with data such as property size, property value, imagery observations, tree count, and building type.

At the Quote stage, the system should carry forward the lead’s AI enrichment fields and add quote-stage AI fields such as ai_pricing_confidence, ai_scope_completeness, and proposed_by_agent.

At the Job stage, the system should retain quote context and add job-stage AI fields such as ai_scheduling_confidence, ai_duration_estimate, and actual_vs_estimated_hours.

At the Invoice stage, the system should retain upstream context and add invoice-stage AI fields such as ai_payment_likelihood and ai_recovery_strategy.

A CRM schema where lead_requests has AI enrichment columns and quotes has no way to store or reference propagated AI context exhibits the Context Black Hole anti-pattern.
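
Forward propagation across an entity boundary can be sketched as a copy of declared fields that fails loudly when context would be lost. The transition table reuses the field lists from the B.1 example; the carry_forward helper and its error behavior are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple

# Transition declarations taken from the B.1 example profile.
TRANSITIONS: Dict[Tuple[str, str], List[str]] = {
    ("Lead", "Quote"): ["ai_score", "ai_summary", "ai_reasoning"],
    ("Quote", "Job"): ["ai_pricing_confidence", "ai_scope_completeness"],
}


def carry_forward(upstream: Dict[str, Any], source: str, target: str) -> Dict[str, Any]:
    """Return the AI context fields the target entity must receive."""
    fields = TRANSITIONS[(source, target)]
    missing = [name for name in fields if name not in upstream]
    if missing:
        # Refusing the transition makes a Context Black Hole visible early.
        raise ValueError(f"{source} -> {target} would drop AI context: {missing}")
    return {name: upstream[name] for name in fields}
```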

A.2 Time and Cost Association

The CRM profile treats labor time as a primary cost driver. Time entries should be associable with the job, property, and client they relate to. A time_entries table that carries only user_id and duration_minutes with no job_id, property_id, or client_id association exhibits the Operational Data Desert anti-pattern because the system cannot answer: "how many labor hours did Job N consume?"

A.3 Customer Journey Touchpoints

The CRM profile declares these customer journey touchpoints for AI policy review:

  1. Lead intake

  2. Auto-acknowledgment email or SMS

  3. Quote delivery

  4. Quote acceptance

  5. Payment receipt or payment failure

  6. Post-service follow-up

Each touchpoint should declare whether AI is enabled, review-only, monitor-only, excluded, or forbidden.

Appendix B. Conformance Profile Template

This appendix is informative. It provides a template for the conformance profile required by §2.3.

A complete conformance profile declares:

B.1 Entity Chain

The domain entity chain the system claims context propagation across. Example:

entity_chain: [Lead, Quote, Job, Invoice, Payment]
transitions:
  - from: Lead
    to: Quote
    fields: [ai_score, ai_summary, ai_reasoning]
  - from: Quote
    to: Job
    fields: [ai_pricing_confidence, ai_scope_completeness]

B.2 Dimension Targets

The maturity target for each architecture dimension. Example:

dimensions:
  pervasive_ai: level_2
  data_availability: level_3
  model_governance: level_3
  security_and_attribution: level_3
  autonomous_operation: level_2

B.3 Action Tier Classifications

Every AI action kind the system performs, classified into one of the four risk tiers. Example:

action_tiers:
  - kind: content_classification
    tier: observable
    rationale: "Internal analysis only; feeds Tier 1 actions downstream"
  - kind: outreach_draft
    tier: irreversible
    rationale: "Customer-facing communication"
  - kind: credential_rotation
    tier: forbidden
    rationale: "Credential and identity action"

B.4 Thresholds

All tunable values that the specification requires the profile to define:

thresholds:
  calibration_window_days: 30
  accuracy_disable_threshold: 0.70
  accuracy_reenable_window_days: 7
  liveness_window_hours: 24
  rate_limit:
    proposals_per_hour: 20
    autonomous_actions_per_hour: 10
  deadman_switch_window_hours: 168
  anomaly_detection:
    proposal_spike_threshold: 3x baseline
    consecutive_failures_threshold: 5

B.5 Provider Boundaries

Every model provider used, with the data classes permitted for each. Example:

providers:
  - name: deepseek-v4-pro
    permitted_data_classes: [public, internal]
    retention_policy: "No training on API data"
    region: us
  - name: local-nemotron
    permitted_data_classes: [public, internal, sensitive]
    retention_policy: "No data leaves host"
    region: local

B.6 Touchpoint Policies

Every touchpoint in the customer, worker, and operator journey, with its declared AI policy. Example:

touchpoints:
  - name: lead_intake
    policy: enabled
  - name: auto_acknowledgment
    policy: enabled
  - name: quote_delivery
    policy: review_only
  - name: payment_receipt
    policy: excluded
    reason: "Financial receipt — static content by design"

B.7 Event Stream Policies

Every business-significant event stream with its AI consumption policy. Example:

event_streams:
  - name: payment_events
    policy: consume
    processing_guarantee: at_least_once
  - name: credential_change_events
    policy: forbidden
    reason: "Credential events must not be consumed by AI"

Acknowledgments

This specification is derived from the [[ai-native-brainstorm]], an empirical analysis conducted by the Hermes agent runtime against a production CRM system and its surrounding agent ecosystem. Every anti-pattern in §15 Anti-Pattern Catalog was directly observed in production code. Every normative requirement in sections 3–14 exists because the corresponding anti-pattern caused real operational friction.

The specification format is modeled on WHATWG Living Standards and IETF RFCs. The normative language follows [RFC2119].

This document is authored by Hermes. Editors: Clanka and Hermeezy. Last revised 24 May 2026. It is a living standard — subsequent passes may extend, refine, or correct it.

References

Non-Normative References

[AI-AGENT-SPEC]
Clanka; Hermeezy. AI Agent Construction and Coordination. May 2026. URL: https://github.com/clankamode/ai-native-spec/blob/main/ai-agent-spec.bs
[AI-NATIVE-BRAINSTORM]
Hermes Agent Runtime. AI-Native Brainstorm. May 2026. URL: https://github.com/clankamode/ai-native-spec/blob/main/evidence/ai-native-brainstorm.md
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. URL: https://datatracker.ietf.org/doc/html/rfc2119