Agent Versioning for AI Agents: how to safely release prompt, tools, and policy

Practical agent versioning in production: version manifest, contract checks, canary rollout, rollback, and run reproducibility.
On this page
  1. Idea in 30 seconds
  2. Problem
  3. Solution
  4. Agent versioning ≠ rollback
  5. Versioning-control components
  6. How it looks in architecture
  7. Example
  8. In code it looks like this
  9. How it looks during execution
  10. Scenario 1: blocked by contract mismatch
  11. Scenario 2: canary rollout
  12. Scenario 3: rollback required
  13. Common mistakes
  14. Self-check
  15. FAQ
  16. Where Agent Versioning fits in the system
  17. Related pages

Idea in 30 seconds

Agent versioning is a runtime control that pins exactly which agent version each run uses: prompt, tools, policy, and model.

When you need it: when an agent is updated regularly and production needs safe releases without losing reproducibility.

Problem

Without versioning, prompt, tool, and policy changes are all mixed into one "current state". In a demo this is barely visible. In production, it becomes unclear why one run worked and another failed.

Typical outcomes:

  • incidents cannot be reproduced precisely
  • it is hard to identify which change broke behavior
  • rollback becomes a manual and risky operation

And every minute without a clear version increases incident investigation time.

Analogy: this is like deployment without a version number. While everything works, the problem is invisible. When something breaks, it is unclear what to roll back to.

Solution

The solution is to store the agent as a versioned package (a version manifest) and start runs only with a pinned version_id. Each release passes contract compatibility checks and a rollout gate before receiving traffic.

The version policy layer returns a technical decision: allow, or stop with an explicit reason:

  • contract_mismatch
  • gate_failed
  • rollback_required

This is a separate system layer, not part of prompt or model logic.
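
The allow/stop decision with an explicit reason can be modeled as a small immutable value object. A minimal sketch, assuming hypothetical names (`Decision`, `STOP_REASONS`):

```python
from dataclasses import dataclass
from typing import Optional

# The three stop reasons the policy layer can return (from the list above).
STOP_REASONS = {"contract_mismatch", "gate_failed", "rollback_required"}

@dataclass(frozen=True)
class Decision:
    outcome: str                  # "allow" or "stop"
    reason: Optional[str] = None  # set only when outcome == "stop"

    def __post_init__(self) -> None:
        # Reject malformed decisions early, so downstream audit logs
        # only ever see the allowed outcome/reason combinations.
        if self.outcome == "stop" and self.reason not in STOP_REASONS:
            raise ValueError(f"stop requires a known reason, got {self.reason!r}")
        if self.outcome == "allow" and self.reason is not None:
            raise ValueError("allow carries no reason")
```

Keeping the decision a frozen dataclass means a logged decision cannot be mutated after the fact, which matters for audit trails.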

Agent versioning ≠ rollback

These are different roles in the system:

  • Versioning controls changes before and during rollout.
  • Rollback returns to a stable version when a new one already caused regression.

One without the other is not enough:

  • without versioning, targeted rollback is hard
  • without rollback, even good versioning does not save you from production incidents

Example:

  • versioning: support-agent@2.4.0 goes to 5% canary
  • rollback: return to support-agent@2.3.3 if error rate grows

Versioning-control components

These components work together on every run start.

| Component | What it controls | Key mechanics | Why |
| --- | --- | --- | --- |
| Version manifest | Composition of an agent version | version_id, prompt/tools/policy hashes | Gives exact visibility into what a run was executed with |
| Contract checks | Tool and policy compatibility | schema validation, tool contract check | Blocks releases with incompatible changes |
| Rollout gating | Release staging | canary stage, traffic percentage | Reduces the blast radius of a new version |
| Runtime pinning | Which version a run executes | pinned version_id, immutable run metadata | Makes run reproduction and incident investigation possible |
| Version observability | Visibility of rollout decisions | audit logs, alerts on gate failures | Does not control releases directly, but quickly shows why a version was blocked |

Example alert:

Slack: 🛑 Agent support-agent@2.4.0 blocked: gate_failed on canary stage (error_rate > threshold).
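
A version manifest can be as small as a dict of component hashes. The sketch below assumes hypothetical field names and truncates sha256 digests to 12 hex characters for readability:

```python
# Minimal version-manifest sketch. Field names (version_id, prompt_hash, ...)
# are assumptions, not a specific registry format.
import hashlib
import json

def digest(obj) -> str:
    # Canonical JSON keeps the hash stable across dict key ordering.
    raw = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:12]

def build_manifest(name, semver, prompt, tools, policy):
    return {
        "version_id": f"{name}@{semver}",
        "prompt_hash": digest(prompt),
        "tools_hash": digest(tools),
        "policy_hash": digest(policy),
    }

manifest = build_manifest(
    "support-agent", "2.4.0",
    prompt="You are a support agent...",
    tools=[{"name": "refund.create", "schema": {"amount": "number"}}],
    policy={"max_refund_usd": 100},
)
```

Because the hashes are deterministic, two runs that report the same manifest are guaranteed to have executed the same prompt, tools, and policy.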

How it looks in architecture

The version policy layer sits between the runtime and the start of a new agent version, and blocks that start before the run begins. Every decision (allow or stop) is recorded in the audit log.

Every run start passes through this flow before execution: the runtime does not start a new version directly; it first asks the policy layer for a decision.

Flow summary:

  • Runtime prepares run start
  • Policy checks version_manifest, tool_contracts, rollout_stage, policy_version
  • allow -> pinned agent_version starts
  • stop -> release is blocked, active stable version remains
  • both decisions are written to audit log
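
The flow summary above can be sketched as a single gate function; all names (`check_release`, `tools_compatible`, `gate_passed`) are hypothetical:

```python
def tools_compatible(candidate, stable):
    # A candidate may add tools, but must keep every tool the stable version exposes.
    return set(stable["tools"]) <= set(candidate["tools"])

def gate_passed(candidate):
    # Canary gate: candidate metrics must stay within its configured thresholds.
    return candidate["metrics"]["error_rate"] <= candidate["thresholds"]["error_rate"]

def check_release(candidate, stable, audit_log):
    def decide(outcome, reason=None):
        entry = {
            "version_id": candidate["version_id"],
            "rollout_stage": candidate["rollout_stage"],
            "decision": outcome,
            "reason": reason,
        }
        audit_log.append(entry)  # both allow and stop are audited
        return entry

    if not tools_compatible(candidate, stable):
        return decide("stop", "contract_mismatch")
    if not gate_passed(candidate):
        return decide("stop", "gate_failed")
    return decide("allow")
```

Note that the allow path writes an audit entry too; only auditing failures makes incident timelines impossible to reconstruct.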

Example

The team releases support-agent@2.4.0 with a new refund.create tool adapter. The contract check finds a schema incompatibility.

Result:

  • policy returns stop (reason=contract_mismatch)
  • canary does not start
  • support-agent@2.3.3 keeps running

Versioning stops risky release before incident, not after.

In code it looks like this

The simplified scheme above shows the main flow. Critical point: the runtime must start runs only with a pinned version_id and must never fetch "latest" during execution.

Example versioning config:

YAML
agent_release:
  stable_version: support-agent@2.3.3
  candidate_version: support-agent@2.4.0
  canary_percent: 5
  rollback_on:
    error_rate_p95: 0.05
    tool_failure_p95: 0.03
PYTHON
release_cfg = load_release_config("support-agent")
candidate = registry.get(release_cfg.candidate_version)
decision = versioning.check(candidate, runtime_context)

if decision.outcome == "stop":
    audit.log(
        run_id,
        decision=decision.outcome,
        reason=decision.reason,
        version_id=candidate.version_id,
        rollout_stage=decision.rollout_stage,
    )
    alerts.notify_if_needed(candidate.version_id, decision.reason)
    return stop(decision.reason)

# Canary split: select() routes this run to the candidate or the stable version.
selected = versioning.select(candidate, stable=release_cfg.stable_version)
run = runtime.start(
    version_id=selected.version_id,
    prompt_hash=selected.prompt_hash,
    policy_version=selected.policy_version,
)

# Decision.allow is a helper that keeps a single outcome/reason model,
# so allow and stop produce audit entries with the same shape.
allow_decision = Decision.allow(reason=None)
audit.log(
    run.id,
    decision=allow_decision.outcome,
    reason=allow_decision.reason,
    version_id=selected.version_id,
    rollout_stage=selected.rollout_stage,
)

return run

How it looks during execution

Scenario 1: blocked by contract mismatch

  1. Runtime prepares start of support-agent@2.4.0.
  2. Policy checks tool contracts.
  3. Decision: stop (reason=contract_mismatch).
  4. New version is not started.
  5. Event is recorded in audit log.
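
The contract check in step 2 can be sketched as a schema diff; the helper and field names are hypothetical:

```python
# Hypothetical contract check for a tool adapter: the new schema breaks the
# contract if it drops a field the old schema had, or changes a field's type.
def contract_mismatch(old_schema, new_schema):
    problems = []
    for field, ftype in old_schema.items():
        if field not in new_schema:
            problems.append(f"missing required field: {field}")
        elif new_schema[field] != ftype:
            problems.append(f"type changed for {field}: {ftype} -> {new_schema[field]}")
    return problems  # empty list means the contract is compatible

old = {"order_id": "string", "amount": "number"}
new = {"order_id": "string", "amount": "string", "currency": "string"}
issues = contract_mismatch(old, new)
# issues == ["type changed for amount: number -> string"]
```

Adding a new field (currency) is allowed here; only removals and type changes block the release, which is one common compatibility policy.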

Scenario 2: canary rollout

  1. Policy allows candidate at canary_percent=5.
  2. Part of runs goes to 2.4.0, the rest to stable.
  3. Metrics are tracked by version_id.
  4. If thresholds are not violated, stage is increased.
  5. Version moves to wider rollout.
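
The canary split in step 2 can be made deterministic by hashing a stable routing key (run or user id) into a bucket, so the same key always lands on the same version. A sketch with hypothetical names:

```python
# Deterministic canary routing sketch; pick_version and the key scheme are
# assumptions, not a specific library API.
import hashlib

def pick_version(key: str, stable: str, candidate: str, canary_percent: int) -> str:
    # Hash the routing key into a stable bucket in [0, 100).
    bucket = int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % 100
    return candidate if bucket < canary_percent else stable

# The same key always routes to the same version, so a user stays on one
# version for the entire canary stage.
chosen = pick_version("user-42", "support-agent@2.3.3", "support-agent@2.4.0", 5)
```

Widening the stage is then just raising canary_percent; keys already in the candidate's buckets stay there, so no user flips back and forth.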

Scenario 3: rollback required

  1. After rollout, tool_failure_rate grows.
  2. Policy returns stop (reason=rollback_required).
  3. Traffic returns to stable version.
  4. Runs with new version no longer start.
  5. Team analyzes incident by version_id and audit trail.
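
The rollback trigger in step 2 can be sketched against the rollback_on thresholds from the YAML release config shown earlier (metric names are assumptions):

```python
# Hypothetical rollback trigger: compare live metrics against thresholds.
ROLLBACK_ON = {"error_rate_p95": 0.05, "tool_failure_p95": 0.03}

def rollback_needed(metrics, thresholds=ROLLBACK_ON):
    # Collect every threshold the live metrics violate.
    violated = [name for name, limit in thresholds.items()
                if metrics.get(name, 0.0) > limit]
    if violated:
        return "stop", "rollback_required", violated
    return "allow", None, []

outcome, reason, violated = rollback_needed(
    {"error_rate_p95": 0.02, "tool_failure_p95": 0.07}
)
# outcome == "stop", reason == "rollback_required", violated == ["tool_failure_p95"]
```

Returning the list of violated thresholds (not just a boolean) gives the incident analysis in step 5 a concrete starting point.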

Common mistakes

  • starting runs from "latest" instead of pinned version_id
  • versioning only prompt and ignoring tools/policy contracts
  • doing full rollout without canary gate
  • not logging version_id and rollout_stage in audit
  • mixing rollback and new release in one step without stable fallback
  • not having explicit thresholds that trigger rollback

Result: releases look controlled, but under load become unpredictable.

Self-check

Quick agent-versioning check before production launch:


Before production, you need at least access control, limits, audit logs, and an emergency stop.

FAQ

Q: Why not just update the agent's "current" version?
A: Because without pinned version_id, you lose run reproducibility and cannot localize regressions precisely.

Q: What exactly should be versioned: only prompt?
A: No. Minimum: prompt, tool contracts, policy/config, and runtime compatibility. Otherwise version is incomplete.

Q: Is canary required for every change?
A: For high-risk changes, yes. For low-risk changes, stages can be shorter, but still with explicit metrics and stop thresholds.

Q: When should rollback start?
A: When agreed thresholds are violated (error/tool failure/latency/cost). This should be a policy decision, not manual improvisation.

Q: Does versioning replace rollback?
A: No. Versioning prevents release chaos; rollback remains the emergency return mechanism to stable.

Where Agent Versioning fits in the system

Agent versioning is one of the Agent Governance layers. Together with RBAC, limits, budgets, approval, and audit, it forms one system of controlled production changes.

Next on this topic:

⏱️ 6 min read • Updated March 27, 2026 • Difficulty: ★★★
Implement in OnceOnly
Budgets + permissions you can enforce at the boundary.
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
writes:
  require_approval: true
  idempotency: true
controls:
  kill_switch: { enabled: true }
Integrated: production control (OnceOnly)
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.

Author

Nick, an engineer building infrastructure for production AI agents.

Focus: agent patterns, failure modes, runtime control, and system reliability.

🔗 GitHub: https://github.com/mykolademyanov


Editorial note

This documentation is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Content is grounded in real-world failures, post-mortems, and operational incidents in deployed AI agent systems.