Idea in 30 seconds
Agent versioning is a runtime control that fixes exactly which agent version each run uses: prompt, tools, policy, and model.
When you need it: when an agent is updated regularly and production needs safe releases without losing reproducibility.
Problem
Without versioning, prompt, tool, and policy changes are all mixed into one "current state". In demos this is barely visible. In production, it becomes unclear why one run worked and another failed.
Typical outcomes:
- incidents cannot be reproduced precisely
- it is hard to identify which change broke behavior
- rollback becomes a manual and risky operation
And every minute without a clear version increases incident investigation time.
Analogy: this is like deployment without a version number. While everything works, the problem is invisible. When something breaks, it is unclear what to roll back to.
Solution
The solution is to store the agent as a versioned package (a version manifest) and start runs only with a pinned `version_id`.
Each release passes contract compatibility checks and a rollout gate before receiving traffic.
Version policy layer returns a technical decision: allow, or stop with an explicit reason:
- `contract_mismatch`
- `gate_failed`
- `rollback_required`
This is a separate system layer, not part of prompt or model logic.
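As a minimal sketch of that decision model (the `Decision` name and methods are illustrative, not a specific library's API), one value object can carry both outcomes with a machine-readable reason:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Decision:
    """Hypothetical policy-layer decision: one outcome plus a reason code."""
    outcome: str            # "allow" or "stop"
    reason: Optional[str]   # e.g. "contract_mismatch", "gate_failed", "rollback_required"

    @classmethod
    def allow(cls, reason: Optional[str] = None) -> "Decision":
        return cls(outcome="allow", reason=reason)

    @classmethod
    def stop(cls, reason: str) -> "Decision":
        return cls(outcome="stop", reason=reason)

d = Decision.stop("contract_mismatch")
```

Keeping a single outcome/reason shape means audit logs and alerts can treat every release decision uniformly.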
Agent versioning ≠ rollback
These are different roles in the system:
- Versioning controls changes before and during rollout.
- Rollback returns to a stable version when a new one already caused regression.
One without the other is not enough:
- without versioning, targeted rollback is hard
- without rollback, even good versioning does not save you from production incidents
Example:
- versioning: `support-agent@2.4.0` goes to 5% canary
- rollback: return to `support-agent@2.3.3` if error rate grows
Versioning-control components
These components work together on every run start.
| Component | What it controls | Key mechanics | Why |
|---|---|---|---|
| Version manifest | Composition of an agent version | `version_id`, prompt/tools/policy hashes | Gives exact visibility into what a run was executed with |
| Contract checks | Tools and policy compatibility | schema validation, tool contract check | Blocks releases with incompatible changes |
| Rollout gating | Release staging | canary stage, traffic percentage | Reduces blast radius of a new version |
| Runtime pinning | Which version a run executes | pinned `version_id`, immutable run metadata | Makes run reproduction and incident investigation possible |
| Version observability | Visibility of rollout decisions | audit logs, alerts on gate failures | Does not control releases directly, but quickly shows why a version was blocked |
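A version manifest can be sketched as a record of content hashes over everything the version is composed of (the helper below is illustrative; hash length and field names are assumptions, not a fixed format):

```python
import hashlib
import json

def build_manifest(name: str, version: str, prompt: str, tools: dict, policy: dict) -> dict:
    """Hypothetical manifest builder: hashes pin the exact composition of one version."""
    def h(obj) -> str:
        # Canonical JSON so the same content always yields the same hash.
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:12]
    return {
        "version_id": f"{name}@{version}",
        "prompt_hash": h(prompt),
        "tools_hash": h(tools),
        "policy_hash": h(policy),
    }

m = build_manifest(
    "support-agent", "2.4.0",
    "You are a support agent.",
    {"refund.create": {"amount": "number"}},
    {"max_refund": 100},
)
```

Any change to prompt, tools, or policy changes a hash, so two runs with the same `version_id` are guaranteed to have executed the same composition.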
Example alert:
Slack: Agent support-agent@2.4.0 blocked: gate_failed on canary stage (error_rate > threshold).
How it looks in architecture
The version policy layer sits between the runtime and the start of a new agent version, and it blocks the start before the run begins.
Every decision (allow or stop) is recorded in audit log.
Every run start passes through this flow before execution: runtime does not start a new version directly, it first asks the policy layer for a decision.
Flow summary:
- Runtime prepares run start
- Policy checks `version_manifest`, `tool_contracts`, `rollout_stage`, `policy_version`
- `allow` -> pinned `agent_version` starts
- `stop` -> release is blocked, active stable version remains
- Both decisions are written to audit log
Example
The team releases support-agent@2.4.0 with a new refund.create tool adapter.
Contract check finds schema incompatibility.
Result:
- policy returns `stop (reason=contract_mismatch)`
- canary does not start
- `support-agent@2.3.3` keeps running
Versioning stops risky release before incident, not after.
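The schema incompatibility in this example can be caught by a minimal contract check, sketched below (the function and field names are illustrative; a real check would also compare types and required/optional flags):

```python
def check_tool_contract(stable_schema: dict, candidate_schema: dict):
    """Sketch: every field the stable schema exposes must survive in the candidate.

    A removed or renamed field breaks callers -> contract_mismatch.
    """
    missing = [field for field in stable_schema if field not in candidate_schema]
    if missing:
        return ("stop", "contract_mismatch", missing)
    return ("allow", None, [])

stable = {"order_id": "string", "amount": "number"}
candidate = {"order_id": "string", "sum": "number"}  # "amount" renamed -> incompatible
result = check_tool_contract(stable, candidate)
```

The point is that the check runs before any traffic reaches the candidate: the mismatch is a release-time decision, not a production incident.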
In code it looks like this
The simplified scheme above shows the main flow.
Critical point: runtime must start runs only with pinned version_id, not fetch "latest" during execution.
Example versioning config:

```yaml
agent_release:
  stable_version: support-agent@2.3.3
  candidate_version: support-agent@2.4.0
  canary_percent: 5
  rollback_on:
    error_rate_p95: 0.05
    tool_failure_p95: 0.03
```
```python
def start_pinned_run(run_id, runtime_context):
    release_cfg = load_release_config("support-agent")
    candidate = registry.get(release_cfg.candidate_version)

    # Ask the policy layer for a decision before anything executes.
    decision = versioning.check(candidate, runtime_context)
    if decision.outcome == "stop":
        audit.log(
            run_id,
            decision=decision.outcome,
            reason=decision.reason,
            version_id=candidate.version_id,
            rollout_stage=decision.rollout_stage,
        )
        alerts.notify_if_needed(candidate.version_id, decision.reason)
        return stop(decision.reason)

    # Pick candidate or stable according to the rollout stage,
    # then start the run with a pinned version_id.
    selected = versioning.select(candidate, stable=release_cfg.stable_version)
    run = runtime.start(
        version_id=selected.version_id,
        prompt_hash=selected.prompt_hash,
        policy_version=selected.policy_version,
    )

    # Decision.allow: helper that keeps a single outcome/reason model.
    allow_decision = Decision.allow(reason=None)
    audit.log(
        run.id,
        decision=allow_decision.outcome,
        reason=allow_decision.reason,
        version_id=selected.version_id,
        rollout_stage=selected.rollout_stage,
    )
    return run
```
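The `rollback_on` thresholds from the config can be evaluated with a small helper, sketched below against per-version metrics (metric names mirror the config keys; the comparison logic is an assumption, not a fixed product behavior):

```python
def needs_rollback(metrics: dict, thresholds: dict) -> bool:
    """Sketch: rollback is required if any observed metric exceeds its threshold."""
    return any(metrics.get(name, 0.0) > limit for name, limit in thresholds.items())

rollback_on = {"error_rate_p95": 0.05, "tool_failure_p95": 0.03}

healthy = {"error_rate_p95": 0.02, "tool_failure_p95": 0.01}
degraded = {"error_rate_p95": 0.02, "tool_failure_p95": 0.06}
```

Because the thresholds live in config rather than in someone's head, the rollback decision stays a policy decision instead of manual improvisation.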
How it looks during execution
Scenario 1: blocked by contract mismatch
- Runtime prepares start of `support-agent@2.4.0`.
- Policy checks tool contracts.
- Decision: `stop (reason=contract_mismatch)`.
- New version is not started.
- Event is recorded in audit log.
Scenario 2: canary rollout
- Policy allows candidate at `canary_percent=5`.
- Part of runs goes to `2.4.0`, the rest to stable.
- Metrics are tracked by `version_id`.
- If thresholds are not violated, stage is increased.
- Version moves to wider rollout.
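The canary split itself can be sketched as a deterministic assignment (the hashing scheme is an illustrative assumption; the property that matters is that the same `run_id` always lands in the same bucket, so retries stay on the version they started with):

```python
import hashlib

def assign_version(run_id: str, stable: str, candidate: str, canary_percent: int) -> str:
    """Sketch: hash the run id into one of 100 buckets; low buckets get the canary."""
    bucket = int(hashlib.sha256(run_id.encode()).hexdigest(), 16) % 100
    return candidate if bucket < canary_percent else stable

v = assign_version("run-001", "support-agent@2.3.3", "support-agent@2.4.0", 5)
```

Raising the stage is then just raising `canary_percent`, with no per-run state to migrate.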
Scenario 3: rollback required
- After rollout, `tool_failure_rate` grows.
- Policy returns `stop (reason=rollback_required)`.
- Traffic returns to stable version.
- Runs with new version no longer start.
- Team analyzes incident by `version_id` and audit trail.
Common mistakes
- starting runs from "latest" instead of pinned `version_id`
- versioning only the prompt and ignoring tools/policy contracts
- doing full rollout without a canary gate
- not logging `version_id` and `rollout_stage` in audit
- mixing rollback and a new release in one step without a stable fallback
- not having explicit thresholds that trigger rollback
Result: releases look controlled, but under load become unpredictable.
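The first mistake above can be prevented mechanically: resolve the version once at run start and freeze it into run metadata, so nothing can silently re-resolve "latest" mid-run. A minimal sketch (`MappingProxyType` simply makes the record read-only; field names are illustrative):

```python
from types import MappingProxyType

def pin_run_metadata(run_id: str, version_id: str, rollout_stage: str):
    """Sketch: an immutable record of what this run is pinned to."""
    return MappingProxyType({
        "run_id": run_id,
        "version_id": version_id,      # pinned, never "latest"
        "rollout_stage": rollout_stage,
    })

meta = pin_run_metadata("run-42", "support-agent@2.4.0", "canary")
```

Any later attempt to overwrite `meta["version_id"]` raises a `TypeError`, which makes accidental drift an immediate, visible failure instead of a silent one.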
Self-check
Quick agent-versioning check before production launch:
If baseline governance controls are missing: before production, you need at least access control, limits, audit logs, and an emergency stop.
FAQ
Q: Why not just update the agent's "current" version?
A: Because without pinned version_id, you lose run reproducibility and cannot localize regressions precisely.
Q: What exactly should be versioned: only prompt?
A: No. Minimum: prompt, tool contracts, policy/config, and runtime compatibility. Otherwise version is incomplete.
Q: Is canary required for every change?
A: For high-risk changes, yes. For low-risk changes, stages can be shorter, but still with explicit metrics and stop thresholds.
Q: When should rollback start?
A: When agreed thresholds are violated (error/tool failure/latency/cost). This should be a policy decision, not manual improvisation.
Q: Does versioning replace rollback?
A: No. Versioning prevents release chaos, rollback remains the emergency return mechanism to stable.
Where Agent Versioning fits in the system
Agent versioning is one of the Agent Governance layers. Together with RBAC, limits, budgets, approval, and audit, it forms one system of controlled production changes.
Related pages
Next on this topic:
- Agent Governance Overview – overall model of agent control in production.
- Rollback strategies – how to safely return to a stable version.
- Access Control (RBAC) – how to limit who can do what.
- Budget Controls – how to keep rollout spend under control.
- Audit logs for agents – how to rebuild decision chains by `version_id`.