Idea in 30 seconds
A kill switch is an emergency runtime control that lets you instantly stop new agent actions during an incident, without a release and without changing the prompt.
When you need it:
when an agent can perform write actions, works with external APIs, and an error is already escalating into a production incident.
Problem
When an agent starts taking harmful actions, there is usually no time to "tune the prompt and ship a release". While the team analyzes the issue, the agent may keep repeating the same actions, and every minute of delay means more side effects in production.
Typical patterns:
- duplicate emails or messages
- mass create/update operations
- excessive calls to external APIs
Analogy: it is like an emergency stop button on a production line. During a failure, first you stop movement, then you investigate the cause.
If the kill switch works only in the UI, it is not an emergency control. The real stop must happen in the runtime loop and in the tool gateway.
Solution
The solution is to add a centralized kill-switch policy layer that is checked after the next action is formed but before it is executed.
The policy returns allow or stop with an explicit reason: killed_global, killed_tenant, writes_disabled, tool_disabled.
Baseline model:
- global kill: emergency-stops everyone
- tenant kill: stops a specific customer
- writes disabled: allows read, blocks write
- tool disabled: blocks one specific tool
This is a separate emergency control layer, not part of prompt or UI logic.
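A minimal sketch of what such a policy check might look like. The class names, the `is_write` tag on actions, and the `scope` values for writes and tools are assumptions made for illustration; only the four reason strings come from the text above. `KillState` here is the state already resolved for one tenant.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KillState:
    """Kill flags as seen by one tenant (hypothetical shape)."""
    global_kill: bool = False
    tenant_kill: bool = False
    writes_disabled: bool = False
    disabled_tools: frozenset = frozenset()


@dataclass(frozen=True)
class Action:
    name: str
    is_write: bool  # assumption: every tool is tagged as read or write


@dataclass(frozen=True)
class Decision:
    outcome: str                  # "allow" or "stop"
    reason: Optional[str] = None  # set only when outcome == "stop"
    scope: Optional[str] = None


def check(state: KillState, action: Action) -> Decision:
    """Return allow or stop; the broadest active stop wins."""
    if state.global_kill:
        return Decision("stop", "killed_global", "global")
    if state.tenant_kill:
        return Decision("stop", "killed_tenant", "tenant")
    if state.writes_disabled and action.is_write:
        return Decision("stop", "writes_disabled", "writes")
    if action.name in state.disabled_tools:
        return Decision("stop", "tool_disabled", "tool")
    return Decision("allow")
```

Checking the broadest scope first means the reported reason always names the widest stop in effect, which keeps audit entries unambiguous when several flags are active at once.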
Kill switch ≠ full governance system
Kill switch and governance solve different problems:
- Kill switch stops the incident "here and now"
- Governance controls agent behavior continuously (RBAC, limits, budgets, approval)
One without the other is not enough:
- without kill switch, it is hard to stop an incident immediately
- without governance, incidents happen too often
Kill-switch control envelope
These checks work together as an emergency control envelope in runtime.
| Component | What it controls | Key mechanics | Why |
|---|---|---|---|
| Global stop | Stopping all runs | global_kill=true; stop before next action | Quickly stops a widespread incident |
| Tenant stop | Stopping within one tenant | tenant_kill=true; tenant-scoped flag | Localizes the issue without global outage |
| Writes disabled mode | Blocking write actions | write tool policy; read-only fallback | Enables safe degradation instead of full stop |
| Tool disable list | Targeted tool blocking | tool_disabled[]; incident mode rules | Disables a problematic tool without stopping all runs |
| Operator observability | Visibility of operator actions and blocks | audit logs; actor + reason + scope | Makes it clear who activated stop and why |
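To illustrate the operator observability row, here is a hedged sketch of a flag change that cannot happen without an audit entry. The `set_flag` helper, the `OperatorEvent` record, and the use of a plain dict and list as flag store and audit log are all hypothetical stand-ins.

```python
import time
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OperatorEvent:
    actor: str   # who changed the flag
    reason: str  # why (incident id, short description)
    scope: str   # e.g. "global" or "tenant:t_42"
    flag: str    # e.g. "writes_disabled"
    value: bool
    at: float    # Unix timestamp of the change


def set_flag(flags: dict, audit: list, *, actor: str, reason: str,
             scope: str, flag: str, value: bool) -> OperatorEvent:
    """Change a kill flag and record actor + reason + scope for it."""
    if not actor or not reason:
        raise ValueError("actor and reason are required for every change")
    event = OperatorEvent(actor, reason, scope, flag, value, time.time())
    audit.append(asdict(event))   # audit first, so no change goes unrecorded
    flags[(scope, flag)] = value
    return event
```

Rejecting changes that lack an actor or reason is a design choice in this sketch: it trades a few seconds of operator typing for a log that answers "who and why" later.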
How it looks in architecture
The kill-switch policy layer sits in the runtime loop between planning and execution of the next action.
Every decision (allow or stop) is recorded in the audit log.
Each step passes through this flow before execution: the runtime does not execute actions directly; it first passes the proposed action to the policy layer for a decision.
Flow summary:
- Runtime forms next action
- Policy reads global/tenant flags + writes/tool rules
- allow -> next agent action is executed
- stop -> run is stopped with explicit reason (killed_global, killed_tenant, writes_disabled, tool_disabled)
- Decision is written to audit log
Example
A support agent started calling email.send in bulk due to a faulty scenario.
Operator enables writes_disabled for a specific tenant.
Result:
- new write actions are blocked immediately
- read actions can remain available
- logs contain who/when/why for each block
The kill switch stops the incident directly in the runtime loop instead of waiting for a new release.
In code it looks like this
The simplified flow above shows the main path. In practice, kill state is read from a central store and cached for a few seconds.
Critical point: the kill check must be O(1) and use a short cache TTL (1-2 seconds); otherwise the emergency stop reacts too late.
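One way to get a cheap check with a bounded delay is a small per-tenant TTL cache in front of the central store. This is a sketch under assumptions: `CachedKillStore`, its `load` callback, and the dict-shaped state are hypothetical, and the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable, Dict, Tuple


class CachedKillStore:
    """Per-tenant kill state with a short TTL in front of a central store."""

    def __init__(self, load: Callable[[str], dict], ttl_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load          # reads fresh state from the central store
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, dict]] = {}

    def read(self, tenant_id: str) -> dict:
        now = self._clock()
        hit = self._cache.get(tenant_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]          # O(1) dict lookup on the hot path
        state = self._load(tenant_id)
        self._cache[tenant_id] = (now, state)
        return state
```

With this shape, a flag set right after a refresh is picked up after at most `ttl_seconds` plus one load, so a TTL of 2 seconds bounds the reaction time to roughly that.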
In production, the same kill check is usually duplicated in the tool gateway so that no call can bypass the runtime control.
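A sketch of the gateway-side check, assuming a hypothetical `ToolGateway` that owns the tool registry and knows which tools write. The dict-shaped kill state and the `ToolBlocked` exception are illustrative, not a fixed interface.

```python
from typing import Any, Callable, Dict, Set


class ToolBlocked(Exception):
    """Raised when the gateway refuses a call because of a kill flag."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


class ToolGateway:
    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 write_tools: Set[str],
                 read_kill_state: Callable[[str], dict]):
        self._tools = tools
        self._write_tools = write_tools
        self._read_kill_state = read_kill_state

    def call(self, tenant_id: str, name: str, **args: Any) -> Any:
        # Same rules as the runtime loop, enforced again at the last hop.
        state = self._read_kill_state(tenant_id)
        if state.get("global_kill"):
            raise ToolBlocked("killed_global")
        if state.get("tenant_kill"):
            raise ToolBlocked("killed_tenant")
        if state.get("writes_disabled") and name in self._write_tools:
            raise ToolBlocked("writes_disabled")
        if name in state.get("disabled_tools", ()):
            raise ToolBlocked("tool_disabled")
        return self._tools[name](**args)
```

Because every tool call goes through `call`, code paths that skip the runtime loop (retries, background jobs, a second agent) still hit the same stop.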
Example kill-switch config:
    kill_switch:
      global_flag: agent_kill_global
      tenant_flag_prefix: "agent_kill_tenant:"
      writes_disabled_default: false
      disabled_tools_key: agent_disabled_tools
      cache_ttl_seconds: 2
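The key names in the config above could be resolved into a per-tenant kill state roughly as follows. This is a sketch, not a reference implementation: it accepts any client exposing `get` and `smembers` (redis-py's client has methods with these names), and it assumes flags are stored as "1"/"true" strings and disabled tools as a set. The `agent_writes_disabled:` prefix is hypothetical, since the config names no key for that flag.

```python
CONFIG = {
    "global_flag": "agent_kill_global",
    "tenant_flag_prefix": "agent_kill_tenant:",
    "writes_disabled_default": False,
    "disabled_tools_key": "agent_disabled_tools",
}
WRITES_FLAG_PREFIX = "agent_writes_disabled:"  # hypothetical key prefix


def _text(raw) -> str:
    """Normalize bytes (the default reply type of many clients) to str."""
    return raw.decode() if isinstance(raw, bytes) else str(raw)


def _is_on(raw) -> bool:
    """Treat a missing key as off; '1' or 'true' as on."""
    return raw is not None and _text(raw).strip().lower() in {"1", "true"}


def read_kill_state(client, tenant_id: str, config: dict = CONFIG) -> dict:
    """Read global, tenant, writes, and tool flags for one tenant."""
    writes_raw = client.get(WRITES_FLAG_PREFIX + tenant_id)
    return {
        "global_kill": _is_on(client.get(config["global_flag"])),
        "tenant_kill": _is_on(
            client.get(config["tenant_flag_prefix"] + tenant_id)),
        "writes_disabled": (config["writes_disabled_default"]
                            if writes_raw is None else _is_on(writes_raw)),
        "disabled_tools": {
            _text(t) for t in client.smembers(config["disabled_tools_key"])},
    }
```

The function makes several small reads per call, which is why it would normally sit behind the short-TTL cache rather than run on every action.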
    # planner, kill_store, kill_policy, audit, tool, make_action_key, and stop
    # are provided by the surrounding runtime.
    def run_agent(state, run_id):
        while True:
            action = planner.next(state)
            action_key = make_action_key(action.name, action.args)  # stable key for dedupe/audit

            # Kill check: after the action is formed, before it is executed.
            kill_state = kill_store.read(tenant_id=state.tenant_id)
            decision = kill_policy.check(kill_state, action)

            if decision.outcome == "stop":
                audit.log(
                    run_id,
                    decision=decision.outcome,
                    reason=decision.reason,
                    scope=decision.scope,
                    action=action.name,
                    action_key=action_key,
                    actor=kill_state.last_updated_by,
                )
                return stop(decision.reason)

            result = tool.execute(action.args)

            # Allowed actions are audited too, together with their result.
            audit.log(
                run_id,
                decision=decision.outcome,
                reason=decision.reason,
                scope=decision.scope,
                action=action.name,
                action_key=action_key,
                result=result.status,
            )

            if result.final:
                return result
Kill switch stops new actions. In-flight actions usually require a separate best-effort cancel mechanism.
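For in-flight work, one common approach is cooperative cancellation: the long-running action checks a shared signal between units of work. The sketch below uses `threading.Event` from the standard library; `send_batch` and its `send` callback are hypothetical.

```python
import threading
from typing import Callable, Iterable, List


def send_batch(recipients: Iterable[str], send: Callable[[str], None],
               cancel: threading.Event) -> List[str]:
    """Send to each recipient, stopping at the next checkpoint once
    cancel is set. Returns the recipients that were actually sent to."""
    sent = []
    for recipient in recipients:
        if cancel.is_set():  # checkpoint between units of work
            break
        send(recipient)
        sent.append(recipient)
    return sent
```

This is best-effort by construction: a send that has already started still completes, and only the remaining items are skipped.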
How it looks during execution
Scenario 1: global stop
- Operator activates global_kill=true.
- Runtime forms next action and reads kill state.
- Policy returns stop (reason=killed_global).
- New actions are not executed.
- Logs contain scope=global and actor.
Scenario 2: tenant stop
- tenant_kill=true is activated for tenant t_42.
- Runs for this tenant get stop (reason=killed_tenant).
- Other tenants continue working.
- Incident is localized without global stop.
Scenario 3: writes disabled
- writes_disabled=true is activated.
- Read action passes with allow.
- Write action gets stop (reason=writes_disabled).
- System enters read-only degrade mode.
Common mistakes
- kill switch only in UI, but not in runtime/tool gateway
- one global stop without per-tenant mode
- missing writes-disabled mode
- long cache TTL (minutes instead of seconds)
- missing audit trail for operator actions
- missing tested incident runbook
Result: the team has a "button" but no real emergency control.
Self-check
Quick kill-switch check before production launch:
Before production, you need at least access control, limits, audit logs, and an emergency stop.
FAQ
Q: What should be activated first: global stop or writes disabled?
A: Start with writes_disabled if the incident involves write actions. Use global_kill when the risk is broad and an immediate full stop is required.
Q: Where exactly should kill switch be checked?
A: At minimum in two places: in the runtime loop before the next action and in the tool gateway before tool execution.
Q: Can kill state be cached?
A: Yes, but only briefly (seconds). During an incident, a minute-long cache makes the kill switch almost useless.
Q: How are kill flags implemented technically?
A: Usually as global and tenant-scoped flags in Redis or a config store, read by the policy layer before each action.
Q: Does kill switch cancel already running actions?
A: Not always. It reliably blocks new actions. In-flight tasks need a separate best-effort cancel mechanism.
Q: Can kill switch replace RBAC and budgets?
A: No. Kill switch is an emergency stop mechanism. RBAC, limits, and budgets are needed for continuous control.
Where Kill Switch fits in the system
Kill switch is the emergency layer of Agent Governance. Together with RBAC, budgets, approval, and audit, it forms a complete production control system.
Related pages
Next on this topic:
- Agent Governance Overview: overall model for production agent control.
- Access Control (RBAC): how to limit who can do what.
- Budget Controls: how to limit spend and runaway runs.
- Step limits: how to stop loops at runtime-loop level.
- Human approval: where manual confirmation is required before risky actions.