Idea In 30 Seconds
Budget controls are runtime limits that stop a run when the agent exceeds its limits for steps, time, tool calls, or money.
When you need it: when the agent calls external tools, can retry actions, and operates with a real budget in production.
Problem
Without budgets, a production agent can easily enter a loop: more steps, more tool calls, more spend. In demos this is not visible because load is low. In production this behavior quickly turns into an incident.
One unstable tool response often starts a chain: retry -> new tool call -> more tokens -> another retry. If there is no hard runtime boundary, the run continues longer and costs more than planned.
Analogy: it is like taking a ride with no limit on a company taxi account. While the ride is short, everything looks fine. When the route drags on, costs grow silently until charge time.
Solution
The solution is to move budget checks into the runtime policy layer.
Each agent step is checked against four limits: max_steps, max_seconds, max_tool_calls, max_usd.
The policy layer returns a technical decision: allow or stop with an explicit reason (max_steps, max_seconds, max_tool_calls, max_usd).
This decision is made at every step, not only at the end of the run.
This is a separate system layer, not part of the prompt or model logic.
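A minimal sketch of such a policy check, assuming a plain usage record and a "strictly greater than" comparison. The class and function names (Limits, Usage, Decision, check) are illustrative and not part of any specific framework:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_steps: int
    max_seconds: float
    max_tool_calls: int
    max_usd: float


@dataclass
class Usage:
    steps: int = 0
    seconds: float = 0.0
    tool_calls: int = 0
    usd: float = 0.0


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: Optional[str] = None  # name of the limit that was hit, if any


def check(usage: Usage, limits: Limits) -> Decision:
    """Return allow, or stop with the name of the first exceeded limit."""
    checks = [
        ("max_steps", usage.steps, limits.max_steps),
        ("max_seconds", usage.seconds, limits.max_seconds),
        ("max_tool_calls", usage.tool_calls, limits.max_tool_calls),
        ("max_usd", usage.usd, limits.max_usd),
    ]
    for reason, used, limit in checks:
        # Assumption: a run may use exactly its limit; only going over stops it.
        if used > limit:
            return Decision(allowed=False, reason=reason)
    return Decision(allowed=True)
```

Because the check is a pure function of usage and limits, it can live outside the prompt and the model, and the same function can be called at every step.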
Budget Controls != Rate Limiting
These are different control layers:
- Rate limiting limits request frequency to a system or tool.
- Budget controls limit the total spend of a single run.
Neither is enough on its own:
- without budget controls, one run can become too expensive
- without rate limiting, many runs can overload dependencies
Example:
- rate limit: no more than 10 requests per minute to search_api
- budget controls: no more than max_tool_calls=12 and max_usd=1.00 for one run
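The difference in scope can be sketched as two separate objects: a rate limiter shared by all runs that call one tool, and a budget owned by a single run. Both classes below are hypothetical illustrations. The sliding-window approach is one common choice for rate limiting, not necessarily what a given system uses, and time is passed in explicitly to keep the example deterministic:

```python
from collections import deque


class SlidingWindowRateLimiter:
    """Shared across runs: caps how often one tool is called."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def try_acquire(self, now: float) -> bool:
        # Drop calls that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True


class RunBudget:
    """Owned by one run: caps its total tool calls and spend."""

    def __init__(self, max_tool_calls: int, max_usd: float):
        self.max_tool_calls = max_tool_calls
        self.max_usd = max_usd
        self.tool_calls = 0
        self.usd = 0.0

    def record(self, cost_usd: float) -> None:
        self.tool_calls += 1
        self.usd += cost_usd

    def stop_reason(self):
        # Assumption: once a limit is reached, no further call is allowed.
        if self.tool_calls >= self.max_tool_calls:
            return "max_tool_calls"
        if self.usd >= self.max_usd:
            return "max_usd"
        return None

    def can_call(self) -> bool:
        return self.stop_reason() is None


# One limiter for search_api, one budget per run.
search_api_limiter = SlidingWindowRateLimiter(max_requests=10, window_seconds=60)
run_budget = RunBudget(max_tool_calls=12, max_usd=1.00)
```

A tool call would need to pass both checks: the limiter protects search_api from all runs combined, while the budget bounds what this one run can consume.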
Budget Control Metrics
These metrics and signals work together at every agent step.
| Metric | What it controls | Key mechanics | Why |
|---|---|---|---|
| Step budget | Run length in steps | max_steps; stop reason max_steps | Stops loops before costs grow |
| Time budget | Run duration in seconds | max_seconds; wall-clock timeout | Prevents long runs from blocking resources |
| Tool-call budget | Number of tool calls | max_tool_calls; per-run call cap | Limits tool spam and retry chains |
| Spend budget | Money spent per run | max_usd; usage accounting | Sets a hard financial boundary |
| Observability (budget) | Visibility into budget decisions | audit logs; alerts (Slack / PagerDuty) | Does not directly limit actions, but lets you quickly find and explain limit overruns |
Example alert:
Slack: Agent Support-Bot hit max_usd limit ($100). Run stopped at step 12.
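A message like this could be produced by a small helper. The sketch below is an assumption about how alerting might be wired: format_alert, notify_if_needed, and the injected send callable are hypothetical names, and no real Slack or PagerDuty API is called. It alerts only on max_usd and max_tool_calls, which is one possible policy:

```python
# Assumption: only these stop reasons are worth paging someone for.
ALERT_REASONS = {"max_usd", "max_tool_calls"}


def format_alert(agent_name, reason, limit_value, step):
    """Build the human-readable alert text for a budget stop."""
    shown = f"${limit_value:g}" if reason == "max_usd" else str(limit_value)
    return f"Agent {agent_name} hit {reason} limit ({shown}). Run stopped at step {step}."


def notify_if_needed(send, agent_name, reason, limit_value, step):
    """Call send(message) for alert-worthy reasons; return whether it was called."""
    if reason not in ALERT_REASONS:
        return False
    send(format_alert(agent_name, reason, limit_value, step))
    return True


# Usage with a stand-in sender; a real system would post to its alert channel.
sent = []
notify_if_needed(sent.append, "Support-Bot", "max_usd", 100, 12)
```

Passing the sender in as a parameter keeps the alert decision testable without a network call.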
How It Looks In Architecture
The budget policy layer sits between runtime and action execution and checks limits before every step.
Each decision (allow or stop) is recorded in the audit log.
Every agent step goes through this flow before execution: the runtime does not execute actions directly. The budget check comes first, then execution.
Flow summary:
- Runtime updates usage (steps, time, tool calls, spend)
- Budget policy layer checks limits
  - allow -> next step executes
  - stop -> stop reason and partial response are returned
- both decisions are written to the audit log
Example
A support agent handles a request and repeatedly calls refund.lookup because of an unstable API.
With budget controls:
- max_tool_calls = 8
- max_seconds = 45
- max_usd = 1.00
-> run is stopped when the limit is reached, not after uncontrolled spend growth.
Budget controls stop the incident at execution level instead of relying on the model to stop itself.
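The scenario can be simulated in a few lines. This is a sketch under stated assumptions: flaky_refund_lookup is a hypothetical stand-in for refund.lookup that always times out, the per-call cost of $0.02 is invented for illustration, and the retry loop is deliberately naive:

```python
import time

LIMITS = {"max_tool_calls": 8, "max_seconds": 45, "max_usd": 1.00}


def flaky_refund_lookup(order_id):
    """Hypothetical stand-in for refund.lookup: the API always times out."""
    raise TimeoutError("refund.lookup timed out")


def run_with_budget(order_id, cost_per_call_usd=0.02, clock=time.monotonic):
    started = clock()
    tool_calls = 0
    usd = 0.0

    def stopped(reason):
        return {"status": "stopped", "reason": reason,
                "tool_calls": tool_calls, "usd": round(usd, 2)}

    while True:
        # Budget check before every call; the model is never asked whether to stop.
        if tool_calls >= LIMITS["max_tool_calls"]:
            return stopped("max_tool_calls")
        if clock() - started >= LIMITS["max_seconds"]:
            return stopped("max_seconds")
        if usd >= LIMITS["max_usd"]:
            return stopped("max_usd")

        tool_calls += 1
        usd += cost_per_call_usd  # assumed flat cost per call
        try:
            refund = flaky_refund_lookup(order_id)
        except TimeoutError:
            continue  # naive retry; the budget bounds how often this happens
        return {"status": "ok", "refund": refund,
                "tool_calls": tool_calls, "usd": round(usd, 2)}


result = run_with_budget("order-123")
```

With these numbers the run ends after eight calls with reason max_tool_calls, well before the time or spend limits come into play.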
In Code It Looks Like This
The flow summary above shows the main control flow. In practice, budget checks often run twice: before step execution and after actual usage is updated. The step counter is updated before the first check so the current step is included in budget accounting.
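```python
# Simplified sketch: usage, budget, audit, alerts, and stop() are provided by the runtime.
def run_step(run_id, tool, args, usage, limits):
    # Count the current step first so it is included in the budget check.
    usage.update(step=1)

    # Check 1: before the step executes.
    decision = budget.check(usage, limits)
    if not decision.allowed:
        audit.log(run_id, decision.outcome, reason=decision.reason, usage=usage.snapshot())
        alerts.notify_if_needed(run_id, decision.reason, usage.snapshot())
        return stop(decision.reason)

    result = tool.execute(args)

    # Check 2: after the actual tool call and its cost are recorded.
    usage.update(tool_call=1, usd=result.cost_usd)
    decision = budget.check(usage, limits)
    if not decision.allowed:
        audit.log(run_id, decision.outcome, reason=decision.reason, usage=usage.snapshot())
        alerts.notify_if_needed(run_id, decision.reason, usage.snapshot())
        return stop(decision.reason)

    # Allowed: record the decision together with the tool result status.
    audit.log(run_id, decision.outcome, reason=decision.reason, usage=usage.snapshot(), result=result.status)
    return result
```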
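The usage object used above is not defined in this article. One way it might look, as a sketch: a small tracker whose update and snapshot methods mirror the calls in the snippet, with the elapsed time derived from a monotonic clock. The injectable clock parameter is an assumption added to make the class testable:

```python
import time


class Usage:
    """Tracks what a single run has consumed so far."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()
        self.steps = 0
        self.tool_calls = 0
        self.usd = 0.0

    def update(self, step=0, tool_call=0, usd=0.0):
        # Each argument is an increment, matching usage.update(step=1) above.
        self.steps += step
        self.tool_calls += tool_call
        self.usd += usd

    @property
    def seconds(self):
        # Wall-clock time since the run started.
        return self._clock() - self._started

    def snapshot(self):
        # A plain dict copy, safe to hand to audit logs and alerts.
        return {
            "steps": self.steps,
            "seconds": round(self.seconds, 3),
            "tool_calls": self.tool_calls,
            "usd": round(self.usd, 6),
        }
```

time.monotonic is used rather than time.time so that system clock adjustments cannot shorten or extend the time budget.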
How It Looks During Execution
Scenario 1: limit reached (stop)
1. Runtime completes step 9 and updates actual usage.
2. Budget policy sees that max_usd is exceeded.
3. Decision: stop (max_usd).
4. Audit: decision=stop, reason=max_usd, step=9.
5. User gets a partial response with stop reason.
---
Scenario 2: limit not reached (allow)
1. Runtime starts step 4 and updates usage.
2. Budget policy checks limits: everything is within bounds.
3. Decision: allow.
4. Tool call is executed.
5. Audit: decision=allow, usage updated, result returned.
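The audit entries in both scenarios could be written as one JSON line per decision. The field names below (run_id, decision, reason, step, usage) follow the wording of the scenarios but are otherwise an assumption, not a fixed schema:

```python
import json


def audit_record(run_id, decision, step, usage, reason=None):
    """Serialize one budget decision as a JSON line."""
    if decision not in ("allow", "stop"):
        raise ValueError(f"unknown decision: {decision!r}")
    if decision == "stop" and reason is None:
        raise ValueError("a stop decision needs an explicit reason")
    record = {
        "run_id": run_id,
        "decision": decision,
        "reason": reason,
        "step": step,
        "usage": dict(usage),
    }
    # sort_keys gives stable output, which makes log lines easier to diff and grep.
    return json.dumps(record, sort_keys=True)


# Scenario 1: stop at step 9 because max_usd was exceeded.
stop_line = audit_record("run-42", "stop", 9, {"usd": 1.03}, reason="max_usd")
# Scenario 2: allow at step 4.
allow_line = audit_record("run-42", "allow", 4, {"usd": 0.31})
```

Rejecting a stop without a reason at write time keeps the audit trail usable for the "why did this run stop" question later.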
Common Mistakes
- limiting only tokens and ignoring max_seconds, max_tool_calls, and max_usd
- checking budgets only "at the end", not before each step
- not returning explicit stop reason to user
- spreading budget logic across runtime, tools, and UI
- not logging budget decisions (allow/stop) in audit trail
- no alerting for max_usd and max_tool_calls
As a result, the system looks stable but quickly loses predictability under load.
Self-Check
Quick budget-controls check before production launch:
Before production, you need at least access control, limits, audit logs, and an emergency stop.
FAQ
Q: Is token budget alone enough?
A: No. Tools can cost more than tokens. Minimum: max_steps, max_seconds, max_tool_calls, max_usd.
Q: What should be implemented first: max_steps or max_usd?
A: Start with max_steps and max_tool_calls to stop loops immediately. Then add max_usd for a financial boundary.
Q: What should users get on budget stop?
A: Partial response + explicit stop reason + short next-step hint (narrow the request or retry with a higher budget).
Q: Do we need per-tenant budgets?
A: Yes. Different plans or teams need different limits, but the check mechanics should stay the same.
Q: Do budget controls replace rate limiting?
A: No. Rate limiting protects dependencies from spikes, while budget controls protect a run from runaway spend.
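For the per-tenant question above, one possible shape is a shared default plus per-plan overrides, so that only the numbers differ and the check itself stays the same. The plan names and all values below are invented for illustration:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Limits:
    # Hypothetical defaults; real values depend on the product.
    max_steps: int = 20
    max_seconds: float = 45
    max_tool_calls: int = 8
    max_usd: float = 1.00


DEFAULT_LIMITS = Limits()

# Only the fields that differ from the defaults are listed per plan.
PLAN_OVERRIDES = {
    "free": {"max_tool_calls": 4, "max_usd": 0.25},
    "enterprise": {"max_steps": 50, "max_usd": 5.00},
}


def limits_for(plan: str) -> Limits:
    """Resolve limits for a plan; unknown plans get the defaults."""
    return replace(DEFAULT_LIMITS, **PLAN_OVERRIDES.get(plan, {}))
```

The resolved Limits object is then passed to the same budget check as for any other tenant.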
Where Budget Controls Fit In The Whole System
Budget controls are one of the Agent Governance layers. Together with RBAC, execution limits, approval, and audit, they form a unified execution-control system.
Related Pages
Next on this topic:
- Agent Governance Overview: overall production control model for agents.
- Access Control (RBAC): how to constrain who can call what.
- Rate Limiting For Agents: how to protect tools and APIs from traffic spikes.
- Step Limits: how to stop infinite loops by step count.
- Audit Logs For Agents: how to debug incidents by stop reasons and policy-layer decisions.