Quick take: Single-step “agents” (one model call → execute → done) have no place for validation, no recovery loop, and no stop reasons. They fail because production systems are noisy. If you have tools or side effects, you need a bounded loop + governance.
You'll learn: When single-step is actually fine • The minimal safe routing rule • A bounded loop interface • Stop reasons • A real incident smell test
Single-step: validation has nowhere to live • recovery happens in “clever prompts” • writes happen too early
Looped runner: budgets • tool gateway • stop reasons • safe-mode
Impact: fewer incidents + debuggable failures instead of “execute & pray”
Problem-first intro
Somebody says: “we built an agent”.
The code is:
- Call the model once
- Parse a tool call
- Execute it
- Return whatever happened
That’s not an agent. That’s a function call with unpredictable arguments.
In a demo it feels fast. In production it fails for the reason you built agents in the first place: real systems are noisy, and you need feedback + control.
Why this fails in production
1) No feedback loop = no recovery
Production is full of timeouts, partial responses, 429s, stale data, and schema drift. A single-step design has nowhere to put recovery logic, so teams push “recovery” into prompts and then execute it blindly.
2) Budgets and stop reasons get bolted on too late
Teams say: “it can’t loop, so we don’t need budgets.”
Then they add retries in tools, retries in the model call, and a second tool call “just in case”.
Congrats, you reinvented loops without governance.
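One way to keep those hidden retries honest is to make every step and every tool call (retries included) draw from the same budget. The sketch below is a minimal illustration, not a reference implementation; `BudgetTracker` and `BudgetExceeded` are hypothetical names, and the defaults simply mirror the budgets used elsewhere on this page.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised with a machine-readable stop reason when a budget runs out."""

    def __init__(self, stop_reason: str):
        super().__init__(stop_reason)
        self.stop_reason = stop_reason


@dataclass
class BudgetTracker:
    max_steps: int = 25
    max_tool_calls: int = 12
    max_seconds: float = 60.0
    steps: int = 0
    tool_calls: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge_step(self) -> None:
        """Count one model decision against the step budget."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise BudgetExceeded("max_steps")
        self._check_clock()

    def charge_tool_call(self) -> None:
        """Count one tool call; retries must call this too."""
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("max_tool_calls")
        self._check_clock()

    def _check_clock(self) -> None:
        # Wall-clock budget, measured with a monotonic clock.
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("max_seconds")
```

The point of the design is that a retry inside a tool wrapper has to call `charge_tool_call()` like any other call, so it cannot hide from the budget, and the exception carries the stop reason you will want in the logs.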
3) Tool output gets ignored or misused
If you only call a tool once, what do you do with the output? Usually you just return it. That means no validation, no invariants, and no “did we actually solve the task?” check.
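A validation step does not have to be elaborate to be useful. The sketch below checks a tool observation against a couple of invariants before anything acts on it. The ticket shape (`id`, `status`) and the allowed status values are assumptions made up for illustration; substitute your own schema.

```python
from typing import Any

# Assumed set of statuses for the hypothetical ticket system.
KNOWN_STATUSES = {"open", "pending", "resolved"}


def validate_ticket(obs: Any) -> list[str]:
    """Return a list of problems; an empty list means the observation is usable."""
    if not isinstance(obs, dict):
        return ["observation is not an object"]
    problems: list[str] = []
    ticket_id = obs.get("id")
    if not isinstance(ticket_id, str) or not ticket_id:
        problems.append("missing or empty 'id'")
    if obs.get("status") not in KNOWN_STATUSES:
        problems.append(f"unexpected status: {obs.get('status')!r}")
    return problems
```

In a looped runner, a non-empty result would be fed back as an observation (or turned into a stop reason) instead of being returned to the user as if it were an answer.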
4) Writes become a coin flip
In a single-step design, the model can propose a write immediately. There’s no “read first, write later” policy. The blast radius arrives early.
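A “read first, write later” policy can be expressed as a small check over the run's history. This is a hedged sketch: the action shape (`kind`, `resource`) is an assumption for illustration, and a real policy would likely also check how fresh the read is.

```python
def write_allowed(history: list[dict], action: dict) -> bool:
    """Allow a write only if the same resource was read earlier in this run."""
    if action["kind"] != "write":
        return True  # reads are not restricted by this policy
    return any(
        past["kind"] == "read" and past["resource"] == action["resource"]
        for past in history
    )
```

A single-step design has an empty history by construction, so under this policy its first proposed write is always refused.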
When single-step is enough (yes, sometimes)
Single-step is fine when all of this is true:
- No tools (or tools are strictly read-only)
- No side effects (no state changes)
- Output is used as text, not as a command
- You can validate output with a strict schema (or you don’t need to)
Decision framework: single-step is OK only if all are true:
- ✅ Read-only (no side effects)
- ✅ Strongly typed output (or no tools)
- ✅ Failure is cheap (low blast radius)
- ✅ No retries/recovery loop needed
If any of those is false, route to a looped runner.
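The checklist above translates directly into code. The sketch below assumes someone has already classified the task; `TaskProfile` and `choose_path` are hypothetical names, and the four fields map one-to-one to the four conditions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    read_only: bool                 # no side effects
    typed_output_or_no_tools: bool  # strict schema, or no tools at all
    failure_is_cheap: bool          # low blast radius
    needs_recovery: bool            # retries / recovery loop required


def choose_path(p: TaskProfile) -> str:
    """Single-step only when every condition holds; otherwise the looped runner."""
    single_step_ok = (
        p.read_only
        and p.typed_output_or_no_tools
        and p.failure_is_cheap
        and not p.needs_recovery
    )
    return "single_step" if single_step_ok else "looped_runner"
```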
Hard routing rule (the one that saves you)
If the next step can cause side effects, a single-step path is not allowed.
if action.has_side_effects:
    run_looped_runner()
else:
    run_single_step()
This sounds obvious. It’s not obvious when the demo is working and nobody has been paged yet.
Migration path (single-step → loop)
This is what teams usually ship, and why it breaks:
# v1: single-step (fast, unsafe)
result = tool(llm_decide(task))  # damage can happen before validation

# v2: add validation (still unsafe if the tool already ran)
result = tool(llm_decide(task))
if not valid(result):
    raise RuntimeError("too late: side effect already happened")

# v3: bounded loop (safe enough to operate)
for step in range(max_steps):
    action = llm_decide(state)
    if action.kind == "tool":
        obs = tool_gateway.call(action.name, action.args)  # policy + budgets
        state = update(state, obs)
    else:
        return action.final_answer
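For something you can actually run, here is a minimal sketch of the v3 idea with explicit stop reasons. `decide` and `call_tool` are stand-ins you would supply (a model call and a tool gateway, respectively), and the action and result shapes are assumptions for illustration.

```python
from typing import Any, Callable


def run_bounded(
    task: str,
    decide: Callable[[dict], dict],
    call_tool: Callable[[str, dict], Any],
    max_steps: int = 5,
) -> dict[str, Any]:
    """Run a decide/act loop that always ends with an explicit stop reason."""
    state: dict[str, Any] = {"task": task, "observations": []}
    for step in range(max_steps):
        action = decide(state)
        if action["kind"] == "final":
            return {
                "status": "ok",
                "stop_reason": "final_answer",
                "answer": action["answer"],
                "steps": step + 1,
            }
        try:
            obs = call_tool(action["name"], action["args"])
        except Exception as exc:
            # Tool failures become observations, so the next decision can react.
            obs = {"error": str(exc)}
        state["observations"].append(obs)
    # The loop ran out of steps: say so instead of returning nothing.
    return {"status": "stopped", "stop_reason": "max_steps", "answer": None, "steps": max_steps}
```

Two things differ from the single-step versions: a tool error is fed back into the state rather than ending the run, and running out of steps is a named outcome, not an accident.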
Implementation example (real code)
This pattern keeps single-step where it belongs (safe, read-only), and routes everything else to a bounded loop runner.
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Literal


@dataclass(frozen=True)
class Budgets:
    max_steps: int = 25
    max_tool_calls: int = 12
    max_seconds: int = 60


class Stopped(RuntimeError):
    def __init__(self, stop_reason: str):
        super().__init__(stop_reason)
        self.stop_reason = stop_reason


def is_side_effecting(action: dict[str, Any]) -> bool:
    # Production: decide side-effect class in code, not by prompt vibes.
    return action.get("kind") in {"write", "payment", "email", "ticket_close"}


def run_single_step(task: str, *, llm) -> dict[str, Any]:
    """
    Safe single-step: no tools, no writes.
    This is a completion, not an agent.
    """
    text = llm.text({"task": task, "style": "direct"})  # (pseudo)
    return {"status": "ok", "stop_reason": "single_step", "answer": text}


def run_looped(task: str, *, budgets: Budgets, runner) -> dict[str, Any]:
    """
    Delegate to a bounded runner that has:
    - tool gateway
    - output validation
    - stop reasons
    """
    return runner.run(task, budgets=budgets)  # (pseudo)


def route(task: str, *, llm, budgets: Budgets, runner) -> dict[str, Any]:
    # First decision is read-only: are we about to do anything with side effects?
    action = llm.json(
        {
            "task": task,
            "rule": "Return JSON {kind: 'read_only'|'side_effects'} and nothing else.",
            "examples": [
                {"task": "Summarize this text", "kind": "read_only"},
                {"task": "Close ticket #123", "kind": "side_effects"},
            ],
        }
    )  # (pseudo)
    if action.get("kind") == "side_effects":
        return run_looped(task, budgets=budgets, runner=runner)
    return run_single_step(task, llm=llm)

The same routing in JavaScript:

export class Stopped extends Error {
  constructor(stopReason) {
    super(stopReason);
    this.stop_reason = stopReason;
  }
}

export function runSingleStep(task, { llm }) {
  // Safe single-step: no tools, no writes.
  return llm
    .text({ task, style: "direct" })
    .then((text) => ({ status: "ok", stop_reason: "single_step", answer: text })); // (pseudo)
}

export function runLooped(task, { budgets, runner }) {
  // Delegate to a bounded runner with tool gateway + stop reasons.
  return runner.run(task, { budgets }); // (pseudo)
}

export async function route(task, { llm, budgets, runner }) {
  const action = await llm.json({
    task,
    rule: "Return JSON {kind: 'read_only'|'side_effects'} and nothing else.",
    examples: [
      { task: "Summarize this text", kind: "read_only" },
      { task: "Close ticket #123", kind: "side_effects" },
    ],
  }); // (pseudo)
  if (action.kind === "side_effects") return await runLooped(task, { budgets, runner });
  return await runSingleStep(task, { llm });
}

This doesn’t look “agentic”. It looks operable. That’s the point.
Failure evidence (what it looks like when it breaks)
Single-step failures show up as “one bad decision with immediate blast radius”.
A trace that explains the incident in 3 lines:
{"run_id":"run_44a1","step":0,"event":"tool_call","tool":"ticket.close","args_hash":"b5d0aa","decision":"allow"}
{"run_id":"run_44a1","step":0,"event":"tool_result","tool":"ticket.close","ok":true}
{"run_id":"run_44a1","step":0,"event":"stop","reason":"success","note":"single-step"}
If that makes you uncomfortable, good.
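That discomfort can be turned into an automated smell test: scan the trace for side-effecting tool calls at step 0. The sketch below assumes the JSONL event shape shown in the trace above; `SIDE_EFFECTING_TOOLS` is a hypothetical set you would maintain in code for your own tools.

```python
import json

# Assumption: the tools you classify as side-effecting, maintained in code.
SIDE_EFFECTING_TOOLS = {"ticket.close"}


def writes_at_step_zero(trace_lines: list[str]) -> list[str]:
    """Return run_ids where a side-effecting tool was called on the very first step."""
    flagged: list[str] = []
    for line in trace_lines:
        event = json.loads(line)
        if (
            event.get("event") == "tool_call"
            and event.get("step") == 0
            and event.get("tool") in SIDE_EFFECTING_TOOLS
        ):
            flagged.append(event["run_id"])
    return flagged
```

Run against the three-line trace above, this flags `run_44a1`: the write happened before the run had observed anything.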
Example failure case (composite)
🚨 Incident: Premature ticket closure
System: Single-step “close resolved tickets” agent
Duration: under 1 hour
Impact: 18 tickets incorrectly closed
What happened
The agent called ticket.close immediately based on a snippet. It misread sarcasm as “resolved”.
The worst part: nobody could explain why. There was no loop state, no stop reasons that mattered, and no opportunity to validate.
Fix
- Route side-effecting actions to a looped runner
- Tool gateway policy + audit logs
- Approvals for ticket.close
Trade-offs
- A loop is more code than one model call.
- More steps can mean more latency (budgets help).
- You need observability (but you needed it anyway).
When NOT to use
- If you truly have one deterministic transform, don’t call it an agent.
- If your task needs tool feedback and recovery, single-step will be fragile.
- If you can’t log traces and stop reasons, fix observability first.
Copy-paste checklist
- [ ] If you have side effects, you need a looped runner
- [ ] Route side-effecting tasks away from single-step
- [ ] Add budgets (steps, tool calls, seconds)
- [ ] Use a tool gateway (default-deny allowlist)
- [ ] Validate tool outputs before acting
- [ ] Return stop reasons (and log them)
- [ ] Require approvals for writes
Safe default config
routing:
allow_single_step_only_when: "read_only"
budgets:
max_steps: 25
max_tool_calls: 12
max_seconds: 60
tools:
allow: ["search.read", "kb.read", "http.get"]
writes:
require_approval: true
stop_reasons:
return_to_user: true
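To show how a gateway might read this config, here is a minimal sketch of the allow/deny/approval decision. It mirrors the relevant keys as a plain Python dict (loading the YAML itself is left out), and `gate` is a hypothetical helper, not part of any library.

```python
from typing import Any

# Mirrors the "tools" and "writes" keys of the config above.
CONFIG: dict[str, Any] = {
    "tools": {"allow": ["search.read", "kb.read", "http.get"]},
    "writes": {"require_approval": True},
}


def gate(config: dict[str, Any], tool: str, *, is_write: bool, approved: bool = False) -> str:
    """Return 'allow', 'deny', or 'needs_approval' for one proposed tool call."""
    allowed = set(config.get("tools", {}).get("allow", []))
    if tool not in allowed:
        return "deny"  # default-deny: anything not on the allowlist never runs
    # If the key is missing, fall back to requiring approval (the safer default).
    needs_approval = config.get("writes", {}).get("require_approval", True)
    if is_write and needs_approval and not approved:
        return "needs_approval"
    return "allow"
```

With the config as written, `ticket.close` is not on the allowlist, so it is denied outright. Adding it to `tools.allow` would move it to `needs_approval` until someone signs off.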
Production takeaway
What breaks without this
- ❌ Writes happen before validation
- ❌ “Recovery” lives in prompts and tool retries
- ❌ No stop reasons that explain behavior
What works with this
- ✅ Side effects route to a bounded runner
- ✅ Budgets + tool gateway keep runs controllable
- ✅ Failures are explainable (stop reasons + traces)
Minimum to ship
- Routing rule (read-only can be single-step; side effects can’t)
- Bounded runner (budgets + stop reasons)
- Tool gateway (deny by default)
- Validation layer (before writes)