Problem-first intro
You have a task: “handle support tickets”, “triage alerts”, “enrich leads”, “review code”.
Someone suggests an agent. Someone else suggests a workflow.
In a demo, the agent wins. In production, the winner is usually whatever you can operate.
The most expensive mistake we see is choosing an agent when you needed a workflow, and then adding governance until it’s basically a workflow anyway — except now it’s nondeterministic.
Quick decision (who should pick what)
- Pick a workflow when you can define steps, inputs, and success conditions. You’ll ship faster and sleep better.
- Pick an agent when the environment is messy (unknown docs, noisy tools) and you can’t enumerate all paths — but only if you’re willing to add budgets, permissions, and monitoring.
- If you’re not ready to build a control layer, don’t pick an agent. Pick a workflow.
Why people pick the wrong option in production
1) They confuse “flexible” with “reliable”
Agents are flexible. Reliability comes from:
- budgets
- validations
- idempotency
- approvals
- monitoring
Without those, agents are flexible at creating incidents.
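As a sketch, here are two of those primitives combined: a step budget and a per-step validation. The step callables and `validate` are hypothetical stand-ins, not a real API.

```python
# Sketch: budget + validation as reliability primitives.
# The steps and `validate` are hypothetical stand-ins.

def run_with_budget(steps, *, max_steps, validate):
    """Run steps in order; stop on budget exhaustion or invalid output."""
    results = []
    for i, step in enumerate(steps):
        if i >= max_steps:
            return {"status": "stopped", "stop_reason": "max_steps", "results": results}
        out = step()
        if not validate(out):
            return {"status": "stopped", "stop_reason": f"invalid_output:step_{i}", "results": results}
        results.append(out)
    return {"status": "ok", "results": results}

# Usage: three trivial steps, but a budget of two.
steps = [lambda: 1, lambda: 2, lambda: 3]
print(run_with_budget(steps, max_steps=2, validate=lambda x: isinstance(x, int)))
# {'status': 'stopped', 'stop_reason': 'max_steps', 'results': [1, 2]}
```

The point is that the stop reason is explicit: you can alert on it, count it, and debug it.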
2) They underestimate governance cost
The first time an agent loops, you add step limits. The first time it spams a tool, you add tool budgets. The first time it writes incorrectly, you add approvals.
At that point, you’ve built a workflow… but with extra variance.
3) They start with writes
Agents with write tools in week one are a predictable failure. Start read-only.
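A minimal sketch of the read-only default, using hypothetical tool names (`ticket.update`, `email.send`): reads pass through, writes require an explicit approval callback or are denied.

```python
# Sketch: read-only by default, writes gated behind approval.
# Tool names here are hypothetical examples.
READ_TOOLS = {"kb.read", "search.read"}
WRITE_TOOLS = {"ticket.update", "email.send"}

def call_tool(tool, args, *, approve=None):
    """Reads pass through; writes need an approval callback that returns True."""
    if tool in READ_TOOLS:
        return ("executed", tool)
    if tool in WRITE_TOOLS:
        if approve is None or not approve(tool, args):
            return ("denied", tool)
        return ("executed", tool)
    return ("unknown", tool)

print(call_tool("kb.read", {}))           # ('executed', 'kb.read')
print(call_tool("email.send", {}))        # ('denied', 'email.send')
```

Week one ships with `approve=None` everywhere; approvals come later, deliberately.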
4) Workflows fail loudly, agents fail quietly
Workflow failure: a step errors. Agent failure: it “kind of works” but gets slower, costlier, and weirder.
That’s drift. Drift is a production problem.
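One way to make quiet failure loud is a simple threshold on tool calls per run against a known baseline. This is a sketch, not a full anomaly detector; the numbers are illustrative.

```python
from statistics import mean

def drift_alert(recent_tool_calls_per_run, baseline_mean, *, ratio=1.5):
    """Flag drift when mean tool calls/run exceeds baseline by `ratio`."""
    return mean(recent_tool_calls_per_run) > baseline_mean * ratio

print(drift_alert([4, 5, 6], baseline_mean=3.0))  # True: mean 5 vs threshold 4.5
print(drift_alert([3, 3, 4], baseline_mean=3.0))  # False
```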
Comparison table
| Criteria | Workflow | LLM Agent | What matters in prod |
|---|---|---|---|
| Determinism | High | Low/medium | Debuggability, replay |
| Failure handling | Explicit | Emergent unless designed | Prevent thrash, stop reasons |
| Observability | Straightforward | Requires intentional tracing | “What did it do?” |
| Cost control | Predictable | Needs budgets + gating | No finance surprises |
| Change safety | Standard deploy | Drift-prone | Canary, golden tasks |
| Best for | Known paths | Unknown paths | Match system to reality |
Where this breaks in production
The failure modes differ:
Workflow breaks
- a step fails (timeout, 500)
- a queue backs up
- a schema changes
Fixes are mostly deterministic: retry policy, backoff, idempotency, rollbacks.
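Those deterministic fixes can be as simple as a retry wrapper with exponential backoff. A sketch; `flaky` simulates a transient failure that succeeds on the third attempt.

```python
import time

def retry(fn, *, attempts=3, base_delay=0.01):
    """Retry a flaky step with exponential backoff; re-raise on final failure."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

# Usage: a step that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

print(retry(flaky))  # "ok" after two transient failures
```

Note the contrast with the agent case below: here the retry policy is attached to one known step, not to an open-ended loop.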
Agent breaks
- tool spam loops (search thrash)
- partial outages amplify (retries in loops)
- prompt injection steers tool calls
- token overuse truncates policy
- silent drift changes behavior
Agents break like control systems, because they are control systems.
Implementation example (real code)
The “agent vs workflow” decision isn’t about libraries. It’s about boundaries.
Here’s a minimal boundary you can use for either:
- tool gateway with allowlist
- budgets (steps/tool calls/time)
- stop reasons
```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Budgets:
    max_steps: int = 25
    max_tool_calls: int = 12

class Stop(RuntimeError):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason

class ToolGateway:
    def __init__(self, *, allow: set[str]):
        self.allow = allow
        self.calls = 0

    def call(self, tool: str, args: dict[str, Any], *, budgets: Budgets) -> Any:
        self.calls += 1
        if self.calls > budgets.max_tool_calls:
            raise Stop("max_tool_calls")
        if tool not in self.allow:
            raise Stop(f"tool_denied:{tool}")
        return tool_impl(tool, args=args)  # (pseudo)

def workflow(task: str, *, budgets: Budgets) -> dict[str, Any]:
    tools = ToolGateway(allow={"kb.read"})
    try:
        doc = tools.call("kb.read", {"q": task}, budgets=budgets)
        return {"status": "ok", "answer": summarize(doc)}  # (pseudo)
    except Stop as e:
        return {"status": "stopped", "stop_reason": e.reason}

def agent(task: str, *, budgets: Budgets) -> dict[str, Any]:
    tools = ToolGateway(allow={"search.read", "kb.read", "http.get"})
    try:
        for _ in range(budgets.max_steps):
            action = llm_decide(task)  # (pseudo)
            if action.kind == "final":
                return {"status": "ok", "answer": action.final_answer}
            obs = tools.call(action.name, action.args, budgets=budgets)
            task = update(task, action, obs)  # (pseudo)
        return {"status": "stopped", "stop_reason": "max_steps"}
    except Stop as e:
        return {"status": "stopped", "stop_reason": e.reason}
```

The same gateway in JavaScript:

```javascript
export class Stop extends Error {
  constructor(reason) {
    super(reason);
    this.reason = reason;
  }
}

export class ToolGateway {
  constructor({ allow = [] } = {}) {
    this.allow = new Set(allow);
    this.calls = 0;
  }

  call(tool, args, { budgets }) {
    this.calls += 1;
    if (this.calls > budgets.maxToolCalls) throw new Stop("max_tool_calls");
    if (!this.allow.has(tool)) throw new Stop("tool_denied:" + tool);
    return toolImpl(tool, { args }); // (pseudo)
  }
}
```

Real failure case (incident-style, with numbers)
We saw a team replace a simple workflow with an agent “for flexibility”.
The workflow had fixed steps and predictable costs. The agent started calling search + browser tools because “maybe it helps”.
Impact in the first week:
- p95 latency: 1.9s → 9.7s
- spend: +$640 vs baseline
- and the worst part: incidents were harder to debug because behavior wasn’t deterministic
Fix:
- they moved 80% of the task back into a workflow
- the agent became a bounded “investigation step” behind strict budgets
- writes required approval
In production, hybrid usually wins: workflow for the known path, agent for the messy corner.
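A sketch of that hybrid shape, with hypothetical `workflow_path` and `bounded_agent` stand-ins: the workflow is the default, and the agent only ever sees what the workflow couldn't handle.

```python
# Sketch of the hybrid: workflow for the known path, bounded agent for the rest.
# Both functions below are hypothetical stand-ins.

def workflow_path(task):
    """Deterministic handler; returns 'unhandled' for anything off the known path."""
    return {"status": "ok", "answer": task.upper()} if task.isalpha() else {"status": "unhandled"}

def bounded_agent(task, *, max_steps):
    """Bounded investigation step; in reality this is an agent behind budgets."""
    return {"status": "ok", "answer": f"investigated:{task}", "steps_used": min(max_steps, 1)}

def hybrid(task):
    result = workflow_path(task)
    if result["status"] == "ok":
        return result
    return bounded_agent(task, max_steps=5)

print(hybrid("refund"))      # handled by the workflow
print(hybrid("ticket #42"))  # falls through to the bounded agent
```

The important property: the agent's blast radius is limited to the exception path.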
Migration path (A → B)
Workflow → Agent (safe-ish)
- keep the workflow as the default path
- add an agent only for ambiguous sub-tasks (bounded)
- enforce budgets + permissions + monitoring first
- canary rollout + golden tasks to catch drift
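Golden tasks can be a plain gate in the deploy pipeline. This sketch assumes a hypothetical `candidate` callable standing in for the new model/prompt/tool version under canary.

```python
def golden_gate(run, golden_tasks):
    """Block a rollout if any golden task regresses against its expected output."""
    failures = [t["name"] for t in golden_tasks if run(t["input"]) != t["expected"]]
    return {"pass": not failures, "failures": failures}

# Hypothetical golden set and candidate version.
golden = [
    {"name": "simple_lookup", "input": "2+2", "expected": "4"},
    {"name": "empty_input", "input": "", "expected": ""},
]

def candidate(x):
    # stand-in for the new version being canaried
    return "4" if x == "2+2" else ""

print(golden_gate(candidate, golden))  # {'pass': True, 'failures': []}
```

Exact-match comparison is the simplest gate; fuzzier scoring works too, but keep the pass/fail decision deterministic.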
Agent → Workflow (when you regret it)
- log traces and identify the common path
- codify common path as deterministic steps
- keep the agent only for exceptions
- delete “agent as default” once confidence is high
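Identifying the common path can start as counting tool-call sequences in your traces. A sketch; the trace shape (a list of tool names per run) is an assumption.

```python
from collections import Counter

def common_paths(traces, top=1):
    """Count tool-call sequences across runs to find the path worth codifying."""
    return Counter(tuple(t) for t in traces).most_common(top)

# Hypothetical traces: one tool-name sequence per run.
traces = [
    ["kb.read", "summarize"],
    ["kb.read", "summarize"],
    ["search.read", "kb.read", "summarize"],
]
print(common_paths(traces))  # [(('kb.read', 'summarize'), 2)]
```

Whatever dominates that counter is your workflow; the long tail stays with the agent.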
Decision guide
- If you can write a state machine for it → pick a workflow.
- If you can’t, but the cost of being wrong is low → bounded agent might work.
- If the cost of being wrong is high → workflow + approvals, or don’t automate.
- If you can’t afford monitoring and governance → don’t ship an agent.
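The guide above, written as a function. The boolean inputs and return labels are illustrative, not a real API.

```python
def choose(can_enumerate_steps, wrong_cost_high, have_governance):
    """The decision guide above, encoded. Labels are illustrative."""
    if can_enumerate_steps:
        return "workflow"
    if wrong_cost_high:
        return "workflow_with_approvals_or_dont_automate"
    if not have_governance:
        return "workflow"
    return "bounded_agent"

print(choose(True, False, False))   # workflow
print(choose(False, False, True))   # bounded_agent
print(choose(False, True, True))    # workflow_with_approvals_or_dont_automate
```

Note that "bounded_agent" is the last branch, not the first: every other answer wins by default.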
Trade-offs
- Workflows are less flexible.
- Agents require governance to be safe.
- Hybrid systems add complexity, but often reduce incident rate.
When NOT to use
- Don’t use agents for irreversible writes without approvals.
- Don’t use agents when success conditions are crisp and steps are known.
- Don’t use workflows when the input space is too open-ended (you’ll just rebuild an agent poorly).
Copy-paste checklist
- [ ] Can you enumerate steps? If yes, start with a workflow.
- [ ] If you use an agent, add budgets + tool gateway first.
- [ ] Start read-only; gate writes behind approvals.
- [ ] Return stop reasons; don’t timeout silently.
- [ ] Monitor tokens, tool calls, latency, stop reasons.
- [ ] Canary changes to models/prompts/tools; expect drift.
Safe default config snippet (YAML)
```yaml
mode:
  default: "workflow"
  agent_for_exceptions: true
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
tools:
  allow: ["kb.read", "search.read", "http.get"]
writes:
  require_approval: true
monitoring:
  track: ["tool_calls_per_run", "tokens_per_request", "latency_p95", "stop_reason"]
```
FAQ
Q: Can we use an agent without a tool gateway?
A: If there are no tools and no side effects, maybe. The moment tools exist, you need a gateway for policy and budgets.
Q: What’s the safest hybrid?
A: Workflow for the common path, bounded agent for investigations, approvals for writes.
Q: Why do agents drift more?
A: Model/prompt/tool changes shift decisions. Without golden tasks and canaries, regressions ship quietly.
Q: What’s the first metric to watch?
A: Tool calls/run. It moves before correctness complaints and before invoices.
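A sketch of that alert, assuming a minimal run record with an `id` and a `tool_calls` count; the per-run limit matches the budget in the config above.

```python
def tool_call_alarm(runs, *, per_run_limit=12):
    """Return the ids of runs whose tool-call count exceeded the budget."""
    return [r["id"] for r in runs if r["tool_calls"] > per_run_limit]

# Hypothetical run records.
runs = [
    {"id": "a", "tool_calls": 5},
    {"id": "b", "tool_calls": 31},
]
print(tool_call_alarm(runs))  # ['b']
```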
Related pages
- Foundations: Workflow vs agent (start here) · Planning vs reactive agents
- Failure: Tool spam loops · Silent agent drift
- Governance: Budget controls · Tool permissions
- Production stack: Production agent stack