LLM agents vs workflows (production comparison) + code

  • Choose well, so the demo doesn't push you into a decision you regret.
  • See what breaks in prod (ops, cost, drift).
  • Get a migration path + checklist.
  • Leave with defaults: budgets, validation, stop reasons.
Agents are loops that decide. Workflows are deterministic (or nearly deterministic) executions. In prod the question is: where do you put the uncertainty?
On this page
  1. The problem (in production)
  2. Quick decision (who should choose what)
  3. Why people choose wrong in production
  4. 1) They confuse “flexible” with “reliable”
  5. 2) They underestimate governance cost
  6. 3) They start with writes
  7. 4) Workflows fail loudly, agents fail quietly
  8. Comparison table
  9. Where it breaks in production
  10. Workflow breaks
  11. Agent breaks
  12. Implementation example (real code)
  13. Real incident (with numbers)
  14. Migration path (A → B)
  15. Workflow → Agent (safe-ish)
  16. Agent → Workflow (when you regret it)
  17. Decision guide
  18. Trade-offs
  19. When NOT to use it
  20. Checklist (copy/paste)
  21. Safe default config (JSON/YAML)
  22. FAQ
  23. Related pages

The problem (in production)

You have a task: “handle support tickets”, “triage alerts”, “enrich leads”, “review code”.

Someone suggests an agent. Someone else suggests a workflow.

In a demo, the agent wins. In production, the winner is usually: the thing you can operate.

The most expensive mistake we see is choosing an agent when you needed a workflow, and then adding governance until it’s basically a workflow anyway — except now it’s nondeterministic.

Quick decision (who should choose what)

  • Pick a workflow when you can define steps, inputs, and success conditions. You’ll ship faster and sleep better.
  • Pick an agent when the environment is messy (unknown docs, noisy tools) and you can’t enumerate all paths — but only if you’re willing to add budgets, permissions, and monitoring.
  • If you’re not ready to build a control layer, don’t pick an agent. Pick a workflow.

Why people choose wrong in production

1) They confuse “flexible” with “reliable”

Agents are flexible. Reliability comes from:

  • budgets
  • validations
  • idempotency
  • approvals
  • monitoring

Without those, agents are flexible at creating incidents.
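
Two of these controls, validation and idempotency, fit in a few lines. The sketch below is a minimal illustration under stated assumptions, not a prescribed implementation: `validate_ticket_update`, `IdempotentExecutor`, and the ticket fields are hypothetical names invented for the example.

```python
import hashlib
import json
from typing import Any, Callable


def validate_ticket_update(args: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the args are valid."""
    errors = []
    ticket_id = args.get("ticket_id")
    if not isinstance(ticket_id, str) or not ticket_id:
        errors.append("ticket_id must be a non-empty string")
    if args.get("status") not in {"open", "pending", "closed"}:
        errors.append("status must be one of: open, pending, closed")
    return errors


def idempotency_key(tool: str, args: dict[str, Any]) -> str:
    """Stable key: the same tool and args always hash to the same value."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class IdempotentExecutor:
    """Runs each distinct (tool, args) pair at most once; repeats get the cached result."""

    def __init__(self) -> None:
        self._results: dict[str, Any] = {}

    def run(self, tool: str, args: dict[str, Any], fn: Callable[[dict[str, Any]], Any]) -> Any:
        key = idempotency_key(tool, args)
        if key not in self._results:
            self._results[key] = fn(args)
        return self._results[key]
```

In a real system the result cache would live in shared storage rather than process memory, so that a retried run on another worker sees the same keys.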

2) They underestimate governance cost

The first time an agent loops, you add step limits. The first time it spams a tool, you add tool budgets. The first time it writes incorrectly, you add approvals.

At that point, you’ve built a workflow… but with extra variance.

3) They start with writes

Agents with write tools in week one are a predictable failure. Start read-only.
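
One way to start read-only is to split tools into two sets and fail closed on everything else. This is a sketch, assuming a hypothetical write tool `ticket.update` and a caller-supplied `impl` function; the read tool names come from the examples on this page.

```python
from typing import Any, Callable

READ_ONLY_TOOLS = {"kb.read", "search.read"}
WRITE_TOOLS = {"ticket.update"}  # hypothetical write tool, for illustration only


class ApprovalRequired(RuntimeError):
    """Raised when a write tool is called without an approval."""


def call_tool(
    tool: str,
    args: dict[str, Any],
    *,
    impl: Callable[[str, dict[str, Any]], Any],
    approved: bool = False,
) -> Any:
    """Reads pass through, writes need an approval, unknown tools are denied."""
    if tool in READ_ONLY_TOOLS:
        return impl(tool, args)
    if tool in WRITE_TOOLS:
        if not approved:
            raise ApprovalRequired(f"approval_required:{tool}")
        return impl(tool, args)
    raise PermissionError(f"tool_denied:{tool}")
```

Week one ships with `WRITE_TOOLS` empty. Adding a write tool later is a reviewed config change, not a prompt change.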

4) Workflows fail loudly, agents fail quietly

Workflow failure: a step errors. Agent failure: it “kind of works” but gets slower, costlier, and weirder.

That’s drift. Drift is a production problem.
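
Drift is easier to catch when "slower, costlier, weirder" becomes a number. A minimal sketch, assuming you already collect per-run metrics; the metric names and the 25% tolerance are illustrative assumptions, not recommended thresholds.

```python
from statistics import mean


def drift_report(
    baseline: dict[str, list[float]],
    current: dict[str, list[float]],
    *,
    tolerance: float = 0.25,
) -> dict[str, float]:
    """Return metrics whose mean grew by more than `tolerance` (relative) vs baseline."""
    drifted = {}
    for name, base_values in baseline.items():
        base = mean(base_values)
        now = mean(current[name])
        if base > 0:
            growth = (now - base) / base
            if growth > tolerance:
                drifted[name] = round(growth, 2)
    return drifted


baseline = {"tool_calls_per_run": [3, 3, 4], "latency_s": [2.0, 2.0]}
current = {"tool_calls_per_run": [6, 7, 7], "latency_s": [2.1, 2.1]}
report = drift_report(baseline, current)  # only tool_calls_per_run is flagged
```

The check only looks at growth, which matches the failure described above. A sudden drop in tool calls can also signal a regression and would need its own rule.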

Comparison table

| Criteria | Workflow | LLM Agent | What matters in prod |
|---|---|---|---|
| Determinism | High | Low/medium | Debuggability, replay |
| Failure handling | Explicit | Emergent unless designed | Prevent thrash, stop reasons |
| Observability | Straightforward | Requires intentional tracing | “What did it do?” |
| Cost control | Predictable | Needs budgets + gating | No finance surprises |
| Change safety | Standard deploy | Drift-prone | Canary, golden tasks |
| Best for | Known paths | Unknown paths | Match system to reality |

Where it breaks in production

The failure modes differ:

Workflow breaks

  • a step fails (timeout, 500)
  • a queue backs up
  • a schema changes

Fixes are mostly deterministic: retry policy, backoff, idempotency, rollbacks.
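
A retry policy with exponential backoff is the typical deterministic fix. The sketch below is one common shape, with illustrative defaults (4 attempts, 0.5s base delay); `retry_with_backoff` is a hypothetical helper, and the `sleep` parameter exists so tests do not have to wait.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError, ConnectionError),
) -> T:
    """Call `fn`, retrying transient errors with delays of base_delay * 2**n, capped."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise AssertionError("unreachable")
```

Retries are only safe for steps that are idempotent. Production versions usually add jitter to the delay so that many workers do not retry in lockstep.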

Agent breaks

  • tool spam loops (search thrash)
  • partial outages amplify (retries in loops)
  • prompt injection steers tool calls
  • token overuse truncates policy (a bloated context pushes the policy instructions out of the window)
  • silent drift changes behavior

Agents break like control systems, because they are control systems.
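
Tool spam loops are the easiest of these to detect mechanically: count identical calls and stop when one repeats too often. A minimal sketch, assuming tool args are JSON-serializable; `LoopDetector` and the `repeated_call` stop reason are hypothetical names.

```python
import json
from collections import Counter
from typing import Any, Optional


class LoopDetector:
    """Flags a run when the same (tool, args) call repeats more than `max_repeats` times."""

    def __init__(self, *, max_repeats: int = 3) -> None:
        self.max_repeats = max_repeats
        self._seen = Counter()

    def check(self, tool: str, args: dict[str, Any]) -> Optional[str]:
        """Record the call; return a stop reason if it repeated too often, else None."""
        key = tool + ":" + json.dumps(args, sort_keys=True)
        self._seen[key] += 1
        if self._seen[key] > self.max_repeats:
            return f"repeated_call:{tool}"
        return None
```

This catches exact repeats only. Search thrash with slightly reworded queries gets past it, which is why a hard tool-call budget is still needed.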

Implementation example (real code)

The “agent vs workflow” decision isn’t about libraries. It’s about boundaries.

Here’s a minimal boundary you can use for either:

  • tool gateway with allowlist
  • budgets (steps/tool calls/time)
  • stop reasons
PYTHON
from dataclasses import dataclass
from typing import Any

# tool_impl, summarize, llm_decide, and update are placeholders (marked "pseudo"):
# supply your own implementations.


@dataclass(frozen=True)
class Budgets:
    max_steps: int = 25
    max_tool_calls: int = 12


class Stop(RuntimeError):
    """Raised to end a run; `reason` is reported as the stop reason."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


class ToolGateway:
    """Single entry point for tool calls: enforces the call budget and the allowlist."""

    def __init__(self, *, allow: set[str]):
        self.allow = allow
        self.calls = 0

    def call(self, tool: str, args: dict[str, Any], *, budgets: Budgets) -> Any:
        self.calls += 1  # every attempt counts, including denied ones
        if self.calls > budgets.max_tool_calls:
            raise Stop("max_tool_calls")
        if tool not in self.allow:
            raise Stop(f"tool_denied:{tool}")
        return tool_impl(tool, args=args)  # (pseudo)


def workflow(task: str, *, budgets: Budgets) -> dict[str, Any]:
    # Fixed path: one read-only tool, one call.
    tools = ToolGateway(allow={"kb.read"})
    try:
        doc = tools.call("kb.read", {"q": task}, budgets=budgets)
        return {"status": "ok", "answer": summarize(doc)}  # (pseudo)
    except Stop as e:
        return {"status": "stopped", "stop_reason": e.reason}


def agent(task: str, *, budgets: Budgets) -> dict[str, Any]:
    # Open path: the model picks each action, bounded by steps and tool calls.
    tools = ToolGateway(allow={"search.read", "kb.read", "http.get"})
    try:
        for _ in range(budgets.max_steps):
            action = llm_decide(task)  # (pseudo)
            if action.kind == "final":
                return {"status": "ok", "answer": action.final_answer}
            obs = tools.call(action.name, action.args, budgets=budgets)
            task = update(task, action, obs)  # (pseudo)
        return {"status": "stopped", "stop_reason": "max_steps"}
    except Stop as e:
        return {"status": "stopped", "stop_reason": e.reason}
JAVASCRIPT
export class Stop extends Error {
  constructor(reason) {
    super(reason);
    this.reason = reason;
  }
}

export class ToolGateway {
  constructor({ allow = [] } = {}) {
    this.allow = new Set(allow);
    this.calls = 0;
  }

  call(tool, args, { budgets }) {
    this.calls += 1;
    if (this.calls > budgets.maxToolCalls) throw new Stop("max_tool_calls");
    if (!this.allow.has(tool)) throw new Stop("tool_denied:" + tool);
    return toolImpl(tool, { args }); // (pseudo)
  }
}
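
The boundary list above mentions a time budget, which the examples leave out. One possible way to add it in Python is a deadline object checked once per loop iteration. This is a sketch: `Deadline` is a hypothetical helper, `Stop` is redefined here only to keep the snippet self-contained, and the injectable `clock` is there for testing.

```python
import time
from typing import Callable


class Stop(RuntimeError):
    """Same shape as the Stop exception in the example above."""

    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


class Deadline:
    """Wall-clock budget for one run, measured on a monotonic clock."""

    def __init__(self, max_seconds: float, *, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + max_seconds

    def remaining(self) -> float:
        """Seconds left; usable as a per-call timeout for tools."""
        return max(0.0, self._expires_at - self._clock())

    def check(self) -> None:
        """Raise Stop("max_seconds") once the budget is spent."""
        if self.remaining() <= 0.0:
            raise Stop("max_seconds")
```

Calling `deadline.check()` at the top of each agent step, and passing `deadline.remaining()` as the timeout of each tool call, keeps one slow tool from consuming the whole run.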

Real incident (with numbers)

We saw a team replace a simple workflow with an agent “for flexibility”.

The workflow had fixed steps and predictable costs. The agent started calling search + browser tools because “maybe it helps”.

Impact in the first week:

  • p95 latency: 1.9s → 9.7s
  • spend: +$640 vs baseline
  • and the worst part: incidents were harder to debug because behavior wasn’t deterministic

Fix:

  1. they moved 80% of the task back into a workflow
  2. the agent became a bounded “investigation step” behind strict budgets
  3. writes required approval

In production, hybrid usually wins: workflow for the known path, agent for the messy corner.
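
The hybrid shape can be sketched as a router: try the deterministic path first and hand off to the bounded agent only when the workflow declines. `run_hybrid` and the "workflow returns None when it cannot handle the task" convention are assumptions made for this example.

```python
from typing import Any, Callable, Optional

Result = dict[str, Any]


def run_hybrid(
    task: str,
    *,
    workflow: Callable[[str], Optional[Result]],
    agent: Callable[[str], Result],
) -> Result:
    """Known path first; the agent only sees what the workflow could not handle."""
    result = workflow(task)
    if result is not None:
        return {**result, "path": "workflow"}
    return {**agent(task), "path": "agent"}


# Stubs standing in for a real workflow and a real bounded agent.
def password_reset_workflow(task: str) -> Optional[Result]:
    if "password" in task:
        return {"status": "ok", "answer": "sent reset link"}
    return None  # not a known path


def bounded_agent(task: str) -> Result:
    return {"status": "stopped", "stop_reason": "max_steps"}
```

Recording `path` on every result makes the workflow/agent split measurable, which is what tells you whether the agent is still the exception.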

Migration path (A → B)

Workflow → Agent (safe-ish)

  1. keep the workflow as the default path
  2. add an agent only for ambiguous sub-tasks (bounded)
  3. enforce budgets + permissions + monitoring first
  4. canary rollout + golden tasks to catch drift
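
Step 4 can be sketched as a pass-rate gate: run a fixed set of golden tasks against the candidate and block the rollout if it scores meaningfully worse than the baseline. The function names and the `max_drop` of 0.02 are illustrative assumptions.

```python
from typing import Callable

# A golden task is a prompt plus a check on the answer.
GoldenTask = tuple[str, Callable[[str], bool]]


def golden_pass_rate(run: Callable[[str], str], tasks: list[GoldenTask]) -> float:
    """Fraction of golden tasks whose answer passes its check."""
    if not tasks:
        raise ValueError("need at least one golden task")
    passed = sum(1 for prompt, check in tasks if check(run(prompt)))
    return passed / len(tasks)


def canary_ok(candidate_rate: float, baseline_rate: float, *, max_drop: float = 0.02) -> bool:
    """Allow the rollout only if the candidate is within `max_drop` of the baseline."""
    return candidate_rate >= baseline_rate - max_drop
```

Because agent output is nondeterministic, checks that test properties of the answer (contains the order ID, cites a KB article) tend to hold up better than exact string matches.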

Agent → Workflow (when you regret it)

  1. log traces and identify the common path
  2. codify common path as deterministic steps
  3. keep the agent only for exceptions
  4. delete “agent as default” once confidence is high
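
Step 1 is mostly counting. Assuming each trace is logged as an ordered list of tool names (an assumption about your trace format), the common path falls out of a frequency table:

```python
from collections import Counter


def common_paths(traces: list[list[str]], *, top: int = 3) -> list[tuple[tuple[str, ...], float]]:
    """Return the most frequent tool-call sequences with their share of all runs."""
    counts = Counter(tuple(trace) for trace in traces)
    total = len(traces)
    return [(path, n / total) for path, n in counts.most_common(top)]


traces = [
    ["kb.read"],
    ["kb.read"],
    ["kb.read"],
    ["search.read", "http.get"],
]
paths = common_paths(traces)
# The single-step "kb.read" path covers 75% of runs: a candidate for a fixed workflow.
```

If one sequence dominates, that sequence is the workflow. The long tail is what the agent keeps.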

Decision guide

  • If you can write a state machine for it → pick a workflow.
  • If you can’t, but the cost of being wrong is low → bounded agent might work.
  • If the cost of being wrong is high → workflow + approvals, or don’t automate.
  • If you can’t afford monitoring and governance → don’t ship an agent.
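
The guide reads as a short decision function. The sketch below encodes the four rules; the order in which they are checked is an assumption (the list does not state a precedence), so treat it as one reasonable reading.

```python
def recommend(
    *,
    can_write_state_machine: bool,
    cost_of_error_high: bool,
    can_afford_governance: bool,
) -> str:
    """Map the three questions from the decision guide to a recommendation."""
    if can_write_state_machine:
        return "workflow"
    if cost_of_error_high:
        return "workflow + approvals, or don't automate"
    if not can_afford_governance:
        return "don't ship an agent"
    return "bounded agent"
```

For example, a task with no clear state machine, low cost of error, and monitoring in place comes out as a bounded agent.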

Trade-offs

  • Workflows are less flexible.
  • Agents require governance to be safe.
  • Hybrid systems add complexity, but often reduce incident rate.

When NOT to use it

  • Don’t use agents for irreversible writes without approvals.
  • Don’t use agents when success conditions are crisp and steps are known.
  • Don’t use workflows when the input space is too open-ended (you’ll just rebuild an agent poorly).

Checklist (copy/paste)

  • [ ] Can you enumerate steps? If yes, start with a workflow.
  • [ ] If you use an agent, add budgets + tool gateway first.
  • [ ] Start read-only; gate writes behind approvals.
  • [ ] Return stop reasons; don’t time out silently.
  • [ ] Monitor tokens, tool calls, latency, stop reasons.
  • [ ] Canary changes to models/prompts/tools; expect drift.
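
The monitoring item can start as a small aggregation over run records. This is a sketch with a hypothetical `RunRecord` shape; the p95 uses the nearest-rank method, which is one of several percentile definitions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class RunRecord:
    tool_calls: int
    tokens: int
    latency_s: float
    stop_reason: str  # "ok" for runs that finished normally


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile (integer arithmetic avoids float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def summarize_runs(runs: list[RunRecord]) -> dict[str, Any]:
    """Aggregate the four signals from the checklist over a non-empty batch of runs."""
    n = len(runs)
    return {
        "tool_calls_per_run": sum(r.tool_calls for r in runs) / n,
        "tokens_per_run": sum(r.tokens for r in runs) / n,
        "latency_p95": p95([r.latency_s for r in runs]),
        "stop_reasons": dict(Counter(r.stop_reason for r in runs)),
    }
```

A rising share of non-"ok" stop reasons is often the earliest visible sign that a prompt, model, or tool change altered behavior.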

Safe default config (JSON/YAML)

YAML
mode:
  default: "workflow"
  agent_for_exceptions: true
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
tools:
  allow: ["kb.read", "search.read", "http.get"]
writes:
  require_approval: true
monitoring:
  track: ["tool_calls_per_run", "tokens_per_request", "latency_p95", "stop_reason"]
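
A config like this is only safe if it is validated at load time. The sketch below reads the JSON form of the same structure with the standard library and fails closed; `SafeConfig` and `load_config` are hypothetical names, and treating a missing `writes` section as "approval required" is an assumption, not something the config format specifies.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SafeConfig:
    max_steps: int
    max_tool_calls: int
    max_seconds: int
    allowed_tools: frozenset[str]
    writes_require_approval: bool


def load_config(raw: str) -> SafeConfig:
    """Parse the JSON form of the config and reject unsafe or malformed budgets."""
    data = json.loads(raw)
    budgets = data["budgets"]
    for name in ("max_steps", "max_tool_calls", "max_seconds"):
        value = budgets[name]
        if type(value) is not int or value <= 0:
            raise ValueError(f"budgets.{name} must be a positive integer")
    return SafeConfig(
        max_steps=budgets["max_steps"],
        max_tool_calls=budgets["max_tool_calls"],
        max_seconds=budgets["max_seconds"],
        allowed_tools=frozenset(data["tools"]["allow"]),
        # Fail closed: if the section is missing, writes still need approval.
        writes_require_approval=data.get("writes", {}).get("require_approval", True),
    )
```

Loading YAML instead would need a third-party parser. The validation logic stays the same once the file is parsed into a dict.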

FAQ

Q: Can we use an agent without a tool gateway?
A: If there are no tools and no side effects, maybe. The moment tools exist, you need a gateway for policy and budgets.

Q: What’s the safest hybrid?
A: Workflow for the common path, bounded agent for investigations, approvals for writes.

Q: Why do agents drift more?
A: Model/prompt/tool changes shift decisions. Without golden tasks and canaries, regressions ship quietly.

Q: What’s the first metric to watch?
A: Tool calls/run. It moves before correctness complaints and before invoices.

⏱️ 8 min read | Updated Mar 2026 | Difficulty: ★★☆
Author

This documentation is curated and maintained by engineers who deploy AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

The patterns and recommendations are based on post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of agent governance infrastructure at OnceOnly.