CrewAI vs LangGraph (production comparison) + code

  • Choose well, without regretting a demo-driven pick.
  • See what breaks in prod (ops, cost, drift).
  • Get a migration path + checklist.
  • Leave with defaults: budgets, validation, stop reasons.
CrewAI optimizes role-based multi-agent orchestration. LangGraph optimizes explicit state machines. Here's what breaks in prod, which is easier to operate, and how to migrate.
On this page
  1. The problem (in production)
  2. Quick decision (who should pick what)
  3. Why teams choose wrong in production
  4. 1) They pick based on “demo vibes”
  5. 2) They confuse “graph” with “safe”
  6. 3) They don’t define state
  7. Comparison table
  8. Where it breaks in production
  9. CrewAI-style multi-agent breaks
  10. LangGraph-style flow breaks
  11. Implementation example (real code)
  12. Real incident (with numbers)
  13. Migration path (A → B)
  14. CrewAI → LangGraph (common path)
  15. LangGraph → CrewAI (when roles matter)
  16. Decision guide
  17. Trade-offs
  18. When NOT to use it
  19. Checklist (copy/paste)
  20. Safe default config (JSON/YAML)
  21. FAQ

The problem (in production)

You want to ship an agent system that does real work, not a weekend demo.

Someone on the team says: “Let’s do multi-agent with CrewAI.” Someone else says: “We should use LangGraph; graphs are easier to reason about.”

Both can work. Both can also produce the same outcome in production: a slow, expensive, hard-to-debug system if you don’t build a control layer.

The question isn’t “which is cooler”. The question is: which one makes failure modes obvious and governable.

Quick decision (who should pick what)

  • Pick CrewAI if you explicitly want role-based multi-agent collaboration and you can invest in orchestration + monitoring to prevent deadlocks/thrash.
  • Pick LangGraph if you want explicit state + deterministic-ish transitions you can test, replay, and roll back without guessing what the model “meant”.
  • If you don’t have strong budgets/permissions/monitoring yet, LangGraph-style explicit flow usually hurts less.

Why teams choose wrong in production

1) They pick based on “demo vibes”

Multi-agent role play looks impressive. It also adds:

  • coordination overhead
  • waiting states
  • circular dependencies
  • more tool calls

If you’re not ready to instrument it, it’ll fail quietly.

2) They confuse “graph” with “safe”

A graph is not governance. It’s a place to put governance.

You still need:

  • budgets
  • permissions
  • validation
  • approvals for writes
  • stop reasons

3) They don’t define state

If you can’t write down:

  • current state
  • allowed transitions
  • stop conditions

…your system will drift into “agent chooses everything”, which is just a fancy way to say “debugging is vibes”.
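Writing state down doesn't require a framework. A minimal sketch (the state names are illustrative): an enum plus a transition table, so any illegal move fails loudly instead of drifting.

```python
from enum import Enum, auto

class State(Enum):
    INTAKE = auto()
    RESEARCH = auto()
    DRAFT = auto()
    DONE = auto()
    STOPPED = auto()

# Allowed transitions. Anything not listed here is a bug, not a "model decision".
TRANSITIONS: dict[State, set[State]] = {
    State.INTAKE: {State.RESEARCH, State.STOPPED},
    State.RESEARCH: {State.DRAFT, State.STOPPED},
    State.DRAFT: {State.DONE, State.STOPPED},
}

def advance(current: State, proposed: State) -> State:
    # Reject any transition the table does not explicitly allow.
    if proposed not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current.name} -> {proposed.name}")
    return proposed
```

The point is that the transition table is data you can review, diff, and test, instead of behavior buried in prompts.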

Comparison table

| Criterion | CrewAI | LangGraph | What matters in prod |
|---|---|---|---|
| Primary abstraction | Roles + collaboration | State + transitions | Debuggability |
| Determinism | Lower | Higher | Replay + tests |
| Failure handling | Emergent unless designed | Easier to encode | Stop reasons |
| Observability | You must add it | You must add it | “What did it do?” |
| Loop/deadlock risk | Higher | Medium | On-call load |
| Migration friendliness | Medium | High | Canaries/rollback |

Where it breaks in production

CrewAI-style multi-agent breaks

  • agents wait on each other (deadlocks)
  • roles “disagree” and loop
  • more context passed around → token overuse
  • tool spam (agents “helpfully” re-search)

LangGraph-style flow breaks

  • state machine grows complex
  • devs cram “just let the model decide” nodes everywhere
  • missing validation on edges turns graphs into “unsafe pipes”

The common failure is the same: missing governance.
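One cheap piece of that governance is edge validation: check a node's output before the next node consumes it. A framework-agnostic sketch (the `triage` node and its output schema are made up for illustration):

```python
from typing import Any, Callable

def validated_edge(check: Callable[[dict[str, Any]], bool], reason: str):
    """Wrap a node so its output is validated before the graph moves on."""
    def wrap(node: Callable[..., dict[str, Any]]) -> Callable[..., dict[str, Any]]:
        def run(*args: Any, **kwargs: Any) -> dict[str, Any]:
            out = node(*args, **kwargs)
            if not check(out):
                # Fail loudly at the edge instead of passing garbage downstream.
                raise ValueError(f"edge_validation_failed:{reason}")
            return out
        return run
    return wrap

# Illustrative node: a triage step that must emit a string ticket_id.
@validated_edge(lambda out: isinstance(out.get("ticket_id"), str), "missing ticket_id")
def triage(ticket: dict[str, Any]) -> dict[str, Any]:
    return {"ticket_id": ticket["id"], "severity": "low"}
```

The same wrapper works whether the node is a deterministic function, a CrewAI role, or a LangGraph node.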

Implementation example (real code)

The production trick is to separate:

  1. your orchestration framework
  2. your control layer (which should survive framework changes)

This is a framework-agnostic tool gateway + budget guard you can wrap around either approach.

PYTHON
from dataclasses import dataclass
from typing import Any, Callable
import time


@dataclass(frozen=True)
class Budgets:
    max_steps: int = 40
    max_tool_calls: int = 20
    max_seconds: int = 120


class Stop(RuntimeError):
    def __init__(self, reason: str):
        super().__init__(reason)
        self.reason = reason


class ToolGateway:
    def __init__(self, *, allow: set[str], impls: dict[str, Callable[..., Any]]):
        self.allow = allow
        self.impls = impls
        self.calls = 0

    def call(self, tool: str, args: dict[str, Any], *, budgets: Budgets) -> Any:
        self.calls += 1
        if self.calls > budgets.max_tool_calls:
            raise Stop("max_tool_calls")
        if tool not in self.allow:
            raise Stop(f"tool_denied:{tool}")
        fn = self.impls.get(tool)
        if not fn:
            raise Stop(f"tool_missing:{tool}")
        return fn(**args)


def run_framework(orchestration_fn, *, budgets: Budgets, tools: ToolGateway) -> dict[str, Any]:
    started = time.time()
    for step in range(budgets.max_steps):
        if time.time() - started > budgets.max_seconds:
            return {"status": "stopped", "stop_reason": "max_seconds"}
        try:
            # orchestration_fn must call tools via ToolGateway only.
            out = orchestration_fn(step=step, tools=tools)  # (pseudo)
            if out and out.get("done"):
                return {"status": "ok", "result": out.get("result")}
        except Stop as e:
            return {"status": "stopped", "stop_reason": e.reason}
    return {"status": "stopped", "stop_reason": "max_steps"}
JAVASCRIPT
export class Stop extends Error {
  constructor(reason) {
    super(reason);
    this.reason = reason;
  }
}

export class ToolGateway {
  constructor({ allow = [], impls = {} } = {}) {
    this.allow = new Set(allow);
    this.impls = impls;
    this.calls = 0;
  }

  call(tool, args, { budgets }) {
    this.calls += 1;
    if (this.calls > budgets.maxToolCalls) throw new Stop("max_tool_calls");
    if (!this.allow.has(tool)) throw new Stop("tool_denied:" + tool);
    const fn = this.impls[tool];
    if (!fn) throw new Stop("tool_missing:" + tool);
    return fn(args);
  }
}

export function runFramework(orchestrationFn, { budgets, tools }) {
  const started = Date.now();
  for (let step = 0; step < budgets.maxSteps; step++) {
    if ((Date.now() - started) / 1000 > budgets.maxSeconds) {
      return { status: "stopped", stop_reason: "max_seconds" };
    }
    try {
      const out = orchestrationFn({ step, tools }); // (pseudo)
      if (out && out.done) return { status: "ok", result: out.result };
    } catch (e) {
      if (e instanceof Stop) return { status: "stopped", stop_reason: e.reason };
      throw e;
    }
  }
  return { status: "stopped", stop_reason: "max_steps" };
}

Real incident (with numbers)

We saw a multi-agent system shipped for “support triage”. It was role-based, and it looked great in a demo.

In production:

  • one role started “double-checking” by re-searching
  • another role waited for the first role’s output

Impact over a day:

  • tool calls/run: 6 → 24
  • p95 latency: 4.1s → 21.6s
  • spend: +$530 vs baseline
  • on-call time: ~2 hours to identify that the issue was “agent coordination”, not an external outage

Fix:

  1. explicit step limits + repeat detection
  2. tool gateway dedupe for repeated search calls
  3. degrade mode during search instability

The framework wasn’t the villain. Lack of control was.
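Repeat detection and tool dedupe (fixes 1 and 2) can be as small as hashing each tool call and capping identical repeats. A sketch, not tied to any framework:

```python
import hashlib
import json

class DedupeGuard:
    """Stop an agent that 'helpfully' re-runs the same tool call."""

    def __init__(self, max_repeats: int = 2):
        self.max_repeats = max_repeats
        self.seen: dict[str, int] = {}

    def check(self, tool: str, args: dict) -> None:
        # Identical tool + args hash to the same key; sort_keys makes it stable.
        key = hashlib.sha256(
            (tool + json.dumps(args, sort_keys=True)).encode()
        ).hexdigest()
        self.seen[key] = self.seen.get(key, 0) + 1
        if self.seen[key] > self.max_repeats:
            raise RuntimeError(f"repeat_limit:{tool}")
```

Calling `check()` inside a tool gateway (like `ToolGateway.call` above) turns "role A re-searched 18 times" into a single explicit stop reason.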

Migration path (A → B)

CrewAI → LangGraph (common path)

  1. log real runs and identify the “happy path”
  2. encode that path as explicit graph states
  3. keep a bounded “agentic” branch for edge cases
  4. keep the same tool gateway + budgets (don’t rewrite governance)
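Step 3, the bounded agentic branch, is the part teams most often skip. A minimal sketch, where `decide` is a stand-in for your model-driven step:

```python
from typing import Any, Callable

def bounded_agentic_branch(
    decide: Callable[[dict[str, Any]], dict[str, Any]],
    state: dict[str, Any],
    max_steps: int = 5,
) -> dict[str, Any]:
    """Run a model-driven branch, but cap it so it cannot loop forever."""
    for _ in range(max_steps):
        state = decide(state)
        if state.get("done"):
            return state
    # Out of budget: stop with an explicit reason instead of spinning.
    return {**state, "done": True, "stop_reason": "agentic_branch_max_steps"}
```

The explicit graph handles the happy path; edge cases fall into this branch and always come back with either a result or a stop reason.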

LangGraph → CrewAI (when roles matter)

  1. keep the graph as the orchestrator
  2. swap specific nodes to call “role agents”
  3. enforce budgets and stop reasons at the outer loop

Decision guide

  • If you need explicit state and replay → pick LangGraph-style graphs.
  • If you need collaboration patterns (reviewer/critic/planner) → CrewAI can fit, but budget it hard.
  • If you’re early and under-instrumented → pick the approach that’s easiest to test and trace.

Trade-offs

  • Multi-agent can improve quality on complex tasks, but increases coordination failures.
  • Graphs improve debuggability, but the state machine becomes real code you must maintain.
  • Either way, the control layer is non-optional in production.

When NOT to use it

  • Don’t ship multi-agent without timeouts, leases, and stop reasons.
  • Don’t build graphs that “just call the model to decide everything” — you lose the point of a graph.
  • Don’t pick a framework first. Pick the failure modes you can tolerate.

Checklist (copy/paste)

  • [ ] Keep governance framework-agnostic (budgets + tool gateway)
  • [ ] Add stop reasons and surface them to users
  • [ ] Add repeat detection + tool dedupe
  • [ ] Start read-only; gate writes behind approvals
  • [ ] Canary changes; expect drift
  • [ ] Test replay on golden tasks
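The last item, replay on golden tasks, can start as a plain test file. A sketch; `run_task` and the golden cases are placeholders for your real entry point:

```python
# Golden-task replay: pin inputs, assert on structure (status, stop_reason),
# never on exact model wording. GOLDEN and run_task are illustrative.
GOLDEN = [
    {"input": "refund for order 123", "expect_status": "ok"},
    {"input": "unsupported: delete my account", "expect_status": "stopped"},
]

def run_task(text: str) -> dict:
    # Stand-in for your real pipeline entry point.
    if text.startswith("unsupported"):
        return {"status": "stopped", "stop_reason": "tool_denied:account.delete"}
    return {"status": "ok", "result": "drafted reply"}

def test_golden_replay() -> None:
    for case in GOLDEN:
        out = run_task(case["input"])
        assert out["status"] == case["expect_status"], case["input"]
        if out["status"] == "stopped":
            assert "stop_reason" in out  # every stop must be explained
```

Run it on every framework change and every prompt change; it's the cheapest drift detector you'll own.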

Safe default config (JSON/YAML)

YAML
budgets:
  max_steps: 40
  max_tool_calls: 20
  max_seconds: 120
tools:
  allow: ["search.read", "kb.read", "http.get"]
writes:
  require_approval: true
monitoring:
  track: ["tool_calls_per_run", "latency_p95", "stop_reason"]
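Loading this config into typed objects keeps governance declarative. A sketch using an inline dict standing in for the parsed YAML (with PyYAML you would get the same dict from `yaml.safe_load`):

```python
from dataclasses import dataclass

# Stands in for the parsed YAML above.
CONFIG = {
    "budgets": {"max_steps": 40, "max_tool_calls": 20, "max_seconds": 120},
    "tools": {"allow": ["search.read", "kb.read", "http.get"]},
    "writes": {"require_approval": True},
}

@dataclass(frozen=True)
class Budgets:
    max_steps: int = 40
    max_tool_calls: int = 20
    max_seconds: int = 120

def load_budgets(cfg: dict) -> Budgets:
    return Budgets(**cfg.get("budgets", {}))

def load_allowlist(cfg: dict) -> set[str]:
    # Default-deny: a missing allowlist means no tools at all.
    return set(cfg.get("tools", {}).get("allow", []))
```

Because the config lives outside the framework, it survives a CrewAI ↔ LangGraph migration unchanged.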

FAQ

Is multi-agent always better?
No. It can improve quality, but it increases coordination failures. You pay for it in observability and governance.
Are graphs only for workflows?
No. Graphs can orchestrate agents too. The value is explicit state and testability.
What’s the first guardrail to add?
Budgets (steps/tool calls/time) and a tool gateway with a default-deny allowlist.
Can we migrate without rewriting everything?
Yes if you keep governance outside the framework: budget guard + tool gateway + logging.


Embedded: production control (OnceOnly)
Guardrails for tool-calling agents
Take this pattern to production with governance:
  • Budgets (step / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch and incident stop
  • Idempotency and dedupe
  • Audit logs and traceability
Embedded mention: OnceOnly is a control layer for agent systems in production.
Author

This documentation is curated and maintained by engineers who deploy AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

The patterns and recommendations are based on post-mortems, failure modes, and operational incidents in deployed systems, including from developing and operating agent-governance infrastructure at OnceOnly.