PydanticAI vs LangChain Agents (production comparison) + code

  • Choose well so you don't regret it after the demo.
  • See what breaks in prod (ops, cost, drift).
  • Get a migration path + checklist.
  • Leave with defaults: budgets, validation, stop reasons.
PydanticAI pushes typed I/O and validation. LangChain Agents push flexible orchestration. In prod, what matters is validation, replay, and tool control.
On this page
  1. The problem (in production)
  2. Quick decision (who should pick what)
  3. Why teams choose wrong in production
  4. 1) They think a framework replaces governance
  5. 2) They treat structured outputs as “nice to have”
  6. 3) They over-index on integration count
  7. Comparison table
  8. Where it breaks in production
  9. Typed-first breaks
  10. Flexible breaks
  11. Implementation example (real code)
  12. Real incident (with numbers)
  13. Migration path (A → B)
  14. Flexible → typed-first
  15. Typed-first → flexible (when you need it)
  16. Decision guide
  17. Trade-offs
  18. When NOT to use it
  19. Checklist (copy/paste)
  20. Safe default config (JSON/YAML)
  21. FAQ

The problem (in production)

At some point you’ll hit the same production problem: the model output shape matters more than the model prose.

If you’re calling tools, parsing JSON, and triggering side effects, you need:

  • schema validation
  • invariants
  • fail-closed behavior

That’s where “typed agent frameworks” are attractive. And where “flexible agent frameworks” can either help or hurt, depending on how much discipline your team has.

Quick decision (who should pick what)

  • Pick PydanticAI-style typed outputs if your system is tool-heavy and you want validation to be the default, not an afterthought.
  • Pick LangChain agents if you need flexibility across integrations and you’re willing to enforce schemas and governance yourself.
  • If you don’t validate outputs, it doesn’t matter which you pick — you’ll ship silent failures.

Why teams choose wrong in production

1) They think a framework replaces governance

No framework replaces:

  • budgets
  • tool permissions
  • monitoring
  • approvals for writes
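None of those items need a framework; they need a boundary that runs before every tool call. A minimal sketch of such a gateway check (the names `ToolPolicy` and `GatewayDenied` are illustrative, not from PydanticAI or LangChain):

```python
from dataclasses import dataclass, field


class GatewayDenied(RuntimeError):
    pass


@dataclass
class ToolPolicy:
    """Runs before every tool call, independent of the agent framework."""
    allowed_tools: frozenset
    max_tool_calls: int
    requires_approval: frozenset = field(default_factory=frozenset)
    calls_made: int = 0

    def check(self, tool: str, approved: bool = False) -> None:
        # Budget first: an exhausted budget denies everything.
        if self.calls_made >= self.max_tool_calls:
            raise GatewayDenied("budget exhausted")
        if tool not in self.allowed_tools:
            raise GatewayDenied(f"tool not allowlisted: {tool}")
        # Writes (or anything high-risk) can demand an explicit approval flag.
        if tool in self.requires_approval and not approved:
            raise GatewayDenied(f"approval required: {tool}")
        self.calls_made += 1
```

The point is that the deny paths are raised exceptions, not log lines: the agent loop cannot proceed past a denied call by accident.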

2) They treat structured outputs as “nice to have”

In prod, structured outputs are how you prevent:

  • tool response corruption turning into actions
  • prompt injection steering tool calls
  • “close enough JSON” becoming “close enough incident”

3) They over-index on integration count

“It integrates with everything” isn’t a production plan. If your tool gateway is unsafe, more integrations just means more blast radius.

Comparison table

| Criterion | PydanticAI (typed-first) | LangChain agents (flexible) | What matters in prod |
|---|---|---|---|
| Default output validation | Strong | Depends on you | Fail closed |
| Integration surface | Smaller | Larger | Blast radius |
| Debuggability | Better if typed | Better if instrumented | Traces |
| Failure handling | Explicit if enforced | Emergent if loose | Stop reasons |
| Best for | Tool-heavy systems | Rapid integration | Team discipline |

Where it breaks in production

Typed-first breaks

  • you still have to maintain schemas
  • you can over-constrain and reject useful outputs
  • teams misuse typing as “security” (it isn’t)

Flexible breaks

  • silent parse errors
  • “best effort” JSON coercion
  • tool outputs treated as instructions
  • drift changes output shapes without tests
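The second item deserves emphasis. A lenient parser that regex-hunts for `{...}` inside arbitrary text will happily pull JSON fragments out of an HTML error page. The strict alternative is a sketch like this, which refuses the whole payload instead:

```python
import json


def parse_strict(raw: str) -> dict:
    """Fail closed: the payload must be exactly one standalone JSON object."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        # No salvage attempt: a payload that isn't pure JSON is rejected whole.
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return obj
```

Given `'<html>Error {"code": 503}</html>'`, a best-effort extractor would find `{"code": 503}` and act on it; `parse_strict` raises.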

Implementation example (real code)

No matter what framework you use, put a strict validator between the model and side effects.

This shows a minimal typed decision object with fail-closed parsing.

PYTHON
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Decision:
    kind: str  # "final" | "tool"
    tool: str | None
    args: dict[str, Any] | None
    answer: str | None


class InvalidDecision(RuntimeError):
    pass


def validate_decision(obj: Any) -> Decision:
    if not isinstance(obj, dict):
        raise InvalidDecision("expected object")
    kind = obj.get("kind")
    if kind not in {"final", "tool"}:
        raise InvalidDecision("invalid kind")
    if kind == "final":
        ans = obj.get("answer")
        if not isinstance(ans, str) or not ans.strip():
            raise InvalidDecision("missing answer")
        return Decision(kind="final", tool=None, args=None, answer=ans)
    tool = obj.get("tool")
    args = obj.get("args")
    if not isinstance(tool, str):
        raise InvalidDecision("missing tool")
    if not isinstance(args, dict):
        raise InvalidDecision("missing args")
    return Decision(kind="tool", tool=tool, args=args, answer=None)
JAVASCRIPT
export class InvalidDecision extends Error {}

export function validateDecision(obj) {
  if (!obj || typeof obj !== "object") throw new InvalidDecision("expected object");
  const kind = obj.kind;
  if (kind !== "final" && kind !== "tool") throw new InvalidDecision("invalid kind");

  if (kind === "final") {
    if (typeof obj.answer !== "string" || !obj.answer.trim()) throw new InvalidDecision("missing answer");
    return { kind: "final", answer: obj.answer };
  }

  if (typeof obj.tool !== "string") throw new InvalidDecision("missing tool");
  if (!obj.args || typeof obj.args !== "object") throw new InvalidDecision("missing args");
  return { kind: "tool", tool: obj.tool, args: obj.args };
}
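Wiring the validator in front of side effects might look like the sketch below. It restates a minimal `validate_decision` so the sketch runs standalone (the article's stricter version above is the real boundary); `decide_or_stop` and the stop-reason strings are illustrative names, not from either framework:

```python
import json
from typing import Any


class InvalidDecision(RuntimeError):
    pass


def validate_decision(obj: Any) -> dict:
    # Minimal stand-in for the stricter validator defined in the article.
    if not isinstance(obj, dict) or obj.get("kind") not in {"final", "tool"}:
        raise InvalidDecision("invalid decision")
    return obj


def decide_or_stop(raw_model_output: str) -> tuple[str, Any]:
    """Any parse or validation error becomes a stop reason, never an action."""
    try:
        decision = validate_decision(json.loads(raw_model_output))
    except (json.JSONDecodeError, InvalidDecision) as exc:
        return ("stop", f"invalid_decision: {exc}")
    return ("proceed", decision)
```

The agent loop only ever dispatches tools on the `"proceed"` branch; the `"stop"` branch is logged and surfaced, never retried silently.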

Real incident (with numbers)

We saw a team ship a flexible agent that parsed “tool calls” with best-effort JSON extraction.

During a partial outage, tool output included an HTML error page. The model copied part of it into the “args”. The parser coerced it into a dict.

Impact:

  • 17 runs wrote garbage data into a queue
  • downstream workers crashed for ~25 minutes
  • on-call spent ~2 hours tracing the root cause because logs only had the final answer

Fix:

  1. strict parsing + schema validation for decisions and tool outputs
  2. fail closed before any write
  3. monitoring for invalid_decision_rate

Typed outputs didn’t solve this alone — strict validation did.
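The third fix item, a monitored `invalid_decision_rate`, can be as small as a counter. A hypothetical sketch (the class and property names are illustrative, not from any metrics library):

```python
from collections import Counter


class DecisionMetrics:
    """Tracks how often model decisions fail validation."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, valid: bool) -> None:
        self.counts["total"] += 1
        if not valid:
            self.counts["invalid"] += 1

    @property
    def invalid_decision_rate(self) -> float:
        total = self.counts["total"]
        return self.counts["invalid"] / total if total else 0.0
```

In a real system you would export this to your metrics backend and alert on a rate spike, which is usually the first visible sign of drift or a degraded upstream.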

Migration path (A → B)

Flexible → typed-first

  1. add schema validation at the boundary (model output + tool output)
  2. define a small decision schema (tool vs final)
  3. gradually type the high-risk parts (writes) first
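Steps 1 and 3 can start as a single wrapper at the boundary. A minimal sketch, assuming one `validate` callable per tool (`guarded`, `BoundaryError`, and the example tool are illustrative names):

```python
from typing import Any, Callable


class BoundaryError(RuntimeError):
    pass


def guarded(validate: Callable[[Any], Any]):
    """Wrap a tool so its output is validated before anything downstream sees it."""
    def wrap(tool_fn: Callable[..., Any]) -> Callable[..., Any]:
        def call(*args: Any, **kwargs: Any) -> Any:
            out = tool_fn(*args, **kwargs)
            try:
                return validate(out)
            except Exception as exc:
                # Fail closed: a rejected output never reaches the caller.
                raise BoundaryError(f"tool output rejected: {exc}") from exc
        return call
    return wrap


def must_be_row(out: Any) -> dict:
    if not isinstance(out, dict) or "id" not in out:
        raise ValueError("expected a row with an id")
    return out


@guarded(must_be_row)
def write_row(payload: dict) -> dict:
    return {"id": 1, **payload}  # stand-in for a real write
```

Because the wrapper is per-tool, you can migrate the high-risk write paths first and leave read-only tools loose until later.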

Typed-first → flexible (when you need it)

  1. keep typed boundaries for actions and tools
  2. allow free-form text only inside “analysis” fields that never trigger side effects
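One way to sketch that split, with illustrative field names: keep the action fields typed and gated, and confine free-form text to a field that is logged but never dispatched.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolDecision:
    tool: str                  # typed, validated, gated by the tool gateway
    args: dict[str, Any]       # typed, validated, gated by the tool gateway
    analysis: str = ""         # free-form; logged for debugging, never executed
```

The invariant to enforce in review: nothing downstream ever parses or branches on `analysis`.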

Decision guide

  • If your system does writes → prioritize typed/validated boundaries.
  • If you’re doing experiments → flexibility is fine, but keep budgets and logging.
  • If you’re multi-tenant → strict validation is non-negotiable.

Trade-offs

  • Validation rejects some outputs. That’s good. It forces you to handle the failure path.
  • Typing adds maintenance overhead.
  • Flexibility can ship faster, but it ships more production surprises too.

When NOT to use it

  • Don’t rely on typing as security. You still need permissions and approvals.
  • Don’t use best-effort parsing for tool calls that trigger writes.
  • Don’t skip monitoring. Validation failures are a metric, not an embarrassment.

Checklist (copy/paste)

  • [ ] Validate model decisions (schema) before acting
  • [ ] Validate tool outputs (schema + invariants)
  • [ ] Fail closed for writes
  • [ ] Budgets + stop reasons
  • [ ] Audit logs for tool calls
  • [ ] Canary changes; drift is real
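The "budgets + stop reasons" item above can be sketched in a few lines. The enum values are illustrative, and the defaults mirror the sample config in the next section:

```python
from enum import Enum
from typing import Optional


class StopReason(str, Enum):
    COMPLETED = "completed"
    MAX_STEPS = "max_steps"
    MAX_TOOL_CALLS = "max_tool_calls"
    INVALID_DECISION = "invalid_decision"


def check_budget(steps: int, tool_calls: int,
                 max_steps: int = 25, max_tool_calls: int = 12) -> Optional[StopReason]:
    """Return a stop reason when a budget is exhausted, else None."""
    if steps >= max_steps:
        return StopReason.MAX_STEPS
    if tool_calls >= max_tool_calls:
        return StopReason.MAX_TOOL_CALLS
    return None
```

Emitting the reason as an enum (rather than letting the loop just end) is what makes stop reasons auditable in logs and dashboards.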

Safe default config (JSON/YAML)

YAML
validation:
  model_decision:
    fail_closed: true
    schema: "Decision(kind, tool?, args?, answer?)"
  tool_output:
    fail_closed: true
    max_chars: 200000
budgets:
  max_steps: 25
  max_tool_calls: 12
monitoring:
  track: ["invalid_decision_rate", "tool_output_invalid_rate", "stop_reason"]
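Enforcing the `tool_output` section of that config might look like the sketch below; the config is inlined as a dict mirroring the YAML, and `accept_tool_output` is an illustrative name:

```python
# Mirrors the tool_output section of the YAML config above.
config = {
    "tool_output": {"fail_closed": True, "max_chars": 200_000},
}


def accept_tool_output(raw: str) -> str:
    limit = config["tool_output"]["max_chars"]
    if len(raw) > limit:
        if config["tool_output"]["fail_closed"]:
            # Default: refuse the oversized payload outright.
            raise ValueError(f"tool output over {limit} chars; failing closed")
        return raw[:limit]  # truncation only if fail-open is explicitly chosen
    return raw
```

An oversized tool output is usually a symptom (an error page, a dump, an injection attempt), which is why rejection is the default and truncation is opt-in.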

FAQ

Does typing guarantee correctness?
No. It guarantees shape. You still need invariants, permissions, and safe-mode behavior.
Is LangChain ‘unsafe’?
No. It’s flexible. Safety comes from how you enforce boundaries: budgets, validation, and a tool gateway.
What should we type first?
Anything that triggers writes or money: tool calls, approvals, budget policy outputs.
Can strict validation hurt completion rate?
Yes. That’s usually the point: stop guessing and handle failure paths explicitly.


⏱️ 7 min read · Updated Mar 2026 · Difficulty: ★★☆
Embedded: production control · OnceOnly
Guardrails for tool-calling agents
Take this pattern to production with governance:
  • Budgets (step caps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch and incident stop
  • Idempotency and dedupe
  • Audit logs and traceability
Embedded mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who deploy AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

The patterns and recommendations draw on post-mortems, failure modes, and operational incidents in deployed systems, including the development and operation of agent governance infrastructure at OnceOnly.