PydanticAI vs LangChain Agents (production comparison) + code

  • Choose without getting fooled by the demo.
  • See what breaks in prod (ops, cost, drift).
  • Get a migration path + a checklist.
  • Leave with defaults: budgets, validation, stop reasons.
PydanticAI pushes you toward typed I/O and invariants. LangChain Agents push you toward flexible orchestration. In production, what matters is validation, replay, and tool control.
On this page
  1. The problem (production side)
  2. Quick decision (who picks what)
  3. Why teams choose wrong in production
  4. 1) They think a framework replaces governance
  5. 2) They treat structured outputs as “nice to have”
  6. 3) They over-index on integration count
  7. Comparison table
  8. Where it breaks in production
  9. Typed-first breaks
  10. Flexible breaks
  11. Implementation example (real code)
  12. Real incident (with numbers)
  13. Migration path (A → B)
  14. Flexible → typed-first
  15. Typed-first → flexible (when you need it)
  16. Decision guide
  17. Trade-offs
  18. When NOT to use it
  19. Checklist (copy-paste)
  20. Safe default config (YAML)
  21. FAQ
  22. Related pages

The problem (production side)

At some point you’ll hit the same production problem: the shape of the model’s output matters more than its prose.

If you’re calling tools, parsing JSON, and triggering side effects, you need:

  • schema validation
  • invariants
  • fail-closed behavior

That’s where “typed agent frameworks” are attractive. And where “flexible agent frameworks” can either help or hurt, depending on how much discipline your team has.

Quick decision (who picks what)

  • Pick PydanticAI-style typed outputs if your system is tool-heavy and you want validation to be the default, not an afterthought.
  • Pick LangChain agents if you need flexibility across integrations and you’re willing to enforce schemas and governance yourself.
  • If you don’t validate outputs, it doesn’t matter which you pick — you’ll ship silent failures.

Why teams choose wrong in production

1) They think a framework replaces governance

No framework replaces:

  • budgets
  • tool permissions
  • monitoring
  • approvals for writes
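The tool-permissions point, in particular, fits in a few lines. A minimal sketch, assuming a hypothetical allowlist gate (the tool names and the `approved` flag are invented for illustration) that fails closed on anything unlisted:

```python
# Minimal tool-permission gate: the framework proposes a call,
# but an explicit allowlist decides whether it may run.
READ_TOOLS = {"search", "fetch_doc"}            # safe to auto-run
WRITE_TOOLS = {"enqueue_job", "update_record"}  # require human approval


class ToolDenied(RuntimeError):
    pass


def authorize(tool: str, approved: bool = False) -> None:
    """Raise ToolDenied unless this tool call is allowed to proceed."""
    if tool in READ_TOOLS:
        return
    if tool in WRITE_TOOLS:
        if not approved:
            raise ToolDenied(f"write tool {tool!r} needs approval")
        return
    # Fail closed: anything not explicitly listed is denied.
    raise ToolDenied(f"unknown tool {tool!r}")
```

The design choice is that the gate lives outside the framework, so swapping PydanticAI for LangChain (or vice versa) never changes what is allowed to run.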

2) They treat structured outputs as “nice to have”

In prod, structured outputs are how you prevent:

  • tool response corruption turning into actions
  • prompt injection steering tool calls
  • “close enough JSON” becoming “close enough incident”
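To make the last point concrete, here is a hypothetical side-by-side of best-effort extraction versus strict parsing (the noisy payload is invented):

```python
import json
import re


def best_effort_parse(raw: str):
    # Anti-pattern: grab the first {...} and hope. Any payload that
    # happens to contain braces can "parse" into an action.
    m = re.search(r"\{.*\}", raw, re.DOTALL)
    return json.loads(m.group(0)) if m else None


def strict_parse(raw: str) -> dict:
    # Fail closed: the whole payload must be a JSON object, nothing else.
    obj = json.loads(raw)  # raises json.JSONDecodeError on garbage
    if not isinstance(obj, dict):
        raise ValueError("expected a JSON object")
    return obj


noisy = 'Sorry, here is the call: {"tool": "delete", "args": {}} hope that helps'
# best_effort_parse(noisy) happily returns a dict and could trigger a delete;
# strict_parse(noisy) raises, stopping the run before any side effect.
```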

3) They over-index on integration count

“It integrates with everything” isn’t a production plan. If your tool gateway is unsafe, more integrations just means more blast radius.

Comparison table

| Criterion | PydanticAI (typed-first) | LangChain agents (flexible) | What matters in prod |
|---|---|---|---|
| Default output validation | Strong | Depends on you | Fail closed |
| Integration surface | Smaller | Larger | Blast radius |
| Debuggability | Better if typed | Better if instrumented | Traces |
| Failure handling | Explicit if enforced | Emergent if loose | Stop reasons |
| Best for | Tool-heavy systems | Rapid integration | Team discipline |

Where it breaks in production

Typed-first breaks

  • you still have to maintain schemas
  • you can over-constrain and reject useful outputs
  • teams misuse typing as “security” (it isn’t)

Flexible breaks

  • silent parse errors
  • “best effort” JSON coercion
  • tool outputs treated as instructions
  • drift changes output shapes without tests

Implementation example (real code)

No matter what framework you use, put a strict validator between the model and side effects.

This shows a minimal typed decision object with fail-closed parsing.

PYTHON
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Decision:
    kind: str  # "final" | "tool"
    tool: str | None
    args: dict[str, Any] | None
    answer: str | None


class InvalidDecision(RuntimeError):
    pass


def validate_decision(obj: Any) -> Decision:
    if not isinstance(obj, dict):
        raise InvalidDecision("expected object")
    kind = obj.get("kind")
    if kind not in {"final", "tool"}:
        raise InvalidDecision("invalid kind")
    if kind == "final":
        ans = obj.get("answer")
        if not isinstance(ans, str) or not ans.strip():
            raise InvalidDecision("missing answer")
        return Decision(kind="final", tool=None, args=None, answer=ans)
    tool = obj.get("tool")
    args = obj.get("args")
    if not isinstance(tool, str):
        raise InvalidDecision("missing tool")
    if not isinstance(args, dict):
        raise InvalidDecision("missing args")
    return Decision(kind="tool", tool=tool, args=args, answer=None)
JAVASCRIPT
export class InvalidDecision extends Error {}

export function validateDecision(obj) {
  if (!obj || typeof obj !== "object") throw new InvalidDecision("expected object");
  const kind = obj.kind;
  if (kind !== "final" && kind !== "tool") throw new InvalidDecision("invalid kind");

  if (kind === "final") {
    if (typeof obj.answer !== "string" || !obj.answer.trim()) throw new InvalidDecision("missing answer");
    return { kind: "final", answer: obj.answer };
  }

  if (typeof obj.tool !== "string") throw new InvalidDecision("missing tool");
  if (!obj.args || typeof obj.args !== "object") throw new InvalidDecision("missing args");
  return { kind: "tool", tool: obj.tool, args: obj.args };
}
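Wiring a validator like the ones above in front of side effects might look like the following sketch. The inline shape check is a trimmed-down stand-in for the full `validate_decision`, and the metric names are illustrative:

```python
import json


class InvalidDecision(RuntimeError):
    pass


def run_step(raw: str, execute_tool, metrics: dict):
    """Parse -> validate -> act; any failure stops before side effects."""
    metrics["steps"] = metrics.get("steps", 0) + 1
    try:
        obj = json.loads(raw)  # strict: no regex extraction, no coercion
        # Trimmed-down shape check; production code would call the full validator.
        if not isinstance(obj, dict) or obj.get("kind") not in {"final", "tool"}:
            raise InvalidDecision("invalid decision shape")
        if obj["kind"] == "tool" and (
            not isinstance(obj.get("tool"), str) or not isinstance(obj.get("args"), dict)
        ):
            raise InvalidDecision("missing tool or args")
    except (json.JSONDecodeError, InvalidDecision):
        # Fail closed: count it (feeds invalid_decision_rate) and do nothing.
        metrics["invalid_decisions"] = metrics.get("invalid_decisions", 0) + 1
        return None
    if obj["kind"] == "final":
        return obj.get("answer")
    return execute_tool(obj["tool"], obj["args"])
```

Because invalid decisions are counted rather than silently retried, the invalid-decision rate becomes an alertable signal instead of hidden noise.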

Real incident (with numbers)

We saw a team ship a flexible agent that parsed “tool calls” with best-effort JSON extraction.

During a partial outage, tool output included an HTML error page. The model copied part of it into the “args”. The parser coerced it into a dict.

Impact:

  • 17 runs wrote garbage data into a queue
  • downstream workers crashed for ~25 minutes
  • on-call spent ~2 hours tracing the root cause because logs only had the final answer

Fix:

  1. strict parsing + schema validation for decisions and tool outputs
  2. fail closed before any write
  3. monitoring for invalid_decision_rate

Typed outputs didn’t solve this alone — strict validation did.

Migration path (A → B)

Flexible → typed-first

  1. add schema validation at the boundary (model output + tool output)
  2. define a small decision schema (tool vs final)
  3. gradually type the high-risk parts (writes) first
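Step 1 applies to tool output as well as model output. A hypothetical boundary check (field names invented; the size cap mirrors the `tool_output.max_chars` default further down the page):

```python
import json

MAX_TOOL_OUTPUT_CHARS = 200_000  # cap raw size before any parsing


class InvalidToolOutput(RuntimeError):
    pass


def validate_tool_output(raw: str, required: set) -> dict:
    # Tool output is untrusted input, not instructions: fail closed on
    # size, on non-JSON payloads (e.g. HTML error pages), and on
    # missing required fields.
    if len(raw) > MAX_TOOL_OUTPUT_CHARS:
        raise InvalidToolOutput("output too large")
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidToolOutput("not valid JSON") from exc
    if not isinstance(obj, dict) or not required <= obj.keys():
        raise InvalidToolOutput("missing required fields")
    return obj
```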

Typed-first → flexible (when you need it)

  1. keep typed boundaries for actions and tools
  2. allow free-form text only inside “analysis” fields that never trigger side effects
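A sketch of that split, with invented names: free-form model text goes into an `analysis` field that is logged for humans but never dispatched on.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict
    analysis: str = ""  # free-form model text: logged, never executed


def dispatch(call: ToolCall, registry: dict) -> Any:
    # Only the typed fields select behavior; `analysis` cannot steer anything.
    return registry[call.tool](**call.args)
```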

Decision guide

  • If your system does writes → prioritize typed/validated boundaries.
  • If you’re doing experiments → flexibility is fine, but keep budgets and logging.
  • If you’re multi-tenant → strict validation is non-negotiable.

Trade-offs

  • Validation rejects some outputs. That’s good. It forces you to handle the failure path.
  • Typing adds maintenance overhead.
  • Flexibility can ship faster, but it ships more production surprises too.

When NOT to use it

  • Don’t rely on typing as security. You still need permissions and approvals.
  • Don’t use best-effort parsing for tool calls that trigger writes.
  • Don’t skip monitoring. Validation failures are a metric to track, not a failure to hide.

Checklist (copy-paste)

  • [ ] Validate model decisions (schema) before acting
  • [ ] Validate tool outputs (schema + invariants)
  • [ ] Fail closed for writes
  • [ ] Budgets + stop reasons
  • [ ] Audit logs for tool calls
  • [ ] Canary changes; drift is real

Safe default config (YAML)

YAML
validation:
  model_decision:
    fail_closed: true
    schema: "Decision(kind, tool?, args?, answer?)"
  tool_output:
    fail_closed: true
    max_chars: 200000
budgets:
  max_steps: 25
  max_tool_calls: 12
monitoring:
  track: ["invalid_decision_rate", "tool_output_invalid_rate", "stop_reason"]
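Enforcing the `budgets` block above takes only a few lines. A sketch with invented stop-reason names (the YAML values are inlined as a dict so the snippet stands alone):

```python
BUDGETS = {"max_steps": 25, "max_tool_calls": 12}  # mirrors the YAML defaults


def check_budget(steps: int, tool_calls: int, budgets: dict = BUDGETS):
    """Return a stop reason once a budget is exhausted, else None."""
    if steps >= budgets["max_steps"]:
        return "budget_steps_exhausted"
    if tool_calls >= budgets["max_tool_calls"]:
        return "budget_tool_calls_exhausted"
    return None
```

The agent loop checks this before every step and records the returned stop reason, so runs end with an explicit reason instead of an open-ended loop.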

FAQ

Does typing guarantee correctness?
No. It guarantees shape. You still need invariants, permissions, and safe-mode behavior.
Is LangChain ‘unsafe’?
No. It’s flexible. Safety comes from how you enforce boundaries: budgets, validation, and a tool gateway.
What should we type first?
Anything that triggers writes or money: tool calls, approvals, budget policy outputs.
Can strict validation hurt completion rate?
Yes. That’s usually the point: stop guessing and handle failure paths explicitly.


Related pages

Not sure this applies to your case?

Design your agent →
⏱️ 7 min read · Updated March 2026 · Difficulty: ★★☆
Embedded: production control · OnceOnly
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (step / cost caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & deduplication
  • Audit logs & traceability
Embedded mention: OnceOnly is a control layer for agent systems in production.
Author

This documentation is curated and maintained by engineers who deploy AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

The patterns and recommendations draw on post-mortems, failure modes, and operational incidents in deployed systems, including from building and operating governance infrastructure for agents at OnceOnly.