OpenAI Agents vs Custom Agents (Production Comparison) + Code

  • Pick the right tool without demo-driven regret.
  • See what breaks in production (operability, cost, drift).
  • Get a migration path and decision checklist.
  • Leave with defaults: budgets, validation, stop reasons.
Managed agent frameworks get you moving fast. Custom agents get you control. A production comparison: governance, observability, failure handling, and a sane migration path.
On this page
  1. Problem-first intro
  2. Quick decision (who should pick what)
  3. Why people pick the wrong option in production
  4. 1) They assume “managed” means “safe”
  5. 2) They assume “custom” means “more control”
  6. 3) They lock into the wrong abstraction
  7. Comparison table
  8. Where this breaks in production
  9. Managed breaks when:
  10. Custom breaks when:
  11. Implementation example (real code)
  12. Real failure case (incident-style, with numbers)
  13. Migration path (A → B)
  14. Managed → custom (common)
  15. Custom → managed (when you want speed)
  16. Decision guide
  17. Trade-offs
  18. When NOT to use
  19. Copy-paste checklist
  20. Safe default config snippet (JSON/YAML)
  21. FAQ

Problem-first intro

At some point you’ll ask: “Do we use a managed agent platform or build our own?”

The wrong answer is religious:

  • “never build, always buy”
  • “never buy, always build”

In production, the real question is: where do you want the control layer to live?

Because the control layer (budgets, permissions, approvals, tracing) is the thing that decides whether your agent is a product feature or an incident generator.

Quick decision (who should pick what)

  • Pick OpenAI/managed agent frameworks when speed-to-first-version matters and your risk is low-to-medium.
  • Pick custom agents when you need deep integration with internal systems, strict governance, or unusual observability requirements.
  • Many teams should do both: start managed, then pull pieces in-house as the control layer hardens.

Why people pick the wrong option in production

1) They assume “managed” means “safe”

Managed doesn’t automatically mean:

  • least privilege
  • approvals for writes
  • multi-tenant isolation
  • audit logs that satisfy your compliance team

You still own safety. You just outsource some plumbing.

2) They assume “custom” means “more control”

Custom code can also mean:

  • no monitoring
  • no budgets
  • no replay

Control is a discipline, not a repo.

3) They lock into the wrong abstraction

If your architecture couples:

  • agent loop
  • tool gateway
  • observability

…you can’t migrate without a rewrite.

Keep your control layer framework-agnostic.
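
Concretely, "framework-agnostic" can be as small as one interface. Here is a minimal sketch (names hypothetical, using a Python `Protocol` for the seam): the agent loop depends only on this surface, and each runtime, managed or custom, is adapted behind it.

```python
from typing import Any, Protocol


class ControlLayer(Protocol):
    """The only surface the agent loop may touch; runtimes adapt to it."""

    def call_tool(self, tool: str, args: dict[str, Any]) -> Any: ...


class LoggingControlLayer:
    """Toy implementation: forwards calls and records an audit trail."""

    def __init__(self) -> None:
        self.audit_log: list[str] = []

    def call_tool(self, tool: str, args: dict[str, Any]) -> Any:
        self.audit_log.append(tool)
        return {"tool": tool, "ok": True}  # stand-in for a real tool result


def run_step(control: ControlLayer, tool: str, args: dict[str, Any]) -> Any:
    # The loop never imports a framework; swapping runtimes means swapping
    # the ControlLayer implementation, not this code.
    return control.call_tool(tool, args)
```

Swapping runtimes then means writing a new `ControlLayer` adapter, while the loop, budgets, and logging stay put.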

Comparison table

| Criterion | Managed agents | Custom agents | What matters in prod |
|---|---|---|---|
| Time to ship | Faster | Slower | Team velocity |
| Governance hooks | Varies | You decide | Can you enforce? |
| Observability | Varies | You build | Debuggability |
| Multi-tenant isolation | Varies | You build | Blast radius |
| Flexibility | Medium | High | Tool integration |
| Migration | Risky if coupled | Under your control | Avoid rewrites |

Where this breaks in production

Managed breaks when:

  • you can’t enforce your permission model
  • you can’t get the logs you need for audits/replay
  • you need custom gating (risk tiers, per-tenant kill switches)
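
When the platform can't express that kind of gating, the check has to live in code you own, in front of the runtime. A per-tenant kill switch is a sketch of how small that piece can be (names hypothetical):

```python
class KillSwitch:
    """Per-tenant kill switch held outside the agent runtime."""

    def __init__(self) -> None:
        self.disabled: set[str] = set()

    def disable(self, tenant_id: str) -> None:
        # Flipped by an operator during an incident; no deploy required.
        self.disabled.add(tenant_id)

    def guard(self, tenant_id: str) -> None:
        # Called before every agent step; raising halts the run immediately.
        if tenant_id in self.disabled:
            raise RuntimeError(f"agent disabled for tenant {tenant_id}")
```

In production the disabled set would live in shared storage (a feature-flag service or a fast key-value store), but the enforcement point stays the same.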

Custom breaks when:

  • you skip the boring parts (budgets, stop reasons, tracing)
  • you build a “framework” nobody understands
  • you add features faster than you add monitoring

Implementation example (real code)

This is the invariant that makes migration possible: all tools go through your gateway.

If your tool gateway is yours, you can switch agent runtimes later without changing safety.

PYTHON
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Policy:
    allow: set[str]
    require_approval: set[str]


class Denied(RuntimeError):
    pass


class ToolGateway:
    def __init__(self, *, policy: Policy, impls: dict[str, Callable[..., Any]]):
        self.policy = policy
        self.impls = impls

    def call(self, tool: str, args: dict[str, Any], *, tenant_id: str, env: str) -> Any:
        # Default-deny: a tool missing from the allowlist never executes.
        if tool not in self.policy.allow:
            raise Denied(f"not allowed: {tool}")
        # High-risk tools pause for a human before any side effect happens.
        if tool in self.policy.require_approval:
            token = require_human_approval(tool=tool, args=args, tenant_id=tenant_id)  # (pseudo: your approval flow)
            args = {**args, "approval_token": token}

        # Credentials are scoped per tool/tenant/env, never ambient.
        creds = load_scoped_credentials(tool=tool, tenant_id=tenant_id, env=env)  # (pseudo: your secrets manager)
        fn = self.impls[tool]
        return fn(args=args, creds=creds)
JAVASCRIPT
export class Denied extends Error {}

export class ToolGateway {
  constructor({ policy, impls }) {
    this.policy = policy;
    this.impls = impls;
  }

  async call({ tool, args, tenantId, env }) {
    if (!this.policy.allow.includes(tool)) throw new Denied("not allowed: " + tool);
    if (this.policy.requireApproval.includes(tool)) {
      const token = await requireHumanApproval({ tool, args, tenantId }); // (pseudo)
      args = { ...args, approval_token: token };
    }
    const creds = await loadScopedCredentials({ tool, tenantId, env }); // (pseudo)
    const fn = this.impls[tool];
    return fn({ args, creds });
  }
}
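
The policy decision inside both gateways can also be isolated as a pure function, which makes the default-deny behavior easy to unit-test. A sketch reusing the same `Policy` shape (the helper name `decide` is hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    allow: set[str]
    require_approval: set[str]


def decide(policy: Policy, tool: str) -> str:
    """Collapse the gateway's first two checks into one pure decision."""
    if tool not in policy.allow:
        return "deny"  # default-deny: unknown tools never run
    if tool in policy.require_approval:
        return "needs_approval"  # human-in-the-loop before side effects
    return "allow"


policy = Policy(allow={"search", "db.write"}, require_approval={"db.write"})
# decide(policy, "search")     -> "allow"
# decide(policy, "db.write")   -> "needs_approval"
# decide(policy, "email.send") -> "deny"
```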

Real failure case (incident-style, with numbers)

We saw a team adopt a managed agent runtime quickly (good call). Then they exposed write tools without building an external gateway (bad call).

A prompt injection payload in a ticket steered the agent into a write path.

Impact:

  • 9 bogus tickets created
  • ~45 minutes of cleanup + trust loss
  • they disabled the agent and rewired tools behind a gateway anyway

The lesson wasn’t “managed is bad”. The lesson was “tool governance can’t be implicit”.

Migration path (A → B)

Managed → custom (common)

  1. move tool calling behind your gateway first
  2. add budgets, stop reasons, monitoring outside the runtime
  3. replay traces to validate parity
  4. swap the runtime once governance is stable

Custom → managed (when you want speed)

  1. keep your gateway and logging
  2. use managed runtime for orchestration/model calls
  3. keep kill switch and approvals outside

Decision guide

  • If you can’t operate it without deep hooks → custom.
  • If speed matters and tools are low-risk → managed is fine.
  • If you’re multi-tenant with writes → gateway-first, regardless of runtime.

Trade-offs

  • Managed can reduce engineering load, but may constrain governance hooks.
  • Custom can do anything, including shipping without monitoring.
  • Migration is easier if governance is externalized.

When NOT to use

  • Don’t pick managed if you can’t get audit logs or enforce permissions.
  • Don’t pick custom as an excuse to skip governance.
  • Don’t couple your tool calls directly to the model runtime.

Copy-paste checklist

  • [ ] Tool gateway you own (policy + approvals)
  • [ ] Budgets: steps/time/tool calls/USD
  • [ ] Stop reasons returned to users
  • [ ] Tracing: run_id/step_id + tool logs
  • [ ] Canary changes; expect drift
  • [ ] Replay traces for migrations

Safe default config snippet (JSON/YAML)

YAML
architecture:
  tool_gateway: "owned"
  runtime: "managed_or_custom"
budgets:
  max_steps: 25
  max_tool_calls: 12
approvals:
  required_for: ["db.write", "email.send", "ticket.close"]
logging:
  tool_calls: true
  stop_reasons: true
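
Defaults only help if something rejects a bad override at startup. A minimal validation sketch over the budgets section (assumes the YAML has already been parsed into a dict; names hypothetical):

```python
# Fallbacks mirror the YAML snippet above.
DEFAULTS = {"max_steps": 25, "max_tool_calls": 12}


def load_budgets(config: dict) -> dict:
    """Merge configured budgets over safe defaults and reject nonsense values."""
    budgets = {**DEFAULTS, **config.get("budgets", {})}
    for key, value in budgets.items():
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            raise ValueError(f"budget {key!r} must be a positive integer")
    return budgets
```

Failing fast here beats discovering at 3 a.m. that someone set `max_steps: 0` or `max_tool_calls: "unlimited"`.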

FAQ

Will managed agents solve reliability for us?
Not automatically. Reliability comes from budgets, tool policy, observability, and safe failure paths.
What’s the non-negotiable piece to own?
The tool gateway and governance. That’s where side effects are controlled.
How do we avoid lock-in?
Keep tools, budgets, and logging outside the runtime. Then swapping runtimes is a controlled change.
What’s the first production control to add?
Default-deny tool allowlist + step/tool budgets + stop reasons.


⏱️ 6 min read · Updated Mar 2026 · Difficulty: ★★☆
Integrated: production control with OnceOnly
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.