Problem (from the field)
At some point you’ll ask: “Do we use a managed agent platform or build our own?”
The wrong answer is religious:
- “never build, always buy”
- “never buy, always build”
In production, the real question is: where do you want the control layer to live?
Because the control layer (budgets, permissions, approvals, tracing) is the thing that decides whether your agent is a product feature or an incident generator.
Quick decision (who should pick what)
- Pick OpenAI/managed agent frameworks when speed-to-first-version matters and your risk is low-to-medium.
- Pick custom agents when you need deep integration with internal systems, strict governance, or unusual observability requirements.
- Many teams should do both: start managed, then pull pieces in-house as the control layer hardens.
Why teams make the wrong choice in prod
1) They assume “managed” means “safe”
Managed doesn’t automatically mean:
- least privilege
- approvals for writes
- multi-tenant isolation
- audit logs that satisfy your compliance team
You still own safety. You just outsource some plumbing.
2) They assume “custom” means “more control”
Custom code can also mean:
- no monitoring
- no budgets
- no replay
Control is a discipline, not a repo.
3) They lock into the wrong abstraction
If your architecture couples:
- agent loop
- tool gateway
- observability
…you can’t migrate without a rewrite.
Keep your control layer framework-agnostic.
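One way to keep the pieces decoupled is to hand the runtime a single `call_tool` callback that the control layer owns. The sketch below is illustrative only: `AgentRuntime`, `EchoRuntime`, and `run_with_controls` are hypothetical names, not part of any real framework, and the tool result is a placeholder.

```python
from typing import Any, Callable, Protocol

# A tool call as the control layer sees it: (tool name, arguments) -> result.
ToolCaller = Callable[[str, dict[str, Any]], Any]


class AgentRuntime(Protocol):
    """Hypothetical interface; a managed or custom runtime is wrapped to fit it."""

    def run(self, task: str, call_tool: ToolCaller) -> str: ...


class EchoRuntime:
    """Stand-in runtime for illustration: makes one tool call and returns its result."""

    def run(self, task: str, call_tool: ToolCaller) -> str:
        return str(call_tool("search", {"query": task}))


def run_with_controls(runtime: AgentRuntime, task: str, allowed: set[str]) -> tuple[str, list[str]]:
    """Run a task while the control layer, not the runtime, decides what executes."""
    trace: list[str] = []

    def call_tool(tool: str, args: dict[str, Any]) -> Any:
        trace.append(tool)  # observability lives outside the runtime
        if tool not in allowed:
            raise PermissionError(f"not allowed: {tool}")
        return f"result for {args}"  # placeholder for the real tool implementation

    return runtime.run(task, call_tool), trace
```

Under this shape, swapping runtimes means writing a new adapter with a `run` method; the allowlist and the trace stay where they are.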
Comparison table
| Criterion | Managed agents | Custom agents | What matters in prod |
|---|---|---|---|
| Time to ship | Faster | Slower | Team velocity |
| Governance hooks | Varies | You decide | Can you enforce? |
| Observability | Varies | You build | Debuggability |
| Multi-tenant isolation | Varies | You build | Blast radius |
| Flexibility | Medium | High | Tool integration |
| Migration | Risky if coupled | Under your control | Avoid rewrites |
Where this breaks in production
Managed breaks when:
- you can’t enforce your permission model
- you can’t get the logs you need for audits/replay
- you need custom gating (risk tiers, per-tenant kill switches)
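Custom gating of this kind can be small if it lives outside the runtime. The following is a minimal sketch, assuming three risk tiers and an in-memory kill-switch set; the `Risk` levels, the `Gate` class, and the tool names in `TOOL_RISK` are made up for illustration.

```python
from enum import IntEnum


class Risk(IntEnum):
    READ = 1
    WRITE = 2
    DESTRUCTIVE = 3


# Hypothetical tool names and tiers, for illustration only.
TOOL_RISK = {"ticket.read": Risk.READ, "ticket.create": Risk.WRITE, "db.drop": Risk.DESTRUCTIVE}


class Gate:
    def __init__(self, max_risk: Risk):
        self.max_risk = max_risk
        self.killed_tenants: set[str] = set()

    def kill(self, tenant_id: str) -> None:
        """Per-tenant kill switch: blocks every tool for this tenant."""
        self.killed_tenants.add(tenant_id)

    def allows(self, tool: str, tenant_id: str) -> bool:
        if tenant_id in self.killed_tenants:
            return False
        risk = TOOL_RISK.get(tool)  # unknown tools are denied by default
        return risk is not None and risk <= self.max_risk
```

If a managed runtime gives you no place to call something like `allows` before a tool executes, that is the signal described in the list above.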
Custom breaks when:
- you skip the boring parts (budgets, stop reasons, tracing)
- you build a “framework” nobody understands
- you add features faster than you add monitoring
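The "boring parts" fit in a few dozen lines. Here is a rough sketch of budgets with explicit stop reasons; the `Budget` and `run_loop` names, the 60-second default, and the `(done, tool_calls_made)` step contract are assumptions for illustration, not a prescribed design.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Budget:
    max_steps: int = 25
    max_tool_calls: int = 12
    max_seconds: float = 60.0  # assumed default, tune per workload
    steps: int = 0
    tool_calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def stop_reason(self) -> Optional[str]:
        """Return why the run must stop, or None if it may continue."""
        if self.steps >= self.max_steps:
            return "max_steps"
        if self.tool_calls >= self.max_tool_calls:
            return "max_tool_calls"
        if time.monotonic() - self.started >= self.max_seconds:
            return "timeout"
        return None


def run_loop(step: Callable[[], tuple[bool, int]], budget: Budget) -> str:
    """Drive `step` until it reports done or a budget is exhausted.

    `step` is a hypothetical callable returning (done, tool_calls_made).
    """
    while (reason := budget.stop_reason()) is None:
        done, calls = step()
        budget.steps += 1
        budget.tool_calls += calls
        if done:
            return "completed"
    return reason
```

Budgets are checked between steps here, so a single step can overshoot the tool-call limit; enforcing the limit inside the tool gateway as well closes that gap.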
Implementation example (real code)
This is the invariant that makes migration possible: all tools go through your gateway.
If your tool gateway is yours, you can switch agent runtimes later without changing safety.
```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Policy:
    allow: set[str]             # default-deny: anything not listed is rejected
    require_approval: set[str]  # tools that need a human sign-off first


class Denied(RuntimeError):
    pass


class ToolGateway:
    def __init__(self, *, policy: Policy, impls: dict[str, Callable[..., Any]]):
        self.policy = policy
        self.impls = impls

    def call(self, tool: str, args: dict[str, Any], *, tenant_id: str, env: str) -> Any:
        if tool not in self.policy.allow:
            raise Denied(f"not allowed: {tool}")
        if tool in self.policy.require_approval:
            # require_human_approval is a placeholder you supply yourself.
            token = require_human_approval(tool=tool, args=args, tenant_id=tenant_id)  # (pseudo)
            args = {**args, "approval_token": token}
        # load_scoped_credentials is a placeholder: credentials scoped to tool, tenant, env.
        creds = load_scoped_credentials(tool=tool, tenant_id=tenant_id, env=env)  # (pseudo)
        fn = self.impls[tool]
        return fn(args=args, creds=creds)
```
```javascript
export class Denied extends Error {}

export class ToolGateway {
  constructor({ policy, impls }) {
    this.policy = policy; // { allow: string[], requireApproval: string[] }
    this.impls = impls;   // tool name -> async function
  }

  async call({ tool, args, tenantId, env }) {
    // Default-deny: anything not on the allowlist is rejected.
    if (!this.policy.allow.includes(tool)) throw new Denied("not allowed: " + tool);
    if (this.policy.requireApproval.includes(tool)) {
      // requireHumanApproval is a placeholder you supply yourself.
      const token = await requireHumanApproval({ tool, args, tenantId }); // (pseudo)
      args = { ...args, approval_token: token };
    }
    // loadScopedCredentials is a placeholder: credentials scoped to tool, tenant, env.
    const creds = await loadScopedCredentials({ tool, tenantId, env }); // (pseudo)
    const fn = this.impls[tool];
    return fn({ args, creds });
  }
}
```
Real incident (with numbers)
We saw a team adopt a managed agent runtime quickly (good call). Then they exposed write tools without building an external gateway (bad call).
A prompt injection payload in a ticket steered the agent into a write path.
Impact:
- 9 bogus tickets created
- ~45 minutes of cleanup + trust loss
- they disabled the agent and rewired tools behind a gateway anyway
The lesson wasn’t “managed is bad”. The lesson was “tool governance can’t be implicit”.
Migration path (A → B)
Managed → custom (common)
- move tool calling behind your gateway first
- add budgets, stop reasons, monitoring outside the runtime
- replay traces to validate parity
- swap the runtime once governance is stable
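The replay step in the managed → custom path can start as a plain diff over recorded tool calls. This is a sketch under the assumption that a trace is an ordered list of `(tool, args)` pairs; `Trace` and `parity_diff` are hypothetical names.

```python
from typing import Any

# A trace is the ordered list of tool calls a run made: (tool, args).
Trace = list[tuple[str, dict[str, Any]]]


def parity_diff(recorded: Trace, candidate: Trace) -> list[str]:
    """Return human-readable differences between two traces (empty means parity)."""
    diffs: list[str] = []
    for i, (old, new) in enumerate(zip(recorded, candidate)):
        if old != new:
            diffs.append(f"step {i}: {old[0]} {old[1]} != {new[0]} {new[1]}")
    if len(recorded) != len(candidate):
        diffs.append(f"length: {len(recorded)} != {len(candidate)}")
    return diffs
```

Exact equality is a strict bar, because model output varies between runs. Comparing only tool names, or only side-effecting calls, may be a more useful definition of parity for your system.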
Custom → managed (when you want speed)
- keep your gateway and logging
- use managed runtime for orchestration/model calls
- keep kill switch and approvals outside
Decision guide
- If you can’t operate it without deep hooks → custom.
- If speed matters and tools are low-risk → managed is fine.
- If you’re multi-tenant with writes → gateway-first, regardless of runtime.
Trade-offs
- Managed can reduce engineering load, but may constrain governance hooks.
- Custom can do anything, including shipping without monitoring.
- Migration is easier if governance is externalized.
When NOT to use it
- Don’t pick managed if you can’t get audit logs or enforce permissions.
- Don’t pick custom as an excuse to skip governance.
- Don’t couple your tool calls directly to the model runtime.
Checklist (copy/paste)
- [ ] Tool gateway you own (policy + approvals)
- [ ] Budgets: steps/time/tool calls/USD
- [ ] Stop reasons returned to users
- [ ] Tracing: run_id/step_id + tool logs
- [ ] Canary changes; expect drift
- [ ] Replay traces for migrations
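For the tracing item, one structured log line per tool call goes a long way. A minimal sketch, assuming an in-memory list as the log sink; the `Tracer` class and its field names beyond `run_id` and `step_id` are illustrative.

```python
import json
import uuid
from typing import Any


class Tracer:
    """Collects one JSON line per tool call, keyed by run_id and step_id."""

    def __init__(self) -> None:
        self.run_id = uuid.uuid4().hex
        self.step_id = 0
        self.lines: list[str] = []  # stand-in for a real log sink

    def record(self, tool: str, args: dict[str, Any], outcome: str) -> None:
        self.step_id += 1
        self.lines.append(json.dumps({
            "run_id": self.run_id,
            "step_id": self.step_id,
            "tool": tool,
            "args": args,
            "outcome": outcome,
        }))
```

Tool arguments can carry sensitive data, so a real sink would likely redact or hash them before writing.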
Safe default config snippet (JSON/YAML)
```yaml
architecture:
  tool_gateway: "owned"
  runtime: "managed_or_custom"
budgets:
  max_steps: 25
  max_tool_calls: 12
approvals:
  required_for: ["db.write", "email.send", "ticket.close"]
logging:
  tool_calls: true
  stop_reasons: true
```
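A config like this is only a safe default if something refuses to start when it is weakened. The sketch below checks the same keys from a plain dict that mirrors the snippet, avoiding any YAML dependency; `config_problems` and its messages are hypothetical.

```python
# Mirrors the config snippet above as a plain dict.
CONFIG = {
    "architecture": {"tool_gateway": "owned", "runtime": "managed_or_custom"},
    "budgets": {"max_steps": 25, "max_tool_calls": 12},
    "approvals": {"required_for": ["db.write", "email.send", "ticket.close"]},
    "logging": {"tool_calls": True, "stop_reasons": True},
}


def config_problems(cfg: dict) -> list[str]:
    """Return reasons this config is unsafe to start with (empty means OK)."""
    problems = []
    if cfg.get("architecture", {}).get("tool_gateway") != "owned":
        problems.append("architecture.tool_gateway must be owned")
    for key in ("max_steps", "max_tool_calls"):
        value = cfg.get("budgets", {}).get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"budgets.{key} must be a positive integer")
    if not cfg.get("approvals", {}).get("required_for"):
        problems.append("approvals.required_for is empty")
    for key in ("tool_calls", "stop_reasons"):
        if cfg.get("logging", {}).get(key) is not True:
            problems.append(f"logging.{key} must be enabled")
    return problems
```

Running a check like this at startup, and failing closed when the list is non-empty, keeps a later config edit from silently removing a budget or an approval requirement.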
FAQ
Q: Will managed agents solve reliability for us?
A: Not automatically. Reliability comes from budgets, tool policy, observability, and safe failure paths.
Q: What’s the non-negotiable piece to own?
A: The tool gateway and governance. That’s where side effects are controlled.
Q: How do we avoid lock-in?
A: Keep tools, budgets, and logging outside the runtime. Then swapping runtimes is a controlled change.
Q: What’s the first production control to add?
A: Default-deny tool allowlist + step/tool budgets + stop reasons.
Related pages
- Foundations: How agents use tools · What makes an agent production-ready
- Failure: Prompt injection attacks · Silent agent drift
- Governance: Tool permissions · Human approval gates
- Production stack: Production agent stack