Problem (your agent worked… until you deployed it)
The notebook agent is fine.
The deployed agent is where the pain lives:
- it can’t reach the network you assumed it had
- it runs out of memory because someone enabled “full trace logging”
- it retries itself into a rate-limit storm
- it can’t read secrets because you baked them into the image (please don’t)
Containerizing isn’t “Dockerfile theatre”. It’s where you force the agent to behave like a real service.
Why this fails in production
Agents are awkward workloads:
- they’re bursty (traffic spikes = token spikes)
- they do I/O (tools) and hang on timeouts
- they have long tails (p95 is fine, p99 is chaos)
If your container doesn’t enforce budgets and timeouts at runtime, production will. It’ll just enforce them via 504s, OOMKills, and angry invoices.
Diagram: what you’re actually deploying
Real code: a container-friendly agent entrypoint (Python + JS)
We keep it boring:
- read config from env
- enforce budgets/timeouts
- expose a health endpoint
```python
import os
import time
from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class Budgets:
    max_steps: int
    max_tool_calls: int
    max_seconds: int

def load_budgets() -> Budgets:
    return Budgets(
        max_steps=int(os.getenv("AGENT_MAX_STEPS", "25")),
        max_tool_calls=int(os.getenv("AGENT_MAX_TOOL_CALLS", "12")),
        max_seconds=int(os.getenv("AGENT_MAX_SECONDS", "60")),
    )

def run_request(task: str, *, budgets: Budgets) -> Dict[str, Any]:
    t0 = time.time()
    steps = 0
    tool_calls = 0
    while True:
        steps += 1
        if steps > budgets.max_steps:
            return {"output": "", "stop_reason": "max_steps"}
        if tool_calls > budgets.max_tool_calls:
            return {"output": "", "stop_reason": "max_tool_calls"}
        if time.time() - t0 > budgets.max_seconds:
            return {"output": "", "stop_reason": "max_seconds"}
        # ... agent loop: each tool invocation increments tool_calls ...
        return {"output": "ok", "stop_reason": "finish"}

def health() -> Dict[str, str]:
    return {"ok": "true"}
```

The same entrypoint in JavaScript:

```javascript
export function loadBudgets() {
  return {
    maxSteps: Number(process.env.AGENT_MAX_STEPS ?? 25),
    maxToolCalls: Number(process.env.AGENT_MAX_TOOL_CALLS ?? 12),
    maxSeconds: Number(process.env.AGENT_MAX_SECONDS ?? 60),
  };
}

export function runRequest(task, { budgets }) {
  const t0 = Date.now();
  let steps = 0;
  let toolCalls = 0;
  while (true) {
    steps += 1;
    if (steps > budgets.maxSteps) return { output: "", stop_reason: "max_steps" };
    if (toolCalls > budgets.maxToolCalls) return { output: "", stop_reason: "max_tool_calls" };
    if ((Date.now() - t0) / 1000 > budgets.maxSeconds) return { output: "", stop_reason: "max_seconds" };
    // ... agent loop: each tool invocation increments toolCalls ...
    return { output: "ok", stop_reason: "finish" };
  }
}

export function health() {
  return { ok: true };
}
```

A sane Dockerfile (multi-stage, no secrets baked in)

```dockerfile
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# Production deps only; dev dependencies never reach the runtime image
RUN npm ci --omit=dev

FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
COPY . .
# Run as the unprivileged user the base image ships with
USER node
EXPOSE 3000
CMD ["npm", "run", "start"]
```
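Before trusting the budget guards in production, a unit-level sanity check helps. A self-contained sketch that mirrors the Python entrypoint above, with the agent work replaced by a stub that just counts tool calls so the loop runs until a budget trips (illustrative only):

```python
import os
import time
from dataclasses import dataclass
from typing import Any, Dict

@dataclass(frozen=True)
class Budgets:
    max_steps: int
    max_tool_calls: int
    max_seconds: int

def load_budgets() -> Budgets:
    # Same env-driven config as the entrypoint above
    return Budgets(
        max_steps=int(os.getenv("AGENT_MAX_STEPS", "25")),
        max_tool_calls=int(os.getenv("AGENT_MAX_TOOL_CALLS", "12")),
        max_seconds=int(os.getenv("AGENT_MAX_SECONDS", "60")),
    )

def run_request(task: str, *, budgets: Budgets) -> Dict[str, Any]:
    t0 = time.time()
    steps = 0
    tool_calls = 0
    while True:
        steps += 1
        if steps > budgets.max_steps:
            return {"output": "", "stop_reason": "max_steps"}
        if tool_calls > budgets.max_tool_calls:
            return {"output": "", "stop_reason": "max_tool_calls"}
        if time.time() - t0 > budgets.max_seconds:
            return {"output": "", "stop_reason": "max_seconds"}
        tool_calls += 1  # stand-in for a real tool invocation

# Simulate the platform injecting a tight config
os.environ["AGENT_MAX_STEPS"] = "3"
result = run_request("demo", budgets=load_budgets())
```

Running this returns a structured stop instead of an unbounded loop, which is exactly what you want a load balancer or caller to see.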
Key points:
- configs come from env (budgets, tool allowlists, model selection)
- secrets come from your platform (Vercel/K8s/Secrets Manager), not your image
- health check exists, so rollouts can be safe
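The health endpoint doesn't need a framework. A minimal stdlib sketch (the `/healthz` path and port are assumptions; match them to your platform's probe config):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            body = json.dumps({"ok": True}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep probe traffic out of the logs

def serve(port: int = 3000) -> HTTPServer:
    # Caller runs server.serve_forever() (typically in a thread)
    return HTTPServer(("0.0.0.0", port), HealthHandler)
```

Keep the check cheap (no model calls, no tool calls) so probes stay fast even when the agent itself is saturated.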
Real failure (incident-style, with numbers)
We deployed an agent service with “debug logging” turned on by default. It logged full tool results for every call.
Impact in one afternoon:
- memory usage climbed until the container OOMKilled
- retries amplified load (clients retried, agent retried tools)
- ~12% request failure rate
- on-call: ~3 hours (because the logs were huge and still not useful)
Fix:
- default to sampled logging + redaction (/observability-monitoring/agent-logging)
- cap budgets at runtime (max seconds + max tool calls)
- add a kill switch config to disable expensive tools during incidents
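The kill switch can be as simple as an env-driven denylist checked on every tool call. A sketch (the env var name and helper are assumptions, not a standard):

```python
import os

def disabled_tools() -> set:
    # e.g. AGENT_DISABLED_TOOLS="http.get,db.write" flipped during an incident
    raw = os.getenv("AGENT_DISABLED_TOOLS", "")
    return {t.strip() for t in raw.split(",") if t.strip()}

def call_tool(name: str, args: dict) -> dict:
    if name in disabled_tools():
        # Fail closed with an explicit reason instead of hanging or retrying
        return {"ok": False, "error": f"tool '{name}' disabled by kill switch"}
    # ... real tool dispatch would go here ...
    return {"ok": True, "result": None}
```

Re-reading the env (or a config store) on each call means flipping the switch is a config change, not a redeploy.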
Trade-offs
- Tight timeouts reduce tail latency and can reduce answer quality.
- More logging helps debugging and hurts cost/privacy. Default to less.
- “One container per agent” is simple and expensive. Shared services are cheaper and harder.
When NOT to containerize
If you’re not operating this as a service (no traffic, no SLOs), don’t overbuild. But once a real user can trigger it, you are operating a service. Congrats.
Copy-paste checklist
- [ ] Budgets loaded from env and enforced at runtime
- [ ] Tool gateway enforces timeouts/retries/allowlists
- [ ] Health endpoint + readiness checks
- [ ] Secrets injected by platform (not baked)
- [ ] Kill switch config (disable tools / disable writes)
- [ ] Logs are structured and sampled; PII redaction on by default
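For the logging items, a sketch of sampling plus redaction; the regexes are illustrative, not exhaustive, and the 1% rate is a tunable default:

```python
import json
import random
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SECRET = re.compile(r"(?i)(api[_-]?key|token|secret)[\"':=\s]+\S+")

def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return SECRET.sub(r"\1=[REDACTED]", text)

def log_tool_result(tool: str, result: str, sample_rate: float = 0.01) -> str:
    # Always log that the call happened; only sample the (redacted) payload
    entry = {"tool": tool, "sampled": random.random() < sample_rate}
    if entry["sampled"]:
        entry["result"] = redact(result)
    return json.dumps(entry)
```

Redaction runs before anything leaves the process, so a misconfigured sample rate leaks volume, not PII.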
Safe default config snippet (YAML)
```yaml
runtime:
  env:
    AGENT_MAX_STEPS: 25
    AGENT_MAX_TOOL_CALLS: 12
    AGENT_MAX_SECONDS: 60
tools:
  allowlist: ["search.read", "http.get"]
  timeouts_ms: { default: 8000 }
  retries: { max: 2, backoff_ms: [200, 800] }
observability:
  sampled_tool_results: true
  result_sample_rate: 0.01
rollout:
  canary_percent: 10
  rollback_on_error_rate: 0.05
```
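The `retries` block above translates to a small wrapper. A sketch of bounded retry with fixed backoff steps matching those numbers (the function shape is an assumption; adapt it to your tool gateway):

```python
import time
from typing import Any, Callable, Dict, Sequence

def call_with_retries(
    fn: Callable[[], Any],
    *,
    max_retries: int = 2,
    backoff_ms: Sequence[int] = (200, 800),
) -> Dict[str, Any]:
    """Run fn, retrying up to max_retries times, sleeping backoff_ms[i] between tries."""
    last_err = None
    for attempt in range(max_retries + 1):
        try:
            return {"ok": True, "value": fn(), "attempts": attempt + 1}
        except Exception as e:  # in production, catch your tool-error type
            last_err = e
            if attempt < max_retries:
                time.sleep(backoff_ms[min(attempt, len(backoff_ms) - 1)] / 1000)
    return {"ok": False, "error": str(last_err), "attempts": max_retries + 1}
```

The per-call timeout (`timeouts_ms.default: 8000`) belongs in the HTTP client or tool gateway itself; retries without a timeout just stack up hangs.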
Implement in OnceOnly (optional)
```python
# onceonly-python: tool allowlist + governed tool call
import os
from onceonly import OnceOnly

client = OnceOnly(
    api_key=os.environ["ONCEONLY_API_KEY"],
    timeout=5.0,
    max_retries_429=2,
)

agent_id = "billing-agent"

client.gov.upsert_policy({
    "agent_id": agent_id,
    "allowed_tools": ["search.read", "http.get"],
    "max_actions_per_hour": 200,
    "max_spend_usd_per_day": 10.0,
})

res = client.ai.run_tool(
    agent_id=agent_id,
    tool="http.get",
    args={"url": "https://example.com/health"},
    spend_usd=0.001,
)

if not res.allowed:
    raise RuntimeError(res.policy_reason)
```
Related pages
- Foundations: What makes an agent production-ready
- Failures: Cascading tool failures · Partial outage handling
- Governance: Kill switch design · Budget controls
- Observability: AI agent logging
- Testing: Unit testing agents