Anti-Pattern No Stop Conditions: When Agents Never Stop

Without stop conditions, agents may run indefinitely and waste tokens and resources.
On this page
  1. Idea In 30 Seconds
  2. Anti-Pattern Example
  3. Why It Happens And What Goes Wrong
  4. Correct Approach
  5. Quick Test
  6. How It Differs From Other Anti-Patterns
  7. No Monitoring vs No Stop Conditions
  8. Agents Without Guardrails vs No Stop Conditions
  9. Self-Check: Do You Have This Anti-Pattern?
  10. FAQ
  11. What Next

Idea In 30 Seconds

No Stop Conditions is an anti-pattern where an agent starts without clear completion conditions.

As a result, the agent can loop and spend budget without real progress. This increases latency, cost, and risk of side effects (state changes).

Simple rule: every agent run must have explicit stop conditions for successful completion and safe exit.


Anti-Pattern Example

The team builds a support agent that should find an answer in internal data and return a result to the user.

But the agent loop has no clear stop conditions.

PYTHON
state = init_state(user_message)

while True:
    decision = agent.next_step(state)
    result = run_tool(decision.tool, decision.args)
    state.append(result)
    # no has_final_answer(...) check
    # no no_progress(...) check
    # no repeated_action(...) check

In this setup, the agent may endlessly repeat similar steps:

PYTHON
search_docs -> fetch_page -> summarize -> search_docs -> ...

For this case, you need a controlled loop with explicit boundaries:

PYTHON
for step in range(MAX_STEPS):
    ...
    if has_final_answer(state):
        return build_answer(state)

In this case, missing stop conditions lead to:

  • runaway loop risk (infinite loop)
  • extra tool and LLM calls
  • uncontrolled time and budget consumption
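A repeated_action check could catch the cycle above by scanning the recent trace. A minimal sketch, assuming each history entry is a dict with "tool" and "args" keys (the record shape, window, and threshold are illustrative assumptions):

```python
from collections import Counter

def repeated_action(history, window=4, threshold=3):
    """Heuristic: the same (tool, args) pair appearing several times
    in the last few steps usually signals a stuck loop."""
    recent = [(step["tool"], repr(step["args"])) for step in history[-window:]]
    most_common = Counter(recent).most_common(1)
    return bool(most_common) and most_common[0][1] >= threshold
```

With a window of 4 and threshold of 3, three identical `search_docs` calls in a row would trip the check, while a varied sequence would not.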

Why It Happens And What Goes Wrong

This anti-pattern often appears when a team relies on the model and expects it to "figure out" when to stop.

Typical causes:

  • no explicit max_steps, timeout, or budget limit
  • no definition of what counts as a "ready answer"
  • no no_progress or repeated-action checks
  • stop control is left only at infrastructure level

As a result, teams get problems:

  • infinite or long loops - agent repeats similar steps without completion
  • higher latency - response arrives much later or never arrives
  • higher cost - number of LLM/tool steps grows for one request
  • side effects (state changes) - repeated actions may create duplicate records, re-update status, duplicate API calls, or re-send external actions
  • unstable results - same request completes differently across runs

Typical production signals that stop conditions are missing or weak:

  • a noticeable share of runs ends with an infrastructure timeout, not a controlled stop
  • P95 step count per request keeps growing
  • traces show repeated identical calls with minimal argument changes
  • cost per request grows faster than success rate
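These signals are cheap to compute from run logs. A small sketch, assuming each run record is a dict carrying a stop_reason field, where None or "infra_timeout" marks an uncontrolled end (both names are assumptions about your logging schema):

```python
def timeout_share(runs):
    """Share of runs that ended with an infrastructure timeout instead of
    a controlled stop_reason. An uncontrolled end is assumed to be logged
    as stop_reason=None or stop_reason="infra_timeout"."""
    if not runs:
        return 0.0
    uncontrolled = sum(
        1 for r in runs if r.get("stop_reason") in (None, "infra_timeout")
    )
    return uncontrolled / len(runs)
```

Tracking this ratio over time shows whether stop conditions are actually doing the work, or whether the infrastructure is quietly cleaning up after runaway loops.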

Keep in mind that each next step is itself an LLM inference call. If the loop has no clear completion conditions, the model keeps choosing "one more step" even when a step adds almost no new useful information.

As this setup grows, it becomes hard to explain why a run did not stop in time without trace and execution visualization. That is why production systems usually include a dedicated observability layer for agent runs.

Correct Approach

Start with the simplest controlled loop that reliably handles most requests today. Add new stop conditions only when there is a measurable failure, risk, or limitation in the current design.

Practical framework:

  • set a positive completion condition (final_answer_ready)
  • set guard boundaries (max_steps, timeout, budget)
  • add no_progress and repeated-action checks
  • record stop reason for each run and track metrics (for example, improved success rate without sharp growth in latency and cost per request)
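The guard boundaries from the list above can live in one small budget object. A minimal sketch, assuming the RunBudget name, thresholds, and the per-call cost accounting; this is not tied to any specific framework API:

```python
import time

class RunBudget:
    """Tracks wall-clock time and spend for one agent run."""

    def __init__(self, timeout_s=30.0, max_cost_usd=0.50):
        self.started = time.monotonic()
        self.timeout_s = timeout_s
        self.max_cost_usd = max_cost_usd
        self.cost_usd = 0.0

    def charge(self, cost_usd):
        # Called after each LLM or tool step with its estimated cost.
        self.cost_usd += cost_usd

    def timed_out(self):
        return time.monotonic() - self.started > self.timeout_s

    def exceeded(self):
        return self.cost_usd > self.max_cost_usd
```

The loop then checks `budget.timed_out()` and `budget.exceeded()` on every iteration, so one object answers both guard questions.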

In practice, no_progress often means repeated identical tool calls, minimal state changes, or no new useful information after the next step.
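One possible no_progress heuristic, sketched over a plain list of result strings for simplicity (the real state object from the examples would need an equivalent accessor; the lookback value is an assumption):

```python
def no_progress(results, new_result, lookback=3):
    """Heuristic: treat the run as stuck if the new tool result exactly
    duplicates an earlier one, or the last few results are all identical."""
    if new_result in results:
        return True
    recent = results[-lookback:]
    return len(recent) == lookback and len(set(recent)) == 1
```

Exact-duplicate matching is deliberately crude; a production version might compare normalized or embedded results instead, at the cost of more machinery.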

PYTHON
MAX_STEPS = 8

def run_agent(user_message: str):
    state = init_state(user_message)

    for step in range(MAX_STEPS):  # hard limit for runaway loops
        if timed_out():
            return stop("timeout")
        if budget_exceeded():
            return stop("budget_exceeded")

        decision = agent.next_step(state)

        if decision.type == "final_answer":
            if validate_output(decision.output):  # format, required fields, no empty answer
                return decision.output
            return stop("invalid_output")

        result = run_tool(decision.tool, decision.args)
        if no_progress(state, result):  # same tool/result pattern or no meaningful state change
            return stop("no_progress")

        state.append(result)

    return stop("max_steps_exceeded")

In this setup, the loop becomes controlled: the system either returns a valid answer or stops with a transparent reason.
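The stop(...) helper used above is assumed rather than defined in the source. A minimal sketch that returns a structured record, so the stop_reason can be logged and tracked per run:

```python
from dataclasses import dataclass

@dataclass
class StopResult:
    """Structured outcome of a run that did not produce a final answer."""
    stop_reason: str
    ok: bool = False

def stop(reason: str) -> StopResult:
    # In production this would also emit stop_reason to logs/metrics
    # so every run ends with an accountable, queryable outcome.
    return StopResult(stop_reason=reason)
```

Returning a record instead of raising makes every exit path uniform: callers always get either a validated answer or a StopResult they can aggregate.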

Quick Test

If you answer "yes" to any of these questions, you have no-stop-conditions risk:

  • Do some runs end with timeout instead of a controlled stop_reason?
  • Does one request sometimes do disproportionately many steps with no visible progress?
  • Do traces show repeated similar actions without new outcomes?

How It Differs From Other Anti-Patterns

No Monitoring vs No Stop Conditions

  • No Monitoring. Main problem: the system lacks enough observability to see what happens during a run. When it appears: run-level logs, traces, metrics, and stop_reason are missing.
  • No Stop Conditions. Main problem: the agent loop has no clear completion conditions. When it appears: a run proceeds without max_steps, timeout, budget limit, or no_progress checks.

Agents Without Guardrails vs No Stop Conditions

  • Agents Without Guardrails. Main problem: the agent runs without policy boundaries and system constraints. When it appears: there is no allowlist, deny-by-default, budget, or safety constraints.
  • No Stop Conditions. Main problem: the agent can run infinite or too-long loops. When it appears: loop logic has no explicit completion criterion and no controlled stop_reason.

Self-Check: Do You Have This Anti-Pattern?

Quick check for the anti-pattern No Stop Conditions: review whether your runs have explicit completion conditions, guard limits (max_steps, timeout, budget), progress checks, and a recorded stop_reason. If several of these are missing, there are signs of this anti-pattern.

FAQ

Q: If we have max_steps, is that already enough?
A: No. max_steps is required, but by itself does not cover all risks. You also need timeout, budget limit, progress checks, and a valid ready-answer criterion.

Q: When should we add a new stop condition?
A: When there is a concrete signal: incidents, repeated loops, or growth in cost or latency that current rules cannot address without disproportionate system complexity.

Q: How do we start if stop conditions are almost absent today?
A: Start with the minimum: max_steps, timeout, budget, and stop_reason in logs. Then add no_progress and final-answer validation.


What Next


⏱️ 7 min read • Updated March 16, 2026 • Difficulty: ★★★
Implement in OnceOnly
Safe defaults for tool permissions + write gating.
YAML
# onceonly guardrails (concept)
version: 1
tools:
  default_mode: read_only
  allowlist:
    - search.read
    - kb.read
    - http.get
writes:
  enabled: false
  require_approval: true
  idempotency: true
controls:
  kill_switch: { enabled: true, mode: disable_writes }
audit:
  enabled: true
Integrated: OnceOnly production control
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.