Self-Critique Agent Pattern: draft → critique → revise

Run a safe self-critique loop: one schema-based review, one constrained revision, and a change log for stable quality.
On this page
  1. Pattern Essence
  2. Problem
  3. Solution
  4. How It Works
  5. In Code, It Looks Like This
  6. How It Looks at Runtime
  7. When It Fits - And When It Doesn't
  8. Good Fit
  9. Not a Fit
  10. How It Differs From Reflection
  11. When To Use Self-Critique (vs Other Patterns)
  12. How To Combine With Other Patterns
  13. In Short
  14. Pros and Cons
  15. FAQ
  16. What Next

Pattern Essence

Self-Critique Agent is a pattern where the agent first writes a draft, then records risks and required changes using a fixed template, performs one constrained revision, and logs exactly what changed.

When to use it: when you need strict, structured risk review and controlled revision with an audit trail.


Compared to lightweight reflection, self-critique is usually stricter:

  • critique has a fixed format (for example, JSON)
  • edits are allowed only by rules
  • text growth is controlled
  • the system stores a change log (before → after)
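A fixed critique format can be as small as a dataclass. Below is a minimal sketch, assuming illustrative field names (`risks`, `required_changes`, `severity`); this is not a standard schema, just one way to make the critique machine-checkable:

```python
from dataclasses import dataclass, field

@dataclass
class Critique:
    """Structured critique artifact: what is wrong and what must be fixed."""
    risks: list[str] = field(default_factory=list)
    required_changes: list[str] = field(default_factory=list)  # the only allowed edits
    severity: str = "low"  # "low" | "medium" | "high"

    @property
    def ok(self) -> bool:
        # No required changes means the draft can ship as-is.
        return not self.required_changes

    @property
    def high_risk(self) -> bool:
        return self.severity == "high"

critique = Critique(
    risks=["claim about root cause without evidence"],
    required_changes=["hedge the root-cause claim"],
    severity="medium",
)
```

Because the artifact is typed, downstream steps can branch on `ok` and `high_risk` instead of re-reading free-form critique text.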


Problem

Imagine an agent preparing a customer-facing issue update.

Draft:

"The release is delayed because of a failure in the payment module."

Then a request arrives:

"Make the wording softer."

Without a self-critique frame, the agent often does a "pretty rewrite" that goes beyond the task:

  • adds new assumptions
  • makes the tone less specific
  • expands the text beyond the required scope
  • hides the issue instead of clarifying it

"Improve wording" must not become "change meaning".

As a result:

  • meaning shifts silently
  • it is unclear what was actually fixed
  • audit cannot reconstruct revision logic
  • the response becomes less reliable for decision-making

That is the core problem: during "improvement," the agent can change meaning even though it should change only form.

Solution

Self-Critique introduces a rule: edit only what is listed in "required changes".

Analogy: it is like editorial revision from a list of comments. First we lock what exactly must be fixed, then we apply one controlled revision. This makes the response clearer without changing meaning.

Core principle: structured critique first, then one controlled revision with an audit trail.

The agent may propose rewrites, but the system allows only point edits that are explicitly listed in "required changes".

Controlled flow:

  1. Draft: generate the first version
  2. Critique: produce an artifact (risks + required_changes)
  3. Decision: ok/revise/escalate
  4. Revision: perform one constrained revision
  5. Audit: log diff + metadata
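Step 3, the ok/revise/escalate decision, can be a pure function over the critique artifact. A sketch, assuming the critique is a dict with illustrative `severity` and `required_changes` fields:

```python
def decide(critique: dict) -> str:
    """Map a structured critique to one of three outcomes."""
    if critique.get("severity") == "high":
        return "escalate"  # stop and request human review
    if not critique.get("required_changes"):
        return "ok"        # nothing to fix, ship the draft
    return "revise"        # exactly one constrained revision
```

Keeping the decision separate from generation makes it deterministic and trivially testable.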

This gives you:

  • clearer output without changing facts
  • clear split between "what is wrong" and "what was fixed"
  • reproducible revisions
  • control over high-risk outputs before publishing

Works well if:

  • critique uses a fixed structure (schema-driven)
  • revision is limited to required_changes
  • no_new_facts is enforced
  • audit diff is mandatory
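Some of these rules can be checked mechanically on the revised text. A sketch for the text-growth limit (the 20% budget is illustrative); note that `no_new_facts` usually needs a model- or retrieval-based check and is not covered here:

```python
def check_revision(draft: str, revised: str, max_growth_pct: int = 20) -> list[str]:
    """Return a list of rule violations for a revised draft (empty = pass)."""
    violations = []
    # Text growth is controlled: reject revisions that bloat the answer.
    budget = len(draft) * (1 + max_growth_pct / 100)
    if len(revised) > budget:
        violations.append("max_length_increase_pct exceeded")
    # A "revision" identical to the draft means required changes were ignored.
    if revised == draft:
        violations.append("no edits applied")
    return violations

draft = "The release is delayed because of a failure in the payment module."
revised = "The release is delayed; we are investigating a payment-module failure."
```

Running such checks after every revision turns the rules from prompt-level suggestions into enforced invariants.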

How It Works


For this pattern to be safe, you need strict boundaries:

  • one critique pass
  • one revision pass
  • do not add new facts
  • edit only items from required changes
  • do not bloat the text (for example, +20% max)
  • for high risk: stop or request human review
Full flow description: Draft → Critique → Revise → Audit

Draft
The agent generates the initial response.

Critique
A dedicated critique step returns structured output: risks, required changes, severity.

Revision
The agent changes only what is marked required, without expanding scope.

Audit
The system logs before/after, changed flag, and a short diff for debugging and incident analysis.
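The before/after diff can come straight from the standard library. A sketch using `difflib`; the JSON record shape is illustrative, not a fixed format:

```python
import difflib
import json

def audit_record(before: str, after: str, risks: list[str]) -> str:
    """Build a JSON audit entry with a short unified diff of the revision."""
    diff = list(difflib.unified_diff(
        before.splitlines(), after.splitlines(),
        fromfile="draft", tofile="revised", lineterm="",
    ))
    return json.dumps({
        "changed": before != after,
        "risks": risks,
        "diff": diff,
    })

record = audit_record(
    "The issue was caused by the network.",
    "A likely cause is a network failure.",
    risks=["claim about root cause without evidence"],
)
```

A line-level unified diff is usually enough for incident analysis; store the full texts separately if you need exact reconstruction.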

In Code, It Looks Like This

```python
# Sketch of the loop: writer, critic, supervisor, and audit are
# components of your own system, not a specific library.
def self_critique(goal, context):
    draft = writer.generate(goal, context)

    # One structured critique pass against a fixed schema
    critique = critic.review_once(
        draft=draft,
        schema="risks_required_changes_v1",
    )

    if critique.high_risk:
        return escalate_to_human(critique.reason)

    if critique.ok:
        return draft

    # One constrained revision, limited to required_changes
    revised = writer.revise_once(
        draft=draft,
        required_changes=critique.required_changes,
        rules=[
            "no_new_facts",
            "max_length_increase_pct=20",
            "keep_scope",
        ],
    )

    # Check that the revision stayed within the allowed edits
    approved = supervisor.review_output_patch(
        original=draft,
        revised=revised,
        allowed_changes=critique.required_changes,
    )

    audit.log_diff(
        before=draft,
        after=approved,
        risks=critique.risks,
    )

    return approved
```

Self-Critique should not run "until perfect". The safe default is one critique pass plus one revision, followed by a check that the revision stayed within required_changes.

How It Looks at Runtime

```text
Goal: propose a safe action plan during a network incident

Draft:
"The issue was caused by the network. We should restart the entire cluster."

Critique:
- risk: claim about root cause without evidence
- risk: action has too wide blast radius (state changes)
- required_change: add checks before restart

Revision:
"A likely cause is a network failure.
Before restarting the cluster, verify node health and latency.
If a partial outage is confirmed, restart only affected nodes."
```
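The critique from this transcript, expressed under a fixed schema, might serialize like this (a sketch; field names are illustrative, modeled on the `risks_required_changes_v1` schema used earlier):

```python
# The transcript's critique as a structured artifact.
critique = {
    "risks": [
        "claim about root cause without evidence",
        "action has too wide blast radius (state changes)",
    ],
    "required_changes": ["add checks before restart"],
    "severity": "medium",
}

# Non-empty required_changes and severity below "high" -> one revision pass.
decision = "revise" if critique["required_changes"] and critique["severity"] != "high" else "ok"
```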

Full Self-Critique agent example

Available in Python; TypeScript coming soon.

When It Fits - And When It Doesn't

Good Fit

| Situation | Why Self-Critique Fits |
| --- | --- |
| High-risk output before sending | Self-critique adds structured control before the final release of the response. |
| You need an audit trail of edits | Critique and edits are explicit and suitable for audit. |
| You have a strict critique schema | A formal structure makes review reproducible and controllable. |
| You need controlled rewriting without output bloat | Self-critique keeps rewriting limited to necessary changes only. |

Not a Fit

| Situation | Why Self-Critique Does Not Fit |
| --- | --- |
| Latency is critical and there is no budget for an extra pass | A second generation step can be too expensive in both time and cost. |
| You cannot enforce hard rules such as no_new_facts | Without strict constraints, critique and rewriting can reduce reliability. |
| The task is deterministic and consistently validated by tests | An extra critique pass duplicates an already reliable validation process. |

The common thread: self-critique adds a second generation step, which increases the cost of every run.

How It Differs From Reflection

| | Reflection | Self-Critique |
| --- | --- | --- |
| Review depth | Quick "is this okay" check (light review) | Strictly records "what is wrong" and "what must be fixed" |
| Format | ok / issues / fix | risks, severity, required changes |
| Revision | Usually minimal | Constrained rewrite under explicit rules |
| Operational focus | Fast quality pass | Controlled rewriting + change log |

Reflection is more often used as a lightweight filter before sending. Self-Critique is for stricter control over changes.

When To Use Self-Critique (vs Other Patterns)

Use Self-Critique when you need deep critique against explicit criteria and controlled rewriting.

Quick test:

  • if you need to "evaluate by checklist and rewrite the answer" -> Self-Critique
  • if you need to "do only a short pre-send check" -> Reflection Agent
Comparison with other patterns and examples

Quick cheat sheet:

| If the task looks like this... | Use |
| --- | --- |
| You need a quick check before the final response | Reflection Agent |
| You need deep criteria-based critique and answer rewriting | Self-Critique Agent |
| You need to recover after a timeout, exception, or tool failure | Fallback-Recovery Agent |
| You need strict policy checks before risky actions | Guarded-Policy Agent |

Examples:

Reflection: "Before the final response, quickly check logic, completeness, and obvious mistakes."

Self-Critique: "Evaluate the answer by checklist (accuracy, completeness, risks), then rewrite."

Fallback-Recovery: "If the API does not respond, do retry -> fallback source -> escalate."

Guarded-Policy: "Before sending data externally, run policy checks to confirm it is allowed."

How To Combine With Other Patterns

  • Self-Critique + RAG: critique and revisions are allowed only within facts from retrieval context.
  • Self-Critique + Supervisor: risky edits are not auto-applied and are routed for human approval.
  • Self-Critique + Reflection: Reflection does a quick pre-check, and Self-Critique turns on for complex or disputed answers.

In Short

Quick take

Self-Critique Agent:

  • Creates a clear list of issues and required changes
  • Applies one controlled revision
  • Stores a change log (before → after)
  • Prevents "text improvement" from silently changing meaning

Pros and Cons

Pros

  • reviews the answer before final output
  • reduces obvious mistakes
  • improves clarity of wording
  • helps enforce task requirements

Cons

  • adds an extra step and latency
  • does not replace fact verification
  • without clear criteria, it can over-edit

FAQ

Q: Can we run multiple critique passes?
A: Technically yes, but it quickly turns into an expensive loop. Safe default baseline: 1 + 1.

Q: Why is change logging mandatory?
A: Without a diff, it is hard to understand what changed quality and where errors appeared.

Q: Do we need a separate critic model?
A: Sometimes, but not required. More important is strict schema, output validation, and hard revision rules.

What Next

Self-Critique improves quality through controlled rewriting.

But what should you do when the answer must not only be rewritten, but checked against policy before a risky action?

⏱️ 10 min read · Updated Mar 2026 · Difficulty: ★★★
Practical continuation

Pattern implementation examples

Continue with implementation using example projects.

Integrated: production control with OnceOnly
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Tool permissions (allowlist / blocklist)
  • Kill switch & incident stop
  • Idempotency & dedupe
  • Audit logs & traceability
Integrated mention: OnceOnly is a control layer for production agent systems.
Author

This documentation is curated and maintained by engineers who ship AI agents in production.

The content is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Patterns and recommendations are grounded in post-mortems, failure modes, and operational incidents in deployed systems, including during the development and operation of governance infrastructure for agents at OnceOnly.