How to Structure an Effective Self-Critique for LLM Outputs
Good automated critique processes turn raw model output into reliable, polished content. This guide gives concrete techniques to set objectives, craft prompts, build rubrics, and run iterative review loops so models self-correct systematically.
- Define clear objectives before asking for critique.
- Create focused prompts and explicit rubrics for measurable feedback.
- Use step-by-step reasoning and revision loops to reduce errors and hallucinations.
- Mitigate bias and gaming with adversarial tests and cross-checks.
- Follow the implementation checklist to deploy repeatable reviews.
Define critique objectives
Start by specifying what “good” means for the task. Objectives guide the review scope, granularity, and acceptance thresholds.
- Output quality: factual accuracy, relevance, completeness.
- Style and tone: voice, readability, target-audience fit.
- Safety and compliance: harmful content, privacy, regulated claims.
- Efficiency: response length, latency, and resource constraints.
Turn each objective into an actionable question. Example: “Is every factual claim supported by a verifiable source or stated as uncertain?”
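One way to operationalize this is a small mapping from objectives to review questions and acceptance thresholds. This is a minimal sketch; the objective names, questions, and threshold values are illustrative assumptions, not part of any specific framework.

```python
# Sketch: objectives mapped to actionable review questions and
# acceptance thresholds. All names and values are illustrative.
OBJECTIVES = {
    "factual_accuracy": {
        "question": "Is every factual claim supported by a verifiable source or stated as uncertain?",
        "threshold": 3,  # minimum rubric score to accept
    },
    "tone": {
        "question": "Do the voice and readability fit the target audience?",
        "threshold": 2,
    },
    "safety": {
        "question": "Is the output free of harmful content, privacy leaks, and regulated claims?",
        "threshold": 3,
    },
}

def review_questions() -> list[str]:
    """Return the actionable questions to include in a critique prompt."""
    return [spec["question"] for spec in OBJECTIVES.values()]
```

Keeping objectives in data rather than prose makes it easy to reuse the same definitions in prompts, rubrics, and automated threshold checks.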
Quick answer (one paragraph)
To build an effective self-critique, define clear objectives, craft focused self-review prompts, create measurable rubrics, require explicit step-by-step reasoning, and run iterative review–revise loops while adding adversarial checks to catch bias, hallucination, and gaming.
Craft self-review prompts
Design prompts that constrain scope, request evidence, and ask for specific changes. Vague prompts yield vague feedback.
- Be explicit about perspective: e.g., “Act as a technical editor for a senior developer audience.”
- Request concrete checks: “List unsupported claims, propose citations, and mark sections to shorten by 20%.”
- Ask for change suggestions, not just assessments: “Rewrite the introduction to match tone X.”
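These guidelines can be wrapped in a small prompt-builder so every critique request carries an explicit perspective, concrete checks, and a request for edits. The helper below is a hypothetical sketch, not a real API.

```python
# Hypothetical helper that assembles a scoped self-review prompt from
# the guidelines above: explicit perspective, concrete checks, and a
# request for edits rather than bare assessments.
def build_critique_prompt(perspective: str, checks: list[str], output: str) -> str:
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(checks, 1))
    return (
        f"Act as {perspective}.\n"
        f"Perform these checks and propose a concrete edit for each issue:\n"
        f"{numbered}\n\n"
        f"--- OUTPUT TO REVIEW ---\n{output}"
    )

prompt = build_critique_prompt(
    "a technical editor for a senior developer audience",
    ["List unsupported claims and propose citations.",
     "Mark sections to shorten by 20%."],
    "Draft text goes here.",
)
```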
Example prompt template:
Review the output for factual errors, unsupported claims, and tone mismatch with a senior-developer audience. For each issue, (1) identify the problem, (2) explain why it's a problem, and (3) propose a text edit or citation.
Set evaluation criteria and rubrics
Rubrics turn subjective judgment into measurable signals. Use 3–5 dimensions and 3–4 levels per dimension.
| Dimension | Excellent (3) | Acceptable (2) | Poor (1) |
|---|---|---|---|
| Factual accuracy | No factual errors; claims verifiable | Minor inaccuracies; most claims supportable | Multiple factual errors or unsupported claims |
| Clarity | Concise, unambiguous, logical flow | Some unclear phrasing; overall understandable | Confusing or disorganized |
| Tone | Matches audience and brand precisely | Minor mismatches but acceptable | Inappropriate tone for audience |
Assign numeric scores and thresholds that map to accept/reject/needs-revision outcomes. Automate aggregation where possible.
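The score-to-outcome mapping can be automated with a few lines of logic. The sketch below mirrors the 1–3 levels in the table above; the weighting rule and thresholds are assumptions you should calibrate for your task.

```python
# Sketch: map numeric rubric scores (1-3 per dimension) to
# accept / needs-revision / reject outcomes. Thresholds are illustrative.
def aggregate(scores: dict[str, int]) -> str:
    """scores: dimension -> level (1-3). Returns an outcome label."""
    if any(s == 1 for s in scores.values()):
        return "reject"          # any Poor rating is a hard fail
    total = sum(scores.values())
    if total >= 8:               # mostly Excellent across three dimensions
        return "accept"
    return "needs-revision"

result = aggregate({"factual_accuracy": 3, "clarity": 3, "tone": 2})
```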
Elicit step-by-step reasoning
Require the model to show its reasoning chain for claims and edits. Explicit reasoning makes errors easier to spot and discourages surface-level fixes.
- Ask for numbered rationales explaining why each claim is true or false.
- Request evidence links or a short verification plan per claim.
- Use “think-aloud” style within safe boundaries: concise, numbered steps.
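The per-claim asks above can be captured in a structured record so downstream tooling can route low-confidence claims to verification. This is a minimal sketch with an assumed schema; the field names are illustrative, not a standard format.

```python
from dataclasses import dataclass

# Assumed schema for per-claim reasoning records; field names are
# illustrative, not a standard format.
@dataclass
class ClaimCheck:
    claim: str
    rationale: str       # one-sentence justification
    verify: str          # source link or search term
    confidence: str      # "low" | "med" | "high"

    def needs_escalation(self) -> bool:
        """Low-confidence claims go to retrieval-based checks or human review."""
        return self.confidence == "low"

check = ClaimCheck(
    claim="X reduces Y by 30%.",
    rationale="Based on Study Z methodology A.",
    verify='search "Study Z Y reduction 30%"',
    confidence="med",
)
```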
For each factual statement, provide: (1) a one-sentence rationale, (2) a source or search term to verify, (3) confidence level (low/med/high).
Example output fragment:
1. Claim: "X reduces Y by 30%." Rationale: Based on Study Z methodology A showing mean difference. Verify: search "Study Z Y reduction 30%". Confidence: medium.
Design iterative review and revision loops
Set a loop structure: initial output → critique → targeted revision → re-evaluation. Limit loop count and require evidence of improvement.
- Stage 1: Quick pass for safety and major factual errors.
- Stage 2: Content polish (clarity, examples, citations).
- Stage 3: Style and length tuning to meet constraints.
Define exit criteria for each stage (e.g., rubric scores or hard constraints). Keep iterations short and focused to avoid drift.
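The loop structure and exit criteria can be sketched as a bounded review–revise cycle. `critique`, `revise`, and `score` below are placeholders for model or rubric calls; the whole function is an illustrative sketch, not a production implementation.

```python
# Sketch: bounded review-revise loop with an improvement requirement.
# `critique`, `revise`, and `score` stand in for model/rubric calls.
def review_loop(draft, critique, revise, score, max_iters=3, accept_at=8):
    best = score(draft)
    for _ in range(max_iters):
        if best >= accept_at:
            break                      # exit criterion met
        feedback = critique(draft)
        candidate = revise(draft, feedback)
        new = score(candidate)
        if new <= best:
            break                      # no evidence of improvement: stop early
        draft, best = candidate, new
    return draft, best
```

Stopping when a revision fails to improve the score is what keeps short, focused iterations from drifting.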
Mitigate bias, hallucination, and gaming
Combine preventive design with active detection.
- Bias: include demographic and cultural checks in the rubric; require alternate-perspective statements.
- Hallucination: demand verifiable citations or explicit uncertainty flags for unverifiable claims.
- Gaming: randomize prompt phrasing and include adversarial tests that try to lure the model into common failure modes.
Additional tactics:
- Cross-check outputs using a different model or retrieval-augmented verification.
- Use prompts that require listing how the model could be wrong.
- Log model critiques and use human spot-checks on sampled iterations to detect systematic gaming.
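Cross-checking with a second, independent reviewer can be sketched as a disagreement filter: claims where two backends diverge get escalated. `model_a` and `model_b` are placeholders for separate models or retrieval-augmented verifiers, an assumption of this sketch.

```python
# Sketch: flag claims where two independent reviewers disagree.
# `model_a` and `model_b` are placeholders for separate model or
# retrieval-based verdict functions returning True/False per claim.
def cross_check(claims, model_a, model_b):
    """Return claims whose verdicts disagree and need human review."""
    disputed = []
    for claim in claims:
        if model_a(claim) != model_b(claim):
            disputed.append(claim)
    return disputed
```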
Common pitfalls and how to avoid them
- Vague objectives → remedy: define measurable success criteria and acceptance thresholds.
- Too-broad review prompts → remedy: scope with audience, constraints, and explicit checks.
- Absent or weak rubrics → remedy: adopt 3–5 clear dimensions and numeric scoring.
- Over-reliance on a single pass → remedy: implement at least two iteration stages with exit criteria.
- Unchecked hallucinations → remedy: require citations or explicit uncertainty and verify top claims automatically.
- Model gaming the rubric → remedy: include adversarial tests and periodic human audits.
Implementation checklist
- Define 3–5 critique objectives tied to business or user outcomes.
- Create self-review prompt templates for each objective with explicit asks.
- Build a 3–4 dimension rubric with numeric scoring and thresholds.
- Require step-by-step rationales and evidence or verification actions.
- Design a 2–3 stage iteration loop with clear exit criteria.
- Add adversarial checks and cross-model verification for bias and hallucination.
- Log results and schedule human spot-checks for drift and gaming.
FAQ
- How many rubric dimensions are ideal?
- Three to five dimensions balance coverage and usability — enough to capture major quality axes without overwhelming reviewers.
- Should critiques be automated or human?
- Start with automated critiques for scale and speed, but include periodic human audits for calibration and to catch subtle failures.
- How many iterations are reasonable?
- Typically 2–3 focused iterations work well; more loops add cost and risk of drift unless tightly constrained.
- What if the model refuses to provide sources?
- Require a verification plan instead: search terms, datasets, or an explicit uncertainty flag; escalate to retrieval-based checks or human review if needed.
- How do I measure improvement?
- Use rubric score changes, reduction in factual-error rate, and qualitative human ratings on sampled outputs to track progress.
