Specify the JSON schema and output constraints
When instructing an LLM to return structured JSON, the prompt must combine a clear schema, explicit format rules, and test cases. This reduces ambiguous outputs, eases validation, and makes downstream parsing robust.
- TL;DR: craft a precise JSON schema, require the model to output only JSON, show good/bad examples, add validation rules, and iterate using tests.
- Use explicit types, enums, and constraints in the schema so the model’s structure is predictable.
- Protect parsing by using delimiters, tags, and post-processing checks; score outputs and refine prompts.
Quick answer (one concise paragraph)
Provide a complete JSON Schema (draft-07 or later), demand the model output only the JSON that conforms to that schema, include canonical examples and counterexamples, and add parsing-safe delimiters and simple post-validation steps to catch and correct deviations.
Instruct the model to output only JSON and format rules
Make the first instruction in the prompt a non-negotiable requirement: “Output ONLY valid JSON that matches the schema below. Do not include any extra text, markdown, or commentary.” Follow with format rules that cover whitespace, canonical ordering (if needed), and whether to include trailing commas.
- State the JSON Schema version: e.g., draft-07 or 2019-09.
- Specify top-level type:
object,array, etc. - Mandate no surrounding backticks, code fences, or prose.
- Clarify optional formatting preferences (pretty vs compact) and provide a machine-parseable choice.
Example format rules to include in the prompt:
1) Output only valid JSON.
2) Use the provided schema for validation.
3) Do not include comments or additional keys.
4) Use ISO 8601 for dates (e.g., 2023-05-15T13:00:00Z).
5) Return compact JSON unless "pretty": true is provided.Provide canonical examples and edge-case instances
Examples teach the model exactly what valid and invalid outputs look like. For each required structure, include at least one correct example and one incorrect example showing common mistakes.
| Type | Valid example | Invalid example |
|---|---|---|
| Simple object | {"id":123,"name":"Acme"} | {id:123, name:"Acme"} (unquoted keys) |
| Date field | {"created_at":"2024-01-01T00:00:00Z"} | {"created_at":"1/1/24"} (non-ISO) |
Edge-case instances to include:
- Missing optional keys (show allowed omission).
- Null values vs absent keys (clarify which you accept).
- Maximum/minimum numeric bounds and overflow cases.
- Empty arrays, nested empty objects, and large arrays truncation rules.
Require explicit types, enums, and validation rules
Every property should have an explicit type. Use enum where possible, add minLength/maxLength, numeric bounds, and pattern regex for strings.
{
"$schema":"http://json-schema.org/draft-07/schema#",
"type":"object",
"required":["id","status"],
"properties":{
"id":{"type":"integer","minimum":1},
"status":{"type":"string","enum":["pending","complete","failed"]},
"notes":{"type":"string","maxLength":500},
"tags":{"type":"array","items":{"type":"string","pattern":"^[a-z0-9\\-]+$"}}
},
"additionalProperties":false
}Set additionalProperties: false unless you allow extensions. Use oneOf/anyOf for conditional shapes and dependencies for cross-field requirements.
Use delimiters, tags, and post-processing safeguards
Wrap the model JSON with clear delimiters so parsers can extract content reliably. Use a single-line tag before and after the JSON block to detect stray text.
### BEGIN_JSON
{ ... }
### END_JSON- Require the exact delimiter strings in the prompt and examples.
- On the client side, extract the first JSON-looking substring between delimiters and validate it.
- Apply a strict JSON parser, then run schema validation. If validation fails, run a deterministic repair step (see below).
Simple post-processing safeguards:
- Strip leading/trailing text outside delimiters.
- Reject outputs containing common non-JSON tokens (e.g., “—”, “Note:”, “”).
- Auto-correct trivial issues: convert single quotes to double quotes, remove trailing commas, ensure keys are quoted — but only as a last resort and with logging.
Iterate tests, score outputs, and refine prompts
Treat prompt design like software: run a test suite, score outputs against the schema and behavioral checks, then refine prompts based on failure modes.
- Create a test corpus of inputs that exercise normal and edge cases.
- For each output, compute a pass/fail and granular metrics: parse success, schema validity, extraneous text, field-level accuracy.
- Log failures with input, model output, and validation errors to identify patterns.
- Refine the prompt to target the top failure modes — add examples, tighten constraints, or change delimiter strategy.
| Metric | Description | Weight |
|---|---|---|
| Parse success | JSON parses without syntax errors | 40% |
| Schema valid | Conforms to JSON Schema | 40% |
| No extras | No extra prose outside delimiters | 10% |
| Field accuracy | Values match expected patterns | 10% |
Common pitfalls and how to avoid them
- Model adds prose before/after JSON — Remedy: require exact delimiters and include negative examples showing that prose is wrong.
- Unquoted keys or single quotes used — Remedy: include counterexamples and enforce a strict parser with auto-repair fallback.
- Extra unexpected keys — Remedy: set
additionalProperties:falseand show examples of rejections. - Non-ISO date formats — Remedy: require ISO 8601 in the schema and examples; validate dates after parsing.
- Large arrays truncated or summarized — Remedy: specify max/min array lengths and instruct the model to return full arrays or use paging.
- Ambiguous optional fields (null vs absent) — Remedy: state whether null is allowed and provide both examples.
Implementation checklist
- Define JSON Schema with explicit types, enums, patterns, and
additionalPropertiesrules. - Create clear prompt: require “Output ONLY valid JSON” and include delimiters.
- Provide canonical valid and invalid examples inside the prompt.
- Build a test suite with normal and edge-case inputs.
- Implement extraction between delimiters, strict parsing, and schema validation.
- Log failures, score outputs, and iterate prompt refinements.
- Provide fallback auto-repair for trivial syntax issues, but prefer regenerating the response.
FAQ
- Q: Which JSON Schema draft should I use?
- A: Use a recent stable draft (draft-07 or later) compatible with your validators; include the
$schemafield in the prompt. - Q: Should I allow
additionalProperties? - A: Prefer
falseto avoid surprises; allow it only if you document and validate extensions separately. - Q: What if the model repeatedly returns prose?
- A: Add stronger negative examples, require exact delimiters, and implement automatic rejection and regeneration with a tighter prompt.
- Q: Can I ask for pretty-printed JSON?
- A: Yes — specify pretty vs compact in the prompt and include a sample. Compact is easier to parse; pretty is human-friendly.
- Q: How to handle very large outputs?
- A: Use paging (requests for chunks), streaming with reconciliation, or request summarized metadata plus a link to retrieve full data.
