Source: Production methodology at major RAG products (Perplexity, ChatGPT, Claude); Anthropic, OpenAI citation documentation; RAG literature 2024–2026

Classification — Pattern for producing user-facing synthesized answers from retrieved passages with verifiable citations to source documents.

Intent

Generate natural-language answers that satisfy informational queries directly while preserving the user's ability to verify each claim against source passages through cited references.

Motivating Problem

Generated answers without citations are unverifiable; users can't tell which claims are supported and which the LLM invented. Citations restore verifiability but introduce quality problems of their own: citation drift (the citation points to the wrong passage), citation invention (the citation refers to a non-existent passage), and partial citation (the cited passage supports only part of the claim).

Producing reliable RAG output is a discipline involving prompt design, output parsing, and verification. The patterns documented here represent the production consensus through 2024–2026.

How It Works

Prompt structure. The system prompt instructs the LLM to: answer using only the provided passages; cite each claim with the source passage ID; refuse to answer if the passages don't support a confident response. The user message contains the query and the formatted passages. Each passage has a clear ID (P1, P2, etc.) that the LLM uses for citation.

Passage formatting. Each passage is formatted with a clear ID, the passage text, and optionally the source document title/URL. Production patterns: place passages between explicit delimiters (XML tags work well: <passage id="P1">...</passage>); order passages by retrieval relevance; cap passage count at 5–10 to keep prompts focused and costs bounded.
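The formatting rules above can be sketched as follows. This is a minimal illustration, not a reference implementation: the dict keys `text`, `title`, `url`, and `score`, the function name, and the default cap of 8 are assumptions made for the example. Escaping the passage text is an extra precaution so that a passage cannot close its own delimiter.

```python
from typing import Dict, List, Tuple
from xml.sax.saxutils import escape, quoteattr

MAX_PASSAGES = 8  # assumed default, chosen from the 5-10 range


def format_passages_with_sources(
    passages: List[Dict], max_passages: int = MAX_PASSAGES
) -> Tuple[str, List[Dict]]:
    """Return the prompt block plus the ranked passages it was built from.

    Assumes each passage dict has "text" and optionally "title", "url",
    and a retrieval "score" (hypothetical key names).
    """
    # Highest retrieval score first, then apply the cap.
    ranked = sorted(passages, key=lambda p: p.get("score", 0.0), reverse=True)
    ranked = ranked[:max_passages]

    blocks = []
    for i, p in enumerate(ranked, start=1):
        attrs = f'id="P{i}"'
        if p.get("title"):
            attrs += f" title={quoteattr(p['title'])}"
        if p.get("url"):
            attrs += f" url={quoteattr(p['url'])}"
        # Escape &, <, > so passage text cannot break the delimiters.
        blocks.append(f"<passage {attrs}>\n{escape(p['text'])}\n</passage>")
    return "\n\n".join(blocks), ranked
```

Because sorting changes the order, the IDs P1, P2, ... refer to positions in the returned ranked list, so citations should be mapped back through that list rather than the original input.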

Output structure. The LLM produces an answer with inline citations like [P1] [P2]. Production parsers extract these citation markers and link them to the passage IDs. The UI then renders citations as clickable references to source documents.
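A minimal sketch of the parsing step, assuming the [P1]-style markers described above. The function name and the returned dict shape are hypothetical. Separating out-of-range IDs gives a cheap check for citation invention before anything is rendered.

```python
import re
from typing import Dict, List

CITATION_RE = re.compile(r"\[P(\d+)\]")


def parse_citations(answer: str, num_passages: int) -> Dict[str, List[str]]:
    """Split cited IDs into valid ones and invented (out-of-range) ones.

    IDs are returned in order of first appearance, without duplicates.
    """
    valid: List[str] = []
    invented: List[str] = []
    for match in CITATION_RE.finditer(answer):
        number = int(match.group(1))
        passage_id = f"P{number}"
        bucket = valid if 1 <= number <= num_passages else invented
        if passage_id not in bucket:
            bucket.append(passage_id)
    return {"valid": valid, "invented": invented}
```

A non-empty `invented` list is a reasonable trigger for regenerating the answer or stripping the offending markers.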

Faithfulness verification. After generation, a verification step checks that each cited claim is actually supported by the cited passage. Production patterns: pattern match cited claims against passages; LLM-based verification (separate LLM call to check each claim); human review for high-stakes domains. The verification stage catches hallucinations that the generation stage missed.
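The pattern-matching variant of verification can be sketched as a lexical overlap check. This is a rough heuristic under stated assumptions, not a substitute for LLM-based or human review: it assumes citations sit inside the sentence they support (before the final punctuation), and the stopword list, the 0.5 threshold, and all names are illustrative choices.

```python
import re
from typing import Dict, List, Set

CITATION_RE = re.compile(r"\[P(\d+)\]")
# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "and", "was", "were", "are", "for", "that", "this",
             "with", "from", "has", "have"}


def content_words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, minus short words and stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def check_support(answer: str, passages: List[Dict],
                  threshold: float = 0.5) -> List[Dict]:
    """Score each (sentence, cited passage) pair by word overlap."""
    results = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        cited = CITATION_RE.findall(sentence)
        if not cited:
            continue  # uncited sentences are out of scope for this check
        claim = CITATION_RE.sub("", sentence).strip()
        claim_words = content_words(claim)
        for number in cited:
            idx = int(number) - 1
            if 0 <= idx < len(passages) and claim_words:
                passage_words = content_words(passages[idx]["text"])
                overlap = len(claim_words & passage_words) / len(claim_words)
            else:
                overlap = 0.0  # invented citation or empty claim
            results.append({
                "claim": claim,
                "passage_id": f"P{number}",
                "overlap": overlap,
                "supported": overlap >= threshold,
            })
    return results
```

Low-overlap pairs are candidates for the more expensive checks (a separate LLM call, or human review in high-stakes domains). Paraphrased but faithful claims can score low, so a failed overlap check is a signal to escalate, not proof of a hallucination.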

Refusal patterns. The LLM should refuse to answer when passages don't support a confident response. Production prompts include explicit instructions: "If the passages don't contain enough information to answer, say so directly." The refusal pattern is critical for trustworthiness; without it, the LLM tends to fabricate answers when context is insufficient.

Conversation context. Multi-turn RAG maintains conversation history. Production patterns: include last N turns verbatim in the prompt; summarize older turns; query rewriting (Section A) makes the current query self-contained for retrieval; synthesis uses both the rewritten query and the original conversational query so the response feels conversationally appropriate.
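One way to assemble the multi-turn prompt is sketched below. It assumes the history is a list of completed user/assistant message dicts, that the summary of older turns and the rewritten query are produced elsewhere, and that the passages are already formatted; every name and the prompt wording are hypothetical.

```python
from typing import Dict, List


def build_messages(
    history: List[Dict[str, str]],
    older_summary: str,
    original_query: str,
    rewritten_query: str,
    passages_block: str,
    keep_last_turns: int = 3,
) -> List[Dict[str, str]]:
    """Build a message list: recent turns verbatim, then the RAG request."""
    # One turn is a user message plus the assistant reply.
    recent = history[-2 * keep_last_turns:] if keep_last_turns > 0 else []
    # Start on a user message so the roles alternate cleanly.
    while recent and recent[0]["role"] != "user":
        recent = recent[1:]

    parts = []
    if older_summary:
        parts.append(f"Summary of earlier conversation:\n{older_summary}")
    parts.append(f"Passages:\n\n{passages_block}")
    parts.append(f"Standalone form of the question: {rewritten_query}")
    parts.append(f"Question as the user asked it: {original_query}")
    parts.append("Answer (with citations):")

    final_message = {"role": "user", "content": "\n\n".join(parts)}
    return list(recent) + [final_message]
```

Keeping the summary inside the final user message, instead of adding it as a separate message, avoids two consecutive messages with the same role.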

Streaming responses. RAG generation has substantial latency (1–3 seconds for typical answers). Streaming the response token-by-token makes the latency more tolerable; users see content appearing immediately. Production patterns: stream the synthesis output to the UI; render citations as they appear; allow the user to interrupt or scroll while generation continues.
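Rendering citations as they appear has one practical wrinkle: a marker such as [P1] can be split across streamed chunks. The sketch below is a hypothetical helper that consumes any iterator of text chunks and emits ("text", ...) and ("citation", ...) events, holding back a possible partial marker until the next chunk arrives. The event shape and names are assumptions for illustration.

```python
import re
from typing import Iterable, Iterator, Tuple

MARKER_RE = re.compile(r"\[P(\d+)\]")
# Matches a possible unfinished marker at the end of the buffer: "[", "[P", "[P1"
PARTIAL_RE = re.compile(r"\[(P\d*)?$")

Event = Tuple[str, str]


def _emit(text: str) -> Iterator[Event]:
    """Yield text and citation events for a span with only complete markers."""
    pos = 0
    for match in MARKER_RE.finditer(text):
        if match.start() > pos:
            yield ("text", text[pos:match.start()])
        yield ("citation", f"P{match.group(1)}")
        pos = match.end()
    if pos < len(text):
        yield ("text", text[pos:])


def stream_events(chunks: Iterable[str]) -> Iterator[Event]:
    """Convert streamed text chunks into UI events as early as possible."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        partial = PARTIAL_RE.search(buffer)
        if partial:
            # Emit everything before the unfinished marker; keep the rest.
            ready, buffer = buffer[:partial.start()], buffer[partial.start():]
        else:
            ready, buffer = buffer, ""
        yield from _emit(ready)
    # Flush whatever is left once the stream ends.
    yield from _emit(buffer)
```

With the Anthropic Python SDK, the chunks could come from the text iterator of the streaming helper (`client.messages.stream(...)`); any other source of incremental text works the same way.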

Cost considerations. Each RAG query has a non-trivial cost. Input tokens (the formatted passages) often dominate: 5 passages of 500 tokens each come to 2,500 tokens, and the system prompt, query, and delimiters bring a typical request to 3,000+ input tokens. Output tokens are fewer (typical answers are 200–500 tokens). Production patterns: cache frequent (query, passage-set) combinations; tier models (Haiku for routine queries, Sonnet for harder, Opus for the highest-stakes); enforce per-user or per-session cost budgets.
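The caching and budgeting patterns can be sketched with a few small helpers. All names here are hypothetical, and prices are passed in by the caller (in USD per million tokens) because they vary by model and change over time; no real prices are assumed.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable


def cache_key(query: str, passage_ids: Iterable[str]) -> str:
    """Stable key for a (query, passage-set) combination.

    The query is case- and whitespace-normalized, and passage IDs are
    sorted so the same set in a different order maps to the same key.
    """
    normalized = " ".join(query.lower().split())
    payload = normalized + "\x1f" + "\x1f".join(sorted(passage_ids))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float,
                  output_price_per_mtok: float) -> float:
    """Estimated request cost in USD from caller-supplied prices."""
    total = (input_tokens * input_price_per_mtok
             + output_tokens * output_price_per_mtok)
    return total / 1_000_000


@dataclass
class SessionBudget:
    """Tracks spend for one session and rejects requests over the limit."""
    limit_usd: float
    spent_usd: float = 0.0

    def try_spend(self, cost_usd: float) -> bool:
        if self.spent_usd + cost_usd > self.limit_usd:
            return False
        self.spent_usd += cost_usd
        return True
```

A caller would check the cache first, then call `try_spend` with the estimate before issuing the request, and fall back to a cheaper model tier or a plain passage list when the budget is exhausted.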

When to Use It

Informational and analytical query workloads where users want answers, not links. Customer support and knowledge-base search where direct answers reduce support load. Research and analysis tools where synthesis of multiple sources is the value proposition.

Less suitable: navigational queries ('Nike homepage') where users want links, not essays. Transactional queries ('buy running shoes') where product listings are the right output. High-stakes domains (medical, legal, financial advice) where hallucination risk is unacceptable without robust verification.

Sources

Anthropic documentation on RAG patterns with Claude
OpenAI cookbook on RAG implementation
Perplexity AI engineering blog posts on citation handling
RAG literature: Gao et al. (2024) survey; Lewis et al. (2020) original RAG paper

Example artifacts

Code

# RAG synthesis with grounded citation (Python + Anthropic SDK)

from typing import List, Dict
import anthropic
import re

client = anthropic.Anthropic()

RAG_SYSTEM_PROMPT = """You are a search assistant. Answer the
user's question using ONLY the provided passages.

Rules:
- Cite each claim with the passage ID in brackets, e.g. [P1] [P2]
- If the passages don't contain enough information for a confident
  answer, say so directly and don't fabricate
- Keep answers concise: 2-4 sentences for simple questions, 1-2
  paragraphs for complex ones
- Don't invent citations; only cite passages that actually appear
  in the input"""

def format_passages(passages: List[Dict]) -> str:
    """Format retrieved passages for inclusion in the prompt."""
    return "\n\n".join(
        f'<passage id="P{i+1}">\n{p["text"]}\n</passage>'
        for i, p in enumerate(passages)
    )

def synthesize(query: str, passages: List[Dict]) -> Dict:
    """Generate a RAG answer with citations.
    Returns dict with: answer, citations (list of cited passages).
    """
    user_message = f"""Passages:

{format_passages(passages)}

Question: {query}

Answer (with citations):"""
    try:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1000,
            system=RAG_SYSTEM_PROMPT,
            messages=[{"role": "user", "content": user_message}]
        )
        answer = response.content[0].text
        # Extract citation IDs (e.g., [P1], [P2]), deduplicated in
        # order of first appearance so the output is deterministic
        cited_ids = list(dict.fromkeys(re.findall(r"\[P(\d+)\]", answer)))
        # Map back to source documents; out-of-range IDs are skipped
        cited = []
        for cid in cited_ids:
            idx = int(cid) - 1
            if 0 <= idx < len(passages):
                cited.append({
                    "passage_id": f"P{cid}",
                    "source": passages[idx].get("source_id", ""),
                    "text": passages[idx]["text"],
                })
        return {
            "answer": answer,
            "citations": cited,
        }
    except Exception as e:
        # Any failure (API error, unexpected response shape) degrades
        # to showing the retrieved passages instead of an answer
        return {
            "answer": (
                "I'm unable to generate an answer right now. "
                "Here are the relevant passages I found:"
            ),
            "citations": [],
            "error": str(e),
            "fallback_passages": passages,  # UI falls back to passage list
        }