Hallucinated Sources: When Agents Invent Sources

Hallucinated sources happen when an agent cites documents, links, or facts that do not actually exist. Learn why it happens and how to detect it.
On this page
  1. The Problem
  2. Why This Happens
  3. Most Common Failure Patterns
  4. Unfetched URL citations
  5. Snippet instead of evidence (Search-as-evidence)
  6. Citation drift across steps
  7. Pseudo-citations without claim coverage (Claim-source mismatch)
  8. How To Detect These Problems
  9. How To Distinguish Hallucinated Sources From Just An Inaccurate Answer
  10. How To Stop These Failures
  11. Where This Is Implemented In Architecture
  12. Self-check
  13. FAQ
  14. Related Pages

The Problem

The request looks standard: produce a short summary of policy changes and add sources.

Traces show something else: in one run the agent returned 7 citations, but verification showed that 3 sources were never fetched and 2 pointed to 404 pages. To the user, the answer looks confident, but it is not reproducible.

The system does not crash.

It just returns plausible citations without real evidence.

Analogy: imagine an auditor who references "folders in the archive" that nobody has ever seen. The report looks professional until someone checks the sources. Hallucinated sources in agent systems work the same way.

Why This Happens

Hallucinated sources usually appear not because of a single model mistake, but because runtime citation control is not strict.

LLMs have a strong bias toward "complete" answers, so without strict verification the model is more likely to invent a citation than to return an answer without a source.

In production, it is typically this:

  1. the agent generates citations as part of a "complete" answer;
  2. search snippets are treated as evidence even though pages were never opened;
  3. source_id values are not tied to evidence snapshots;
  4. without citation verification, runtime passes unfetched or invalid sources;
  5. if fail-closed is not configured, invented sources reach the user.

In traces, this appears as citations_count growing while citation_validity_rate drops.

The problem is not one bad URL.

It is that the runtime does not block unverified citations before the final output.

Most Common Failure Patterns

In production, four recurring hallucinated-source patterns appear most often.

Unfetched URL citations

The agent cites a URL that never went through http.get or kb.read in that run.

Typical cause: citations are not restricted to source_id values from the evidence store.
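A minimal sketch of a detector for this pattern, assuming the runtime keeps the set of URLs actually fetched during the run (the `fetched_urls` parameter and the URL regex are illustrative assumptions, not a specific runtime's API):

```python
import re

# Match http(s) URLs in free text; real runtimes usually canonicalize
# (scheme, host case, trailing slash) before comparing.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def find_unfetched_citations(answer_text: str, fetched_urls: set[str]) -> list[str]:
    """Return URLs cited in the final answer that never appeared in the run's fetch log."""
    cited = [u.rstrip(".,;") for u in URL_RE.findall(answer_text)]
    return [u for u in cited if u not in fetched_urls]
```

Any non-empty result here is a candidate citations:invalid stop reason rather than a warning to log and ignore.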

Snippet instead of evidence (Search-as-evidence)

The answer includes "sources" from search results, but the agent has no confirmation of actual page content.

Typical cause: search results are mixed with the evidence layer.
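One way to keep the two layers apart is to track candidates and evidence in separate sets and only allow citations from the second. This is a sketch under that assumption; the class and method names are hypothetical:

```python
# Search results are candidates, not evidence. A source becomes citable
# only after an explicit fetch step promotes it to the evidence set.
class SourceTracker:
    def __init__(self) -> None:
        self.candidates: set[str] = set()  # came back from search
        self.evidence: set[str] = set()    # actually fetched and snapshotted

    def add_search_result(self, url: str) -> None:
        self.candidates.add(url)

    def mark_fetched(self, url: str) -> None:
        self.evidence.add(url)

    def citable(self, url: str) -> bool:
        # Being a search candidate is never enough on its own.
        return url in self.evidence
```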

Citation drift across steps

At an earlier step the source was valid, but after a retry or truncation the final answer references a different document.

Typical cause: no stable claim -> source_id -> snapshot hash binding.
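The binding can be sketched as a record captured when the claim is produced and re-checked before the final render. The record shape below is an assumption for illustration:

```python
import hashlib


def snapshot_hash(text: str) -> str:
    """Stable content hash for an evidence snapshot."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def binding_is_stable(claim_binding: dict, snapshots: dict[str, str]) -> bool:
    """claim_binding: {"claim": ..., "source_id": ..., "text_sha256": ...}
    snapshots: source_id -> current snapshot text.

    Returns False if the source disappeared after a retry/truncation,
    or if its content no longer matches the hash captured with the claim."""
    current = snapshots.get(claim_binding["source_id"])
    if current is None:
        return False
    return snapshot_hash(current) == claim_binding["text_sha256"]
```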

Pseudo-citations without claim coverage (Claim-source mismatch)

The answer contains a citation block, but key claims have no supporting source.

Typical cause: validation checks only "presence of links", not claim coverage.

How To Detect These Problems

Hallucinated sources are visible through citation and retrieval metrics together.

| Metric | Hallucinated-sources signal | What to do |
| --- | --- | --- |
| citation_validity_rate | share of verified citations drops | introduce fail-closed verification by source_id |
| unfetched_source_rate | many unfetched URLs in answers | forbid URL citations without an evidence snapshot |
| source_404_rate | some sources cannot be opened | check response status and canonical URL during fetch |
| claim_without_citation_rate | claims are not linked to sources | add a claim-level coverage check |
| citation_stop_reason_rate | frequent citations:invalid in runtime | review retrieval quality and tool policy |
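Two of these rates can be computed directly from per-run citation records. This is a sketch; the record fields (`valid`, `fetched`) are assumptions about what the trace stores:

```python
def citation_metrics(records: list[dict]) -> dict[str, float]:
    """Compute citation_validity_rate and unfetched_source_rate
    from a run's citation records."""
    total = len(records)
    if total == 0:
        # No citations at all: nothing invalid, nothing unfetched.
        return {"citation_validity_rate": 1.0, "unfetched_source_rate": 0.0}
    valid = sum(1 for r in records if r.get("valid"))
    unfetched = sum(1 for r in records if not r.get("fetched"))
    return {
        "citation_validity_rate": valid / total,
        "unfetched_source_rate": unfetched / total,
    }
```

Watching both together matters: citations_count can grow while citation_validity_rate quietly drops.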

How To Distinguish Hallucinated Sources From Just An Inaccurate Answer

Not every textual inaccuracy means invented sources. The key question: can the source be technically reproduced for each critical claim?

Normal if:

  • each citation points to source_id that exists in evidence store;
  • snapshot metadata exists (URL, timestamp, hash);
  • claim checks show sources cover key conclusions.

Dangerous if:

  • the answer contains URLs that never appeared in any fetch step;
  • citations are present only for form and do not cover the main claims;
  • answers cannot be reproduced at the run level (run_id -> source_id -> snapshot).

How To Stop These Failures

In practice, it looks like this:

  1. all sources pass through evidence store (snapshot + hash + timestamp);
  2. the model returns citations only as source_id, not arbitrary URLs;
  3. citation verifier checks that all source_id values exist, were fetched, and are allowed by policy;
  4. if verification fails, runtime returns stop reason and safe fallback.

Minimal guard for citation validation:

PYTHON
from dataclasses import dataclass
import hashlib
import time


@dataclass(frozen=True)
class EvidenceMeta:
    source_id: str
    url: str
    fetched_at: float
    text_sha256: str


class EvidenceStore:
    def __init__(self):
        self.items: dict[str, EvidenceMeta] = {}

    def add_snapshot(self, source_id: str, url: str, text: str) -> None:
        self.items[source_id] = EvidenceMeta(
            source_id=source_id,
            url=url,
            fetched_at=time.time(),
            text_sha256=hashlib.sha256(text.encode("utf-8")).hexdigest(),
        )

    def has(self, source_id: str) -> bool:
        return source_id in self.items


def verify_citations(cited_source_ids: list[str], store: EvidenceStore) -> str | None:
    # cited_source_ids are expected to come from structured output
    if not cited_source_ids:
        return "citations:missing"

    unknown = [sid for sid in cited_source_ids if not store.has(sid)]
    if unknown:
        return "citations:unknown_source_id"

    return None

This is a basic guard. In production, it is usually extended with a claim-level coverage check, an allowlist for citation tools, and a separate stop reason for unfetched URLs. verify_citations(...) is called before the final response is rendered, so the user never sees invalid sources.

Where This Is Implemented In Architecture

In production, hallucinated-source control is almost always split across three system layers.

Tool Execution Layer handles evidence fetch: response status, URL normalization, snapshots, and hash. If this layer does not store evidence, citations cannot be verified reliably.

Agent Runtime controls structured output, citation verification, stop reasons, and fail-closed fallback. This is where the final decision is made whether answer can be shown to user.

Memory Layer keeps run-to-evidence linkage: run_id, source_id, retention, and reproducibility. Without this layer, teams cannot run a proper incident audit.
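A minimal sketch of that run-to-evidence linkage, assuming a simple append-only log; the record fields are illustrative, not a specific product schema:

```python
from dataclasses import dataclass, field
import time


@dataclass
class RunEvidenceLink:
    """One run-to-evidence link kept by the Memory Layer."""
    run_id: str
    source_id: str
    snapshot_sha256: str
    linked_at: float = field(default_factory=time.time)


class RunAuditLog:
    def __init__(self) -> None:
        self.links: list[RunEvidenceLink] = []

    def record(self, run_id: str, source_id: str, snapshot_sha256: str) -> None:
        self.links.append(RunEvidenceLink(run_id, source_id, snapshot_sha256))

    def sources_for_run(self, run_id: str) -> list[str]:
        # Incident audit: which evidence backed this run's answer?
        return [link.source_id for link in self.links if link.run_id == run_id]
```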

Self-check

Quick pre-release check. This is a short sanity check, not a formal audit.


FAQ

Q: Can I just ask the model to "always include sources"?
A: You can, but it is not enough. Without runtime citation verification, this is formatting, not evidence.

Q: Can search results be used as evidence?
A: Usually no. Search gives candidate sources only. Evidence is only what was fetched and stored as a snapshot.

Q: Do I need to store the full source text?
A: Not always. Minimum for audit is URL, timestamp, hash, and stable source_id. Full text is added where replay or exact quotes are needed.

Q: What should the user see when citations are invalid?
A: An explicit stop reason, what was already verified, and a safe next step: a partial answer without unverified sources, or a rerun with verification.


Hallucinated-sources incidents almost never look like a loud crash. They cause a silent loss of trust that is usually noticed only after someone checks the sources. So production agents need not only good answers but also strict citation discipline.

If this happens in production, these pages are also useful:

⏱️ 7 min read • Updated March 12, 2026 • Difficulty: ★★☆
Implement in OnceOnly
Guardrails for loops, retries, and spend escalation.
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
controls:
  loop_detection:
    enabled: true
    dedupe_by: [tool, args_hash]
  retries:
    max: 2
    backoff_ms: [200, 800]
stop_reasons:
  enabled: true
logging:
  tool_calls: { enabled: true, store_args: false, store_args_hash: true }
Integrated: production control (OnceOnly)
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Kill switch & incident stop
  • Audit logs & traceability
  • Idempotency & dedupe
  • Tool permissions (allowlist / blocklist)
Integrated mention: OnceOnly is a control layer for production agent systems.
Example policy (concept)
# Example (Python β€” conceptual)
policy = {
  "budgets": {"steps": 20, "seconds": 60, "usd": 1.0},
  "controls": {"kill_switch": True, "audit": True},
}

Author

Nick, engineer building infrastructure for production AI agents.

Focus: agent patterns, failure modes, runtime control, and system reliability.

πŸ”— GitHub: https://github.com/mykolademyanov


Editorial note

This documentation is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Content is grounded in real-world failures, post-mortems, and operational incidents in deployed AI agent systems.