Hallucinated Sources: When Agents Invent Sources

Hallucinated sources happen when an agent cites documents, links, or facts that do not actually exist. Learn why it happens and how to detect it.
On this page
  1. The Problem
  2. Why This Happens
  3. Most Common Failure Patterns
  4. Unfetched URL citations
  5. Snippet instead of evidence (Search-as-evidence)
  6. Citation drift across steps
  7. Pseudo-citations without claim coverage (Claim-source mismatch)
  8. How To Detect These Problems
  9. How To Distinguish Hallucinated Sources From Just An Inaccurate Answer
  10. How To Stop These Failures
  11. Where This Is Implemented In Architecture
  12. Self-check
  13. FAQ
  14. Related Pages

The Problem

The request looks standard: produce a short summary of policy changes and add sources.

Traces show something else: in one run the agent returned 7 citations, but verification showed that 3 sources were never fetched and 2 pointed to 404 pages. To the user, the answer looks confident, but it is not reproducible.

The system does not crash.

It just returns plausible citations without real evidence.

Analogy: imagine an auditor who references "folders in the archive" that nobody has ever seen. The report looks professional until someone checks the sources. Hallucinated sources in agent systems work the same way.

Why This Happens

Hallucinated sources usually appear not because of a single model mistake, but because runtime citation control is not strict.

LLMs have a strong bias toward "complete" answers, so without strict verification the model is more likely to invent a citation than to return an answer without a source.

In production, it is typically this:

  1. the agent generates citations as part of a "complete" answer;
  2. search snippets are treated as evidence even though pages were never opened;
  3. source_id values are not tied to evidence snapshots;
  4. without citation verification, runtime passes unfetched or invalid sources;
  5. if fail-closed is not configured, invented sources reach the user.

In traces, this appears as citations_count growing while citation_validity_rate drops.

The problem is not one bad URL.

It is that the runtime does not block unverified citations before the final output.

Most Common Failure Patterns

In production, four recurring hallucinated-source patterns appear most often.

Unfetched URL citations

The agent cites a URL that never went through http.get or kb.read in that run.

Typical cause: citations are not restricted to source_id values from the evidence store.
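A minimal sketch of a detector for this pattern, assuming the runtime keeps the set of URLs actually fetched during the run (the `fetched_urls` parameter and the URL regex are illustrative assumptions, not a specific runtime's API):

```python
import re

# Match http(s) URLs in free text; real runtimes usually canonicalize
# (scheme, host case, trailing slash) before comparing.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def find_unfetched_citations(answer_text: str, fetched_urls: set[str]) -> list[str]:
    """Return URLs cited in the final answer that never appeared in the run's fetch log."""
    cited = [u.rstrip(".,;") for u in URL_RE.findall(answer_text)]
    return [u for u in cited if u not in fetched_urls]
```

Any non-empty result here is a candidate citations:invalid stop reason rather than a warning to log and ignore.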

Snippet instead of evidence (Search-as-evidence)

The answer includes "sources" from search results, but the agent has no confirmation of actual page content.

Typical cause: search results are mixed with the evidence layer.
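One way to keep the two layers apart is to track candidates and evidence in separate sets and only allow citations from the second. This is a sketch under that assumption; the class and method names are hypothetical:

```python
# Search results are candidates, not evidence. A source becomes citable
# only after an explicit fetch step promotes it to the evidence set.
class SourceTracker:
    def __init__(self) -> None:
        self.candidates: set[str] = set()  # came back from search
        self.evidence: set[str] = set()    # actually fetched and snapshotted

    def add_search_result(self, url: str) -> None:
        self.candidates.add(url)

    def mark_fetched(self, url: str) -> None:
        self.evidence.add(url)

    def citable(self, url: str) -> bool:
        # Being a search candidate is never enough on its own.
        return url in self.evidence
```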

Citation drift across steps

At an earlier step the source was valid, but after a retry or truncation the final answer references a different document.

Typical cause: no stable claim -> source_id -> snapshot hash binding.
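The binding can be sketched as a record captured when the claim is produced and re-checked before the final render. The record shape below is an assumption for illustration:

```python
import hashlib


def snapshot_hash(text: str) -> str:
    """Stable content hash for an evidence snapshot."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def binding_is_stable(claim_binding: dict, snapshots: dict[str, str]) -> bool:
    """claim_binding: {"claim": ..., "source_id": ..., "text_sha256": ...}
    snapshots: source_id -> current snapshot text.

    Returns False if the source disappeared after a retry/truncation,
    or if its content no longer matches the hash captured with the claim."""
    current = snapshots.get(claim_binding["source_id"])
    if current is None:
        return False
    return snapshot_hash(current) == claim_binding["text_sha256"]
```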

Pseudo-citations without claim coverage (Claim-source mismatch)

The answer contains a citation block, but key claims have no supporting source.

Typical cause: validation checks only "presence of links", not claim coverage.

How To Detect These Problems

Hallucinated sources are visible through citation and retrieval metrics together.

| Metric | Hallucinated-sources signal | What to do |
| --- | --- | --- |
| citation_validity_rate | share of verified citations drops | introduce fail-closed verification by source_id |
| unfetched_source_rate | many unfetched URLs in answers | forbid URL citations without an evidence snapshot |
| source_404_rate | some sources cannot be opened | check response status and canonical URL during fetch |
| claim_without_citation_rate | claims are not linked to sources | add a claim-level coverage check |
| citation_stop_reason_rate | frequent citations:invalid in runtime | review retrieval quality and tool policy |
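Two of these rates can be computed directly from per-run citation records. This is a sketch; the record fields (`valid`, `fetched`) are assumptions about what the trace stores:

```python
def citation_metrics(records: list[dict]) -> dict[str, float]:
    """Compute citation_validity_rate and unfetched_source_rate
    from a run's citation records."""
    total = len(records)
    if total == 0:
        # No citations at all: nothing invalid, nothing unfetched.
        return {"citation_validity_rate": 1.0, "unfetched_source_rate": 0.0}
    valid = sum(1 for r in records if r.get("valid"))
    unfetched = sum(1 for r in records if not r.get("fetched"))
    return {
        "citation_validity_rate": valid / total,
        "unfetched_source_rate": unfetched / total,
    }
```

Watching both together matters: citations_count can grow while citation_validity_rate quietly drops.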

How To Distinguish Hallucinated Sources From Just An Inaccurate Answer

Not every textual inaccuracy means invented sources. The key question: can the source be technically reproduced for each critical claim?

Normal if:

  • each citation points to source_id that exists in evidence store;
  • snapshot metadata exists (URL, timestamp, hash);
  • claim checks show sources cover key conclusions.

Dangerous if:

  • the answer contains URLs that never appeared in any fetch step;
  • citations are present only for form and do not cover the main claims;
  • answers cannot be reproduced at the run level (run_id -> source_id -> snapshot).

How To Stop These Failures

In practice, it looks like this:

  1. all sources pass through evidence store (snapshot + hash + timestamp);
  2. the model returns citations only as source_id, not arbitrary URLs;
  3. citation verifier checks that all source_id values exist, were fetched, and are allowed by policy;
  4. if verification fails, runtime returns stop reason and safe fallback.

Minimal guard for citation validation:

PYTHON
from dataclasses import dataclass
import hashlib
import time


@dataclass(frozen=True)
class EvidenceMeta:
    source_id: str
    url: str
    fetched_at: float
    text_sha256: str


class EvidenceStore:
    def __init__(self):
        self.items: dict[str, EvidenceMeta] = {}

    def add_snapshot(self, source_id: str, url: str, text: str) -> None:
        self.items[source_id] = EvidenceMeta(
            source_id=source_id,
            url=url,
            fetched_at=time.time(),
            text_sha256=hashlib.sha256(text.encode("utf-8")).hexdigest(),
        )

    def has(self, source_id: str) -> bool:
        return source_id in self.items


def verify_citations(cited_source_ids: list[str], store: EvidenceStore) -> str | None:
    # cited_source_ids are expected to come from structured output
    if not cited_source_ids:
        return "citations:missing"

    unknown = [sid for sid in cited_source_ids if not store.has(sid)]
    if unknown:
        return "citations:unknown_source_id"

    return None

This is a basic guard. In production, it is usually extended with a claim-level coverage check, an allowlist for citation tools, and a separate stop reason for unfetched URLs. verify_citations(...) is called before the final response is rendered, so the user never sees invalid sources.

Where This Is Implemented In Architecture

In production, hallucinated-source control is almost always split across three system layers.

Tool Execution Layer handles evidence fetch: response status, URL normalization, snapshots, and hash. If this layer does not store evidence, citations cannot be verified reliably.

Agent Runtime controls structured output, citation verification, stop reasons, and fail-closed fallback. This is where the final decision is made whether answer can be shown to user.

Memory Layer keeps run-to-evidence linkage: run_id, source_id, retention, and reproducibility. Without this layer, teams cannot run a proper incident audit.
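A minimal sketch of that run-to-evidence linkage, assuming a simple append-only log; the record fields are illustrative, not a specific product schema:

```python
from dataclasses import dataclass, field
import time


@dataclass
class RunEvidenceLink:
    """One run-to-evidence link kept by the Memory Layer."""
    run_id: str
    source_id: str
    snapshot_sha256: str
    linked_at: float = field(default_factory=time.time)


class RunAuditLog:
    def __init__(self) -> None:
        self.links: list[RunEvidenceLink] = []

    def record(self, run_id: str, source_id: str, snapshot_sha256: str) -> None:
        self.links.append(RunEvidenceLink(run_id, source_id, snapshot_sha256))

    def sources_for_run(self, run_id: str) -> list[str]:
        # Incident audit: which evidence backed this run's answer?
        return [link.source_id for link in self.links if link.run_id == run_id]
```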

Self-check

Quick pre-release check. This is a short sanity check, not a formal audit.


FAQ

Q: Can I just ask the model to "always include sources"?
A: You can, but it is not enough. Without runtime citation verification, this is formatting, not evidence.

Q: Can search results be used as evidence?
A: Usually no. Search gives candidate sources only. Evidence is only what was fetched and stored as a snapshot.

Q: Do I need to store the full source text?
A: Not always. Minimum for audit is URL, timestamp, hash, and stable source_id. Full text is added where replay or exact quotes are needed.

Q: What should the user see when citations are invalid?
A: An explicit stop reason, what was already verified, and a safe next step: a partial answer without unverified sources, or a rerun with verification.


Hallucinated-sources incidents almost never look like a loud crash. They cause a silent loss of trust that is usually noticed only after someone checks the sources. So production agents need not only good answers but also strict citation discipline.

If this happens in production, these pages are also useful:

⏱️ 7 min read • Updated March 12, 2026 • Difficulty: ★★☆
Implement in OnceOnly
Guardrails for loops, retries, and spend escalation.
# onceonly guardrails (concept)
version: 1
budgets:
  max_steps: 25
  max_tool_calls: 12
  max_seconds: 60
  max_usd: 1.00
policy:
  tool_allowlist:
    - search.read
    - http.get
controls:
  loop_detection:
    enabled: true
    dedupe_by: [tool, args_hash]
  retries:
    max: 2
    backoff_ms: [200, 800]
stop_reasons:
  enabled: true
logging:
  tool_calls: { enabled: true, store_args: false, store_args_hash: true }
Integrated: production control (OnceOnly)
Add guardrails to tool-calling agents
Ship this pattern with governance:
  • Budgets (steps / spend caps)
  • Kill switch & incident stop
  • Audit logs & traceability
  • Idempotency & dedupe
  • Tool permissions (allowlist / blocklist)
Integrated mention: OnceOnly is a control layer for production agent systems.
Example policy (concept)
# Example (Python β€” conceptual)
policy = {
  "budgets": {"steps": 20, "seconds": 60, "usd": 1.0},
  "controls": {"kill_switch": True, "audit": True},
}

Author

Nick, engineer building infrastructure for production AI agents.

Focus: agent patterns, failure modes, runtime control, and system reliability.

πŸ”— GitHub: https://github.com/mykolademyanov


Editorial note

This documentation is AI-assisted, with human editorial responsibility for accuracy, clarity, and production relevance.

Content is grounded in real-world failures, post-mortems, and operational incidents in deployed AI agent systems.