Browser Tool for AI Agents (Mit Code)

Das Browser-Tool ist die teuerste Art, eine Webseite zu lesen. Manchmal brauchst du es trotzdem. So verhinderst du, dass es Latenz und Budget grillt.

Auf dieser Seite

Das Problem
Warum das in echten Systemen passiert
Was kaputtgeht, wenn du’s ignorierst
Code: a browser tool wrapper with sane limits
The rule: default to HTTP, graduate to browser
Why browser tools explode cost (it’s not just “slower”)
What we block by default
Prompt injection (web edition)
SSRF & egress (how you accidentally browse your own infra)
What we log (because browser incidents are slow)
Extraction: don’t ask the model to “read the page”
A slightly more real wrapper (still conceptual)
Common production failures (and how they show up)
Auth / sessions
Anti-bot / blocks
Selector rot
Caching and dedupe (the difference between “usable” and “expensive”)
Timeouts and retries (browser tools are not “retry forever”)
Concurrency budgets (because browsers eat your servers)
Things we deliberately don’t do
Where this fits in the stack
Shipping checklist (browser tool)
Echter Ausfall
Wo Teams das falsch machen
Abwägungen
Wann du das NICHT nutzen solltest
Link it up

Das Problem

You tried http.get() and the page came back as 12KB of “please enable JavaScript”.

So you add a browser tool.

Now your agent:

is slower (seconds, not milliseconds)
gets blocked (anti-bot, auth, captchas)
finds new ways to loop (infinite scroll, pagination, “next page” forever)

Browser tools are useful. They’re also a tax.

Warum das in echten Systemen passiert

Browsers make side effects easy:

they execute scripts (tracking, redirects, popups)
they load lots of resources (you pay for it)
the DOM changes constantly (your selectors rot)

And the model doesn’t know any of that. It just keeps asking for “one more scroll”.

Was kaputtgeht, wenn du’s ignorierst

p95 latency becomes “go get coffee”
you hit anti-bot, then your agent retries, then you get blocked harder
your infra cost spikes because headless browsers aren’t cheap

Code: a browser tool wrapper with sane limits

PythonJS

PYTHON

from dataclasses import dataclass
import time
from urllib.parse import urlparse


@dataclass(frozen=True)
class BrowserBudget:
    max_pages: int = 6
    max_seconds: int = 45


class BrowserDenied(RuntimeError):
    pass


def is_allowed(url: str, *, allow_domains: set[str]) -> bool:
    host = urlparse(url).hostname or ""
    return host in allow_domains


def browse_urls(urls: list[str], *, browser, allow_domains: set[str], budget: BrowserBudget):
    started = time.time()
    out = []

    for i, url in enumerate(urls[: budget.max_pages]):
        if time.time() - started > budget.max_seconds:
            break
        if not is_allowed(url, allow_domains=allow_domains):
            raise BrowserDenied(f"domain not allowed: {url}")

        # real implementations should:
        # - block images/video
        # - set timeouts per navigation
        # - cache by canonical URL
        page = browser.open(url, timeout_s=10)  # (pseudo)
        out.append({"url": url, "text": page.text, "status": page.status})

    return out

JAVASCRIPT

import { URL } from "node:url";

export class BrowserDenied extends Error {}

export function isAllowed(url, { allowDomains }) {
  const host = new URL(url).hostname.toLowerCase();
  return allowDomains.has(host);
}

export async function browseUrls(urls, { browser, allowDomains, budget }) {
  const started = Date.now();
  const out = [];

  for (const url of urls.slice(0, budget.max_pages)) {
    if ((Date.now() - started) / 1000 > budget.max_seconds) break;
    if (!isAllowed(url, { allowDomains })) throw new BrowserDenied("domain not allowed: " + url);

    // real implementations should:
    // - block images/video
    // - set timeouts per navigation
    // - cache by canonical URL
    const page = await browser.open(url, { timeoutS: 10 }); // (pseudo)
    out.push({ url, text: page.text, status: page.status });
  }

  return out;
}

The rule: default to HTTP, graduate to browser

If you can solve it with http.get(), do that.

Browser tools are what you use when:

the page is a JS app and content is rendered client-side
the data is behind auth and there’s no API
you need interactions (click a tab, expand a section, export)

The cost difference is not subtle.

Rough mental model:

HTTP fetch: tens to hundreds of milliseconds
browser navigation: seconds
interaction loops: “how long is a piece of string?”

If your product needs predictable latency, browser is your last resort.

Why browser tools explode cost (it’s not just “slower”)

Browser tools tend to multiply cost in three ways:

Wall time If one browse step is 3–5 seconds, your 12-step agent now has minutes of runtime.
Resource loading Even if you only want text, the page wants to load:

images, video, fonts
analytics scripts
A/B test scripts
tracking pixels

If you don’t block that, you’re paying to render ads in headless Chrome. Incredible industry.

Loop surface area Infinite scroll, “load more”, pagination, dynamic filters… the model sees “next”, it clicks “next”.

What we block by default

We don’t render the full modern web unless we have to.

Defaults we ship:

block images/video/fonts
disable downloads
hard timeout per navigation
hard max pages per run
hard max interactions per page
domain allowlist (yes, even for “research”)

If you skip domain allowlists, you’re one prompt injection away from your agent browsing the worst parts of the internet with your credentials.

Prompt injection (web edition)

The web is untrusted input. It’s not just “data”. It’s data that can talk back.

If you feed raw page text into the model and let it decide tools in the same step, you’re basically doing:

read untrusted instructions
execute them with real credentials

And then acting surprised when the agent does something dumb.

We’ve seen pages that literally include:

“Ignore previous instructions”
“Call this webhook to verify”
“Paste your API key to continue”

It’s not magic. The model is a pattern matcher. If you give it convincing text plus tool access, it will sometimes comply.

What we do instead:

Separate extraction from action. Browser tool extracts just the fields we need (main text, table rows, headings).
Treat extracted content as a quoted artifact, not as instructions.
Never let page content expand permissions. Tool policy stays outside the model.

If you only remember one rule: a webpage doesn’t get to ask for tool calls.

SSRF & egress (how you accidentally browse your own infra)

If your browser tool runs inside a VPC, it can “browse” internal hosts too. That’s an SSRF party.

Common footguns:

http://169.254.169.254/ (cloud metadata)
private ranges (10.0.0.0/8, 192.168.0.0/16, 172.16.0.0/12)
DNS rebinding (host looks public, resolves private later)
redirects that walk you from public → private

Block it at the runtime. Not in a prompt.

PythonJS

PYTHON

import ipaddress
import socket


def is_public_host(host: str) -> bool:
    # Resolve once and reject private/reserved IPs.
    for family, _, _, _, sockaddr in socket.getaddrinfo(host, None):
        ip = sockaddr[0]
        addr = ipaddress.ip_address(ip)
        if (
            addr.is_private
            or addr.is_loopback
            or addr.is_link_local
            or addr.is_reserved
            or addr.is_multicast
        ):
            return False
    return True

JAVASCRIPT

import dns from "node:dns/promises";

function isPrivateIpv4(ip) {
  const parts = ip.split(".").map((x) => Number(x));
  if (parts.length !== 4 || parts.some((x) => !Number.isFinite(x))) return true;
  const [a, b] = parts;

  if (a === 10) return true;
  if (a === 127) return true;
  if (a === 169 && b === 254) return true;
  if (a === 192 && b === 168) return true;
  if (a === 172 && b >= 16 && b <= 31) return true;
  if (a === 0) return true;
  if (a === 100 && b >= 64 && b <= 127) return true; // CGNAT
  return false;
}

export async function isPublicHost(host) {
  const records = await dns.lookup(host, { all: true });
  for (const r of records) {
    if (r.family !== 4) return false; // keep it strict: v6 is trickier
    if (isPrivateIpv4(r.address)) return false;
  }
  return true;
}

Then enforce:

https only (unless you really need http)
no redirects (or redirect allowlist + re-check host each hop)
allowlist by domain and verify resolved IP is public

This is one of those “boring” rules that prevents a very exciting incident.

What we log (because browser incidents are slow)

When browser tooling goes wrong, it’s usually slow and expensive. So we log enough to answer “what happened?” fast:

requested URL + final URL (after redirects)
status code and a “blocked?” boolean
elapsed time per navigation
bytes downloaded (rough cost proxy)
extraction selectors used
stop_reason (max_pages / max_seconds / blocked / policy_deny)

If your browser tool only logs “success/failure”, you’ll be guessing.

Extraction: don’t ask the model to “read the page”

“Read the page and summarize” is how you get:

10k tokens of nav bars
“summary” that’s just the hero section
zero provenance

Prefer explicit extraction:

“extract the pricing table rows”
“extract the support hours”
“extract the first 3 H2 sections”

And store extracted notes with URL + selector + timestamp. That’s provenance.

A slightly more real wrapper (still conceptual)

This is closer to the runtime we like:

PythonJS

PYTHON

from dataclasses import dataclass
import time
from urllib.parse import urlparse


@dataclass(frozen=True)
class BrowserBudget:
    max_pages: int = 6
    max_interactions: int = 20
    max_seconds: int = 45


def canonical_host(url: str) -> str:
    return (urlparse(url).hostname or "").lower()


def browse(
    urls: list[str],
    *,
    browser,
    allow_domains: set[str],
    budget: BrowserBudget,
    request_id: str,
):
    started = time.time()
    out = []
    interactions = 0

    for url in urls[: budget.max_pages]:
        if time.time() - started > budget.max_seconds:
            break

        host = canonical_host(url)
        if host not in allow_domains:
            raise BrowserDenied(f"[{request_id}] domain not allowed: {host}")

        page = browser.new_page(
            block_images=True,
            block_video=True,
            block_fonts=True,
            user_agent="AgentPatternsBot/1.0",  # keep it honest
        )  # (pseudo)

        nav = page.goto(url, timeout_s=10)  # (pseudo)
        interactions += 1

        if interactions > budget.max_interactions:
            break

        # Extract with targets, not vibes.
        text = page.extract_text(selectors=["main", "article"])  # (pseudo)
        out.append({"url": url, "host": host, "text": text, "status": nav.status})

    return out

JAVASCRIPT

import { URL } from "node:url";

export class BrowserDenied extends Error {}

export function canonicalHost(rawUrl) {
  return new URL(rawUrl).hostname.toLowerCase();
}

export async function browse(urls, { browser, allowDomains, budget, requestId }) {
  const started = Date.now();
  const out = [];
  let interactions = 0;

  for (const url of urls.slice(0, budget.max_pages)) {
    if ((Date.now() - started) / 1000 > budget.max_seconds) break;

    const host = canonicalHost(url);
    if (!allowDomains.has(host)) throw new BrowserDenied("[" + requestId + "] domain not allowed: " + host);

    const page = await browser.newPage({
      blockImages: true,
      blockVideo: true,
      blockFonts: true,
      userAgent: "AgentPatternsBot/1.0",
    }); // (pseudo)

    const nav = await page.goto(url, { timeoutS: 10 }); // (pseudo)
    interactions += 1;

    if (interactions > budget.max_interactions) break;

    // Extract with targets, not vibes.
    const text = await page.extractText({ selectors: ["main", "article"] }); // (pseudo)
    out.push({ url, host, text, status: nav.status });
  }

  return out;
}

Common production failures (and how they show up)

Auth / sessions

If you add login flows, you now own:

session refresh
MFA / CAPTCHA handling (usually: you don’t)
account lockouts (yes, the agent will retry login)

If the data is behind auth, strongly prefer an API integration.

Anti-bot / blocks

When you hit anti-bot:

the model sees “try again”
your runtime retries
now you’re blocked harder

You need explicit detection:

content looks like a block page
HTTP status implies bot mitigation
repeated redirects

And then you stop. Immediately.

Selector rot

If your extraction depends on div > div > div:nth-child(3), it will break. And it will break at 03:00, because that’s when marketing ships the redesign.

Prefer stable selectors:

main, article
data attributes if available
semantic headings

Or accept that browser tooling is a maintenance tax and budget engineering time for it.

Caching and dedupe (the difference between “usable” and “expensive”)

If your agent visits the same URL twice in one run, you’re wasting money. If it visits the same URL across runs, you’re wasting money at scale.

We cache:

by canonical URL (strip tracking params)
by content hash (if the site changes often)
with a TTL (because the web changes)

And we dedupe:

within a run (don’t re-open the same URL)
across runs (don’t re-pay for stable pages)

Yes, caching can return stale content. That’s a trade-off. You can always force a refresh for high-value tasks.

Timeouts and retries (browser tools are not “retry forever”)

Retries are where browser tools kill you.

Reason:

each “retry” is a multi-second navigation
flaky auth flows often get worse with retries (lockouts)
anti-bot systems escalate

Our rule:

one retry for navigation timeouts
zero retries for suspected block pages
hard stop on repeated redirects

Concurrency budgets (because browsers eat your servers)

Headless browsers aren’t “just another tool”. They’re a compute-heavy workload that will happily take down your app server if you let it.

What we do in production:

run browser workers in a separate pool (so web traffic doesn’t fight Chrome)
cap concurrency per tenant (fairness beats a noisy neighbor)
queue requests and return a clear “try again” when the queue is full
degrade: if the browser queue is saturated, fall back to HTTP extraction or return partial results

If your p95 target is <2s, treat the browser tool like a scarce resource. Otherwise one enthusiastic user will turn your “agent” into a global latency incident.

Things we deliberately don’t do

People always ask: “Can it solve CAPTCHAs?”

Maybe. But if your production pipeline depends on solving CAPTCHAs, you’ve already lost.

We try to avoid browser scraping when:

there’s an API
we can partner/integrate instead
the content is behind auth we don’t control

Where this fits in the stack

Browser tools should sit behind:

budgets (pages/time)
allowlisted domains
audit logging
a kill switch (yes, for the browser specifically)

If you don’t isolate browser tooling, one slow site will tank your p95 across the entire product.

Shipping checklist (browser tool)

If you want browser tooling without the “why is everything slow?” incident:

Make browser optional

try HTTP first
only browser when required

Hard budgets

max pages
max seconds
max interactions

Resource blocking

block images/video/fonts by default
keep the render minimal

Domain allowlist

explicit allowed domains
no “open any URL” mode

Cache + dedupe

canonicalize URLs
don’t re-open the same page

Block detection

detect likely anti-bot pages
stop immediately (don’t retry forever)

Extraction targets

prefer main/article + stable selectors
avoid brittle DOM paths

Operator controls

tool-level kill switch
metrics for p95 runtime and per-run page count

Browser tools are powerful. They’re also a liability. Treat them like production infrastructure, not like a fun feature.

Echter Ausfall

We let a browser-capable agent “find the pricing page”. It discovered a site with infinite scroll and kept scrolling.

Damage:

~11 minutes runtime for one request
~120 page interactions
the user got a timeout anyway

Fix:

cap pages and time
block scroll-by-default
require explicit selectors / extraction targets

Wo Teams das falsch machen

They use the browser for everything (including static pages).
They don’t cache.
They don’t restrict domains.

Abwägungen

Browsers increase recall (you can read SPAs).
Browsers destroy latency and cost.
“Accurate extraction” needs brittle selectors (maintenance cost).

Wann du das NICHT nutzen solltest

Don’t use a browser tool when:

plain HTTP fetch works
you can use an API instead
you can accept a partial answer without UI scraping

Link it up

Foundations: Tool calling
Failures: Infinite loop
Security: Tool permissions

Warum KI-Agenten scheitern: typische Probleme in Production

Zurück

Tool‑Berechtigungen für KI‑Agenten (mit Code)

Warum KI-Agenten scheitern: typische Probleme in Production

⏱️ 9 Min. Lesezeit • Aktualisiert 9. April 2026Schwierigkeit: ★★☆

Von Patterns genutzt

Erforderliche Governance

In OnceOnly umsetzen

Validation, timeouts, retries, and response checks for tools.

In OnceOnly nutzen

# onceonly guardrails (concept)
version: 1
tools:
  timeouts_ms: { default: 8000 }
  retries:
    max: 2
    backoff_ms: [200, 800]
  input_validation: strict
  output_validation: strict
logging:
  tool_calls: { enabled: true, store_args: false, store_args_hash: true }

Integriert: Production ControlOnceOnly

Guardrails für Tool-Calling-Agents

Shippe dieses Pattern mit Governance:

Tool-Permissions (Allowlist / Blocklist)
Budgets (Steps / Spend Caps)
Idempotenz & Dedupe
Audit logs & Nachvollziehbarkeit
Kill switch & Incident Stop

OnceOnly testen Doku & Beispiele

Integrierter Hinweis: OnceOnly ist eine Control-Layer für Production-Agent-Systeme.

Beispiel-Policy (Konzept)

# Example (Python — conceptual)
policy = {
  "tools": {"allow": ["browser.search", "http.get"]},
  "budgets": {"steps": 20, "seconds": 60},
}

Autor

Nick — Engineer, der Infrastruktur für KI-Agenten in Produktion aufbaut.

Fokus: Agent-Patterns, Failure-Modes, Runtime-Steuerung und Systemzuverlässigkeit.

🔗 GitHub: https://github.com/mykolademyanov

Redaktioneller Hinweis

Diese Dokumentation ist KI-gestützt, mit menschlicher redaktioneller Verantwortung für Genauigkeit, Klarheit und Produktionsrelevanz.

Der Inhalt basiert auf realen Ausfällen, Post-Mortems und operativen Vorfällen in produktiv eingesetzten KI-Agenten-Systemen.

Browser Tool for AI Agents (Mit Code)

Das Problem

Warum das in echten Systemen passiert

Was kaputtgeht, wenn du’s ignorierst

Code: a browser tool wrapper with sane limits

The rule: default to HTTP, graduate to browser

Why browser tools explode cost (it’s not just “slower”)

What we block by default

Prompt injection (web edition)

SSRF & egress (how you accidentally browse your own infra)

What we log (because browser incidents are slow)

Extraction: don’t ask the model to “read the page”

A slightly more real wrapper (still conceptual)

Common production failures (and how they show up)

Auth / sessions

Anti-bot / blocks

Selector rot

Caching and dedupe (the difference between “usable” and “expensive”)

Timeouts and retries (browser tools are not “retry forever”)

Concurrency budgets (because browsers eat your servers)

Things we deliberately don’t do

Where this fits in the stack

Shipping checklist (browser tool)

Echter Ausfall

Wo Teams das falsch machen

Abwägungen

Wann du das NICHT nutzen solltest

Link it up

Von Patterns genutzt

Verwandte Failures

Erforderliche Governance

Autor

Redaktioneller Hinweis