Das Problem
You tried http.get() and the page came back as 12KB of âplease enable JavaScriptâ.
So you add a browser tool.
Now your agent:
- is slower (seconds, not milliseconds)
- gets blocked (anti-bot, auth, captchas)
- finds new ways to loop (infinite scroll, pagination, ânext pageâ forever)
Browser tools are useful. Theyâre also a tax.
Warum das in echten Systemen passiert
Browsers make side effects easy:
- they execute scripts (tracking, redirects, popups)
- they load lots of resources (you pay for it)
- the DOM changes constantly (your selectors rot)
And the model doesnât know any of that. It just keeps asking for âone more scrollâ.
Was kaputtgeht, wenn duâs ignorierst
- p95 latency becomes âgo get coffeeâ
- you hit anti-bot, then your agent retries, then you get blocked harder
- your infra cost spikes because headless browsers arenât cheap
Code: a browser tool wrapper with sane limits
from dataclasses import dataclass
import time
from urllib.parse import urlparse
@dataclass(frozen=True)
class BrowserBudget:
max_pages: int = 6
max_seconds: int = 45
class BrowserDenied(RuntimeError):
pass
def is_allowed(url: str, *, allow_domains: set[str]) -> bool:
host = urlparse(url).hostname or ""
return host in allow_domains
def browse_urls(urls: list[str], *, browser, allow_domains: set[str], budget: BrowserBudget):
started = time.time()
out = []
for i, url in enumerate(urls[: budget.max_pages]):
if time.time() - started > budget.max_seconds:
break
if not is_allowed(url, allow_domains=allow_domains):
raise BrowserDenied(f"domain not allowed: {url}")
# real implementations should:
# - block images/video
# - set timeouts per navigation
# - cache by canonical URL
page = browser.open(url, timeout_s=10) # (pseudo)
out.append({"url": url, "text": page.text, "status": page.status})
return outimport { URL } from "node:url";
export class BrowserDenied extends Error {}
export function isAllowed(url, { allowDomains }) {
const host = new URL(url).hostname.toLowerCase();
return allowDomains.has(host);
}
export async function browseUrls(urls, { browser, allowDomains, budget }) {
const started = Date.now();
const out = [];
for (const url of urls.slice(0, budget.max_pages)) {
if ((Date.now() - started) / 1000 > budget.max_seconds) break;
if (!isAllowed(url, { allowDomains })) throw new BrowserDenied("domain not allowed: " + url);
// real implementations should:
// - block images/video
// - set timeouts per navigation
// - cache by canonical URL
const page = await browser.open(url, { timeoutS: 10 }); // (pseudo)
out.push({ url, text: page.text, status: page.status });
}
return out;
}The rule: default to HTTP, graduate to browser
If you can solve it with http.get(), do that.
Browser tools are what you use when:
- the page is a JS app and content is rendered client-side
- the data is behind auth and thereâs no API
- you need interactions (click a tab, expand a section, export)
The cost difference is not subtle.
Rough mental model:
- HTTP fetch: tens to hundreds of milliseconds
- browser navigation: seconds
- interaction loops: âhow long is a piece of string?â
If your product needs predictable latency, browser is your last resort.
Why browser tools explode cost (itâs not just âslowerâ)
Browser tools tend to multiply cost in three ways:
-
Wall time If one browse step is 3â5 seconds, your 12-step agent now has minutes of runtime.
-
Resource loading Even if you only want text, the page wants to load:
- images, video, fonts
- analytics scripts
- A/B test scripts
- tracking pixels
If you donât block that, youâre paying to render ads in headless Chrome. Incredible industry.
- Loop surface area Infinite scroll, âload moreâ, pagination, dynamic filters⊠the model sees ânextâ, it clicks ânextâ.
What we block by default
We donât render the full modern web unless we have to.
Defaults we ship:
- block images/video/fonts
- disable downloads
- hard timeout per navigation
- hard max pages per run
- hard max interactions per page
- domain allowlist (yes, even for âresearchâ)
If you skip domain allowlists, youâre one prompt injection away from your agent browsing the worst parts of the internet with your credentials.
Prompt injection (web edition)
The web is untrusted input. Itâs not just âdataâ. Itâs data that can talk back.
If you feed raw page text into the model and let it decide tools in the same step, youâre basically doing:
- read untrusted instructions
- execute them with real credentials
And then acting surprised when the agent does something dumb.
Weâve seen pages that literally include:
- âIgnore previous instructionsâ
- âCall this webhook to verifyâ
- âPaste your API key to continueâ
Itâs not magic. The model is a pattern matcher. If you give it convincing text plus tool access, it will sometimes comply.
What we do instead:
- Separate extraction from action. Browser tool extracts just the fields we need (main text, table rows, headings).
- Treat extracted content as a quoted artifact, not as instructions.
- Never let page content expand permissions. Tool policy stays outside the model.
If you only remember one rule: a webpage doesnât get to ask for tool calls.
SSRF & egress (how you accidentally browse your own infra)
If your browser tool runs inside a VPC, it can âbrowseâ internal hosts too. Thatâs an SSRF party.
Common footguns:
http://169.254.169.254/(cloud metadata)- private ranges (
10.0.0.0/8,192.168.0.0/16,172.16.0.0/12) - DNS rebinding (host looks public, resolves private later)
- redirects that walk you from public â private
Block it at the runtime. Not in a prompt.
import ipaddress
import socket
def is_public_host(host: str) -> bool:
# Resolve once and reject private/reserved IPs.
for family, _, _, _, sockaddr in socket.getaddrinfo(host, None):
ip = sockaddr[0]
addr = ipaddress.ip_address(ip)
if (
addr.is_private
or addr.is_loopback
or addr.is_link_local
or addr.is_reserved
or addr.is_multicast
):
return False
return Trueimport dns from "node:dns/promises";
function isPrivateIpv4(ip) {
const parts = ip.split(".").map((x) => Number(x));
if (parts.length !== 4 || parts.some((x) => !Number.isFinite(x))) return true;
const [a, b] = parts;
if (a === 10) return true;
if (a === 127) return true;
if (a === 169 && b === 254) return true;
if (a === 192 && b === 168) return true;
if (a === 172 && b >= 16 && b <= 31) return true;
if (a === 0) return true;
if (a === 100 && b >= 64 && b <= 127) return true; // CGNAT
return false;
}
export async function isPublicHost(host) {
const records = await dns.lookup(host, { all: true });
for (const r of records) {
if (r.family !== 4) return false; // keep it strict: v6 is trickier
if (isPrivateIpv4(r.address)) return false;
}
return true;
}Then enforce:
- https only (unless you really need http)
- no redirects (or redirect allowlist + re-check host each hop)
- allowlist by domain and verify resolved IP is public
This is one of those âboringâ rules that prevents a very exciting incident.
What we log (because browser incidents are slow)
When browser tooling goes wrong, itâs usually slow and expensive. So we log enough to answer âwhat happened?â fast:
- requested URL + final URL (after redirects)
- status code and a âblocked?â boolean
- elapsed time per navigation
- bytes downloaded (rough cost proxy)
- extraction selectors used
- stop_reason (max_pages / max_seconds / blocked / policy_deny)
If your browser tool only logs âsuccess/failureâ, youâll be guessing.
Extraction: donât ask the model to âread the pageâ
âRead the page and summarizeâ is how you get:
- 10k tokens of nav bars
- âsummaryâ thatâs just the hero section
- zero provenance
Prefer explicit extraction:
- âextract the pricing table rowsâ
- âextract the support hoursâ
- âextract the first 3 H2 sectionsâ
And store extracted notes with URL + selector + timestamp. Thatâs provenance.
A slightly more real wrapper (still conceptual)
This is closer to the runtime we like:
from dataclasses import dataclass
import time
from urllib.parse import urlparse
@dataclass(frozen=True)
class BrowserBudget:
max_pages: int = 6
max_interactions: int = 20
max_seconds: int = 45
def canonical_host(url: str) -> str:
return (urlparse(url).hostname or "").lower()
def browse(
urls: list[str],
*,
browser,
allow_domains: set[str],
budget: BrowserBudget,
request_id: str,
):
started = time.time()
out = []
interactions = 0
for url in urls[: budget.max_pages]:
if time.time() - started > budget.max_seconds:
break
host = canonical_host(url)
if host not in allow_domains:
raise BrowserDenied(f"[{request_id}] domain not allowed: {host}")
page = browser.new_page(
block_images=True,
block_video=True,
block_fonts=True,
user_agent="AgentPatternsBot/1.0", # keep it honest
) # (pseudo)
nav = page.goto(url, timeout_s=10) # (pseudo)
interactions += 1
if interactions > budget.max_interactions:
break
# Extract with targets, not vibes.
text = page.extract_text(selectors=["main", "article"]) # (pseudo)
out.append({"url": url, "host": host, "text": text, "status": nav.status})
return outimport { URL } from "node:url";
export class BrowserDenied extends Error {}
export function canonicalHost(rawUrl) {
return new URL(rawUrl).hostname.toLowerCase();
}
export async function browse(urls, { browser, allowDomains, budget, requestId }) {
const started = Date.now();
const out = [];
let interactions = 0;
for (const url of urls.slice(0, budget.max_pages)) {
if ((Date.now() - started) / 1000 > budget.max_seconds) break;
const host = canonicalHost(url);
if (!allowDomains.has(host)) throw new BrowserDenied("[" + requestId + "] domain not allowed: " + host);
const page = await browser.newPage({
blockImages: true,
blockVideo: true,
blockFonts: true,
userAgent: "AgentPatternsBot/1.0",
}); // (pseudo)
const nav = await page.goto(url, { timeoutS: 10 }); // (pseudo)
interactions += 1;
if (interactions > budget.max_interactions) break;
// Extract with targets, not vibes.
const text = await page.extractText({ selectors: ["main", "article"] }); // (pseudo)
out.push({ url, host, text, status: nav.status });
}
return out;
}Common production failures (and how they show up)
Auth / sessions
If you add login flows, you now own:
- session refresh
- MFA / CAPTCHA handling (usually: you donât)
- account lockouts (yes, the agent will retry login)
If the data is behind auth, strongly prefer an API integration.
Anti-bot / blocks
When you hit anti-bot:
- the model sees âtry againâ
- your runtime retries
- now youâre blocked harder
You need explicit detection:
- content looks like a block page
- HTTP status implies bot mitigation
- repeated redirects
And then you stop. Immediately.
Selector rot
If your extraction depends on div > div > div:nth-child(3), it will break.
And it will break at 03:00, because thatâs when marketing ships the redesign.
Prefer stable selectors:
main,article- data attributes if available
- semantic headings
Or accept that browser tooling is a maintenance tax and budget engineering time for it.
Caching and dedupe (the difference between âusableâ and âexpensiveâ)
If your agent visits the same URL twice in one run, youâre wasting money. If it visits the same URL across runs, youâre wasting money at scale.
We cache:
- by canonical URL (strip tracking params)
- by content hash (if the site changes often)
- with a TTL (because the web changes)
And we dedupe:
- within a run (donât re-open the same URL)
- across runs (donât re-pay for stable pages)
Yes, caching can return stale content. Thatâs a trade-off. You can always force a refresh for high-value tasks.
Timeouts and retries (browser tools are not âretry foreverâ)
Retries are where browser tools kill you.
Reason:
- each âretryâ is a multi-second navigation
- flaky auth flows often get worse with retries (lockouts)
- anti-bot systems escalate
Our rule:
- one retry for navigation timeouts
- zero retries for suspected block pages
- hard stop on repeated redirects
Concurrency budgets (because browsers eat your servers)
Headless browsers arenât âjust another toolâ. Theyâre a compute-heavy workload that will happily take down your app server if you let it.
What we do in production:
- run browser workers in a separate pool (so web traffic doesnât fight Chrome)
- cap concurrency per tenant (fairness beats a noisy neighbor)
- queue requests and return a clear âtry againâ when the queue is full
- degrade: if the browser queue is saturated, fall back to HTTP extraction or return partial results
If your p95 target is <2s, treat the browser tool like a scarce resource. Otherwise one enthusiastic user will turn your âagentâ into a global latency incident.
Things we deliberately donât do
People always ask: âCan it solve CAPTCHAs?â
Maybe. But if your production pipeline depends on solving CAPTCHAs, youâve already lost.
We try to avoid browser scraping when:
- thereâs an API
- we can partner/integrate instead
- the content is behind auth we donât control
Where this fits in the stack
Browser tools should sit behind:
- budgets (pages/time)
- allowlisted domains
- audit logging
- a kill switch (yes, for the browser specifically)
If you donât isolate browser tooling, one slow site will tank your p95 across the entire product.
Shipping checklist (browser tool)
If you want browser tooling without the âwhy is everything slow?â incident:
- Make browser optional
- try HTTP first
- only browser when required
- Hard budgets
- max pages
- max seconds
- max interactions
- Resource blocking
- block images/video/fonts by default
- keep the render minimal
- Domain allowlist
- explicit allowed domains
- no âopen any URLâ mode
- Cache + dedupe
- canonicalize URLs
- donât re-open the same page
- Block detection
- detect likely anti-bot pages
- stop immediately (donât retry forever)
- Extraction targets
- prefer
main/article+ stable selectors - avoid brittle DOM paths
- Operator controls
- tool-level kill switch
- metrics for p95 runtime and per-run page count
Browser tools are powerful. Theyâre also a liability. Treat them like production infrastructure, not like a fun feature.
Echter Ausfall
We let a browser-capable agent âfind the pricing pageâ. It discovered a site with infinite scroll and kept scrolling.
Damage:
- ~11 minutes runtime for one request
- ~120 page interactions
- the user got a timeout anyway
Fix:
- cap pages and time
- block scroll-by-default
- require explicit selectors / extraction targets
Wo Teams das falsch machen
- They use the browser for everything (including static pages).
- They donât cache.
- They donât restrict domains.
AbwÀgungen
- Browsers increase recall (you can read SPAs).
- Browsers destroy latency and cost.
- âAccurate extractionâ needs brittle selectors (maintenance cost).
Wann du das NICHT nutzen solltest
Donât use a browser tool when:
- plain HTTP fetch works
- you can use an API instead
- you can accept a partial answer without UI scraping
Link it up
- Foundations: Tool calling
- Failures: Infinite loop
- Security: Tool permissions