ProtalBot · Our web crawler

Meet ProtalBot. The crawler behind every audit.

ProtalBot is a well-behaved, identifiable web crawler. This page is the official reference — user-agent string, IP provenance, robots.txt directives, and how to verify, allow, rate-limit, or block it.


ProtalBot

v1.0 · Classifier: Crawler

Operator: Protal · remote
User-agent: ProtalBot/1.0 (+https://protal.ai/bot)
Region: us-east-1 (Virginia)
Honors robots: Yes, immediately
JS rendering: Static HTML pass, plus headless Chromium for Category 6 (Rendering)
Contact: bot@protal.ai

Operational · 1 region · on-demand only

Verify it's really us

ProtalBot traffic originates from Vercel us-east-1 egress. A list of allowlistable CIDR ranges will be published at /.well-known/protalbot.json before public launch. Until then, email us if you see traffic and want to confirm it is ours.

# Reverse-DNS check, available starting in production
host <ip> | grep "vercel"
# ✓ vercel-infrastructure.com
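
The `host` lookup above checks only the reverse (PTR) record, which whoever controls the IP can set to anything. A common hardening step is forward-confirmed reverse DNS: resolve the PTR hostname back to an address and require that it matches. The sketch below is one way to do that, not Protal's published procedure. The suffix comes from the example output above, and the function names are illustrative.

```python
import socket
from typing import Callable, Sequence

# Suffix taken from the example output above. Treat it as an assumption
# until official verification details are published.
EXPECTED_SUFFIX = "vercel-infrastructure.com"


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP (performs a live DNS query)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> Sequence[str]:
    """Return the IPv4 addresses a hostname resolves to (live DNS query)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_forward_confirmed(
    ip: str,
    suffix: str = EXPECTED_SUFFIX,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], Sequence[str]] = forward_lookup,
) -> bool:
    """True if the IP's PTR name is under `suffix` and resolves back to the IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if hostname != suffix and not hostname.endswith("." + suffix):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

The resolvers are injectable so the check can be exercised without network access. A positive result indicates the hosting infrastructure named above, not ProtalBot specifically, so it is best combined with the user-agent string.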

User-agent

One string we send.

ProtalBot identifies itself with a single, stable user-agent. Future fleets (continuous monitoring, replay-as) are planned but not operational — when they ship they'll get distinct UAs so you can scope policy independently.

Purpose | User-agent string | Frequency | Status
Audit (on-demand) | ProtalBot/1.0 (+https://protal.ai/bot) | On request | Operational
Continuous monitoring | (reserved, not yet shipping) | - | Planned
Replay as named AI UA | (reserved, not yet shipping) | - | Planned
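
To scope policy on the operational user-agent string listed above, a server-side check might look like the following. This is a minimal sketch: the pattern assumes future versions keep the `ProtalBot/<major>.<minor> (+https://protal.ai/bot)` shape, which the source does not promise, and `parse_protalbot_ua` is an illustrative name.

```python
import re
from typing import Optional, Tuple

# Assumption: only the version number varies between releases.
UA_PATTERN = re.compile(r"^ProtalBot/(\d+)\.(\d+) \(\+https://protal\.ai/bot\)$")


def parse_protalbot_ua(user_agent: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) if the header is a ProtalBot UA, else None."""
    match = UA_PATTERN.match(user_agent.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

A user-agent header is trivially spoofed, so a match is a hint for routing and logging rather than proof of origin.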

IP ranges

Region pinned, ranges TBD.

All ProtalBot traffic egresses from a single region (us-east-1) for reproducibility: the same site scanned twice is measured from the same location, so latency profiles stay comparable. The exact CIDR list will be finalized before public launch.

us-east-1 | (published before launch) | Vercel egress · Virginia
us-west | - | Not active
eu-west | - | Not active
ap-south | - | Not active

Canonical list will land at https://protal.ai/.well-known/protalbot.json · auto-updated when egress changes.
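
Once the canonical list is published, allowlisting by IP could be sketched as below. The document shape is hypothetical: the real schema of protalbot.json has not been announced, so the `"ranges"` key and the sample CIDRs (reserved documentation ranges) are placeholders.

```python
import ipaddress
import json
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical payload. The real file's schema is not yet published.
SAMPLE_DOCUMENT = '{"ranges": ["192.0.2.0/24", "2001:db8::/32"]}'


def load_networks(document: str) -> List[Network]:
    """Parse a JSON document with a "ranges" list of CIDR strings."""
    data = json.loads(document)
    return [ipaddress.ip_network(cidr) for cidr in data["ranges"]]


def ip_in_ranges(ip: str, networks: List[Network]) -> bool:
    """True if the address falls inside any listed network of the same IP version."""
    address = ipaddress.ip_address(ip)
    return any(
        address.version == network.version and address in network
        for network in networks
    )
```

For example, `ip_in_ranges("192.0.2.77", load_networks(SAMPLE_DOCUMENT))` evaluates to `True`, while an address outside both sample ranges evaluates to `False`.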

robots.txt recipes

Three configurations, copy-paste.

Drop one of these into your /robots.txt. ProtalBot re-fetches robots at the start of every scan.

Allow all

Welcome ProtalBot

Default for most sites. Lets Protal audit your site whenever you, or someone you've shared the URL with, requests a scan.

# Allow Protal's auditor
User-agent: ProtalBot
Allow: /
Sitemap: https://example.com/sitemap.xml

Rate-limit

Slow it down

Useful if your origin is sensitive. ProtalBot already self-throttles — this just makes it more conservative.

# Audit OK, but pace yourself
User-agent: ProtalBot
Allow: /
Crawl-delay: 5

Block entirely

Opt out

Protal honors this immediately — audits will report "blocked" and stop probing.

# No thanks
User-agent: ProtalBot
Disallow: /
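
Before deploying one of the three recipes, it can help to confirm that a standard parser reads it the way you intend. The sketch below uses Python's standard-library `urllib.robotparser`. It shows how that parser interprets the rules, which is a reasonable proxy but not a guarantee of ProtalBot's own parsing. The `parse_robots` helper is illustrative.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = """\
# Allow Protal's auditor
User-agent: ProtalBot
Allow: /
Sitemap: https://example.com/sitemap.xml
"""

RATE_LIMIT = """\
# Audit OK, but pace yourself
User-agent: ProtalBot
Allow: /
Crawl-delay: 5
"""

BLOCK = """\
# No thanks
User-agent: ProtalBot
Disallow: /
"""


def parse_robots(text: str) -> RobotFileParser:
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


allowed = parse_robots(ALLOW_ALL).can_fetch("ProtalBot", "https://example.com/pricing")
delay = parse_robots(RATE_LIMIT).crawl_delay("ProtalBot")
blocked = not parse_robots(BLOCK).can_fetch("ProtalBot", "https://example.com/")
```

Because each recipe names `User-agent: ProtalBot` rather than `*`, other crawlers are unaffected: the block recipe still leaves the site open to agents that do not match.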

Crawl behavior

How politely it fetches.

ProtalBot is designed to be barely noticeable in your logs. If it isn't, email us; we treat runaway audits as bugs.

1×

Per-host concurrency

One concurrent scan per target domain — never parallelize a site against itself. Multiple users requesting the same site share a 24h cache.

10/h

Hourly cap

Maximum 10 scans per target host per hour, globally across all requesters. Hot targets (github.com, stripe.com) extend to a 7-day cache.

24h

Cache lifetime

Rule results cached 24 hours unless the user explicitly re-runs. robots.txt is re-fetched at the start of every scan.

No JS execution

The static pass is a pure fetcher and executes no JavaScript. The one exception is Category 6 (Rendering), which invokes a headless Chromium to test what non-JS crawlers would miss.
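
The per-host limits described above (one concurrent scan, at most 10 scans per hour) can be modeled with a small in-memory limiter. This is an illustrative sketch of the stated policy, not Protal's implementation: the class name, the sliding-window interpretation of "per hour", and the injectable clock are all assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set

HOUR_SECONDS = 3600.0


class HostScanLimiter:
    """Allows one active scan per host and a capped number of starts per hour."""

    def __init__(
        self,
        max_per_hour: int = 10,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_per_hour = max_per_hour
        self._clock = clock
        self._starts: Dict[str, Deque[float]] = defaultdict(deque)
        self._active: Set[str] = set()

    def try_start(self, host: str) -> bool:
        """Reserve a scan slot for `host`; return False if a limit applies."""
        now = self._clock()
        starts = self._starts[host]
        # Drop start times that have aged out of the one-hour window.
        while starts and now - starts[0] >= HOUR_SECONDS:
            starts.popleft()
        if host in self._active or len(starts) >= self._max_per_hour:
            return False
        starts.append(now)
        self._active.add(host)
        return True

    def finish(self, host: str) -> None:
        """Mark the host's active scan as complete."""
        self._active.discard(host)
```

A caller would pair every successful `try_start` with a `finish`, and serve cached results whenever `try_start` returns `False`.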

FAQ

The usual questions.

Is ProtalBot used to train models?

No. Protal does not sell, license, or use fetched content to train language models. Responses are analyzed for audit rules only, then discarded within 24 hours.

What if I block ProtalBot?

We immediately stop probing your site. The next scan request reports a blocked status with the matching Disallow as evidence.

Does it hit /admin or /api?

Only if they're linked from pages an AI agent would crawl, and only if robots.txt permits. You can always carve them out with a scoped Disallow.
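
A scoped Disallow for those paths might look like the robots.txt text embedded below, checked here with Python's standard-library parser. The paths and the `protalbot_may_fetch` helper are illustrative, and the result reflects how `urllib.robotparser` applies prefix rules rather than a statement about ProtalBot's internals.

```python
from urllib.robotparser import RobotFileParser

# Everything stays auditable except the two carved-out prefixes.
SCOPED_ROBOTS = """\
User-agent: ProtalBot
Disallow: /admin
Disallow: /api
"""

_parser = RobotFileParser()
_parser.parse(SCOPED_ROBOTS.splitlines())


def protalbot_may_fetch(path: str) -> bool:
    """True if the scoped rules above permit ProtalBot to fetch `path`."""
    return _parser.can_fetch("ProtalBot", "https://example.com" + path)
```

Rules are prefix matches, so `Disallow: /admin` also covers paths such as `/administrator`. Use `/admin/` if only the directory should be excluded.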

How do I contact you about traffic?

Email bot@protal.ai with the user-agent and a timestamp — we respond within five business days and can pause audits mid-flight.

Can I whitelist by IP?

Soon. The published IP-range list is being finalized for production; see the IP ranges section above. Reverse-DNS verification is the spoof-resistant path once that's up.

Does ProtalBot follow nofollow?

Yes. Link-level rel="nofollow" and meta robots directives are respected per the standard. We audit; we don't index a link graph.
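
To see which links on one of your pages a nofollow-respecting crawler would skip, a rough check with Python's standard-library HTML parser is sketched below. It illustrates the general convention (link-level `rel="nofollow"` and a page-level meta robots `nofollow` or `none`), and is not a description of ProtalBot's code. The class name is illustrative.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class FollowableLinkParser(HTMLParser):
    """Collects hrefs that neither the link nor the page marks as nofollow."""

    def __init__(self) -> None:
        super().__init__()
        self.page_nofollow = False
        self._links: List[str] = []

    def handle_starttag(
        self, tag: str, attrs: List[Tuple[str, Optional[str]]]
    ) -> None:
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = (attributes.get("content") or "").lower()
            tokens = {token.strip() for token in content.split(",")}
            # "none" is shorthand for "noindex, nofollow".
            if "nofollow" in tokens or "none" in tokens:
                self.page_nofollow = True
        elif tag == "a":
            href = attributes.get("href")
            rel_tokens = (attributes.get("rel") or "").lower().split()
            if href and "nofollow" not in rel_tokens:
                self._links.append(href)

    def followable_links(self) -> List[str]:
        """Links a nofollow-respecting crawler may follow from this page."""
        return [] if self.page_nofollow else list(self._links)
```

Feed the parser a page's HTML with `parser.feed(html)` and then call `followable_links()` to list what remains eligible.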