Field Notes · 10 min read

A Field Guide to AI Traffic

AI traffic is now one of the largest categories in many sites' logs — crawlers training models, readers fetching answers, and a new kind of agent that clicks and acts. A tour of who's visiting, what they want, and how anyone can tell them apart.

Protal Research · Updated May 31, 2026

Something changed about web traffic in the last year. Alongside the humans clicking through your pages and the search engines indexing them, a whole spectrum of AI-driven visitors now reads, fetches, and acts on your site — and in many logs it has quietly become one of the largest categories of all.

Most of this traffic never shows up in standard analytics, which were built to count people. This is a short field guide to it: who these visitors are, what they want, and roughly how anyone can tell them apart. It isn't a deep technical manual; it's a tour of the landscape, with just enough detail to make the picture click. The field is also moving fast, so the specifics here are a snapshot, and the names and tools will keep shifting. What's worth taking away isn't a fixed list but a way of thinking about this traffic.


Two families: readers and agents

It helps to split AI traffic by the one thing that actually matters — what it does with your site.

Most of it just reads. A crawler or fetcher pulls your content and leaves: it requests a page, takes the raw HTML, and moves on. It never clicks, scrolls, or fills anything in. By a wide margin, this is the bulk of AI traffic.

A smaller but fast-growing slice acts. An agent navigates, clicks, fills forms, completes tasks — software working through your site on a person's behalf, much the way a human would.

The line is about behavior, not about who set it off. When you ask an assistant like ChatGPT or Claude to "go read this page for me," it sends out a one-shot fetcher (ChatGPT-User, Claude-User). Because you triggered it, it feels agent-like, but mechanically it's still a single fetch of the raw page; it doesn't browse. So the rule is simple: a live page fetch is a reader, and something that actually works through your site is an agent. (The same words can mean different things under the hood: "ask Claude to read a page" is a reader-style fetch, while "ask Claude Code to read a page" runs locally and behaves like an agent.)
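Here is that rule as a minimal JavaScript sketch. It assumes you have already grouped a visitor's requests into a session; the `session` shape (`requests`, `interactionEvents`) and the function name are hypothetical, chosen for illustration rather than taken from any real tool.

```javascript
// Minimal sketch: classify a visit as "reader" or "agent" by what it does,
// not by who triggered it. The session shape is hypothetical:
//   { requests: [{ method, path }], interactionEvents: number }
function classifyFamily(session) {
  // Any click, scroll, or keystroke means something is working through the page.
  if (session.interactionEvents > 0) return "agent";

  // Sending data (a form post, a cart update) is acting, not reading.
  const wroteSomething = session.requests.some(
    (r) => r.method !== "GET" && r.method !== "HEAD"
  );
  if (wroteSomething) return "agent";

  // Otherwise it fetched content and left: a reader, however it was triggered.
  return "reader";
}

// A user-triggered fetch (e.g. ChatGPT-User pulling one page) is still a reader.
console.log(classifyFamily({ requests: [{ method: "GET", path: "/pricing" }], interactionEvents: 0 })); // prints: reader

// Something that types into a form and submits it is an agent.
console.log(
  classifyFamily({
    requests: [{ method: "GET", path: "/signup" }, { method: "POST", path: "/signup" }],
    interactionEvents: 6,
  })
); // prints: agent
```

The point of keying on behavior is that the same rule holds no matter which company or product sent the visit.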

With that line drawn, here's each family in turn.

The readers: who's fetching, and why

The reading half is where the volume is, and it sorts cleanly by motive. Grouping it this way is handy because it's also how you'd act on it — you'd treat a bot bulk-collecting training data differently from a customer's assistant looking one thing up.

Training. Bots like GPTBot (OpenAI) and ClaudeBot (Anthropic) that pull content to train future models. Broad, repeated, and indifferent to any single visit.

Retrieval. Search-style bots like OAI-SearchBot and PerplexityBot that fetch your pages so an AI product can cite them in an answer. When someone asks an assistant for "the best tool for X," these are what decide whether you're even in the running.

User-triggered reads. A person asks their AI to look at a specific page and it fetches it live (ChatGPT-User, Claude-User). One person, one page, right now — not a sweep.

Discovery. Bots reading the AI-specific files you publish for them — like /llms.txt — to learn what your site offers without crawling every page to find out.
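Publishing such a file is simple. The llms.txt proposal is plain Markdown: an H1 title, a short blockquote summary, then H2 sections listing links with optional notes. A minimal generator sketch follows; the function name and every site detail in the example are made-up placeholders.

```javascript
// Sketch: build a minimal /llms.txt body following the llms.txt proposal's layout
// (H1 title, blockquote summary, H2 sections of Markdown links).
// All site details passed in below are hypothetical placeholders.
function buildLlmsTxt({ name, summary, sections }) {
  const lines = [`# ${name}`, "", `> ${summary}`];
  for (const [heading, links] of Object.entries(sections)) {
    lines.push("", `## ${heading}`, "");
    for (const { title, url, note } of links) {
      // Each entry is a Markdown link, optionally followed by ": note".
      lines.push(note ? `- [${title}](${url}): ${note}` : `- [${title}](${url})`);
    }
  }
  return lines.join("\n") + "\n";
}

const body = buildLlmsTxt({
  name: "Example Store",
  summary: "An online shop for hiking gear.",
  sections: {
    Docs: [{ title: "Product catalog", url: "https://example.com/catalog.md", note: "All products with prices" }],
    Optional: [{ title: "About us", url: "https://example.com/about.md" }],
  },
});
console.log(body);
```

Serving the result at /llms.txt gives discovery bots a short map of the site instead of making them crawl every page.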

The useful lens across all of these is the company behind the traffic, not the individual bot. OpenAI, Anthropic, Google, and Perplexity each show up in several of these categories at once — one company training, retrieving, and answering — so "which companies are reading me, and for what" is the question that actually tells you something.
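A rough sketch of that company-level view, assuming your logs give you a User-Agent and a path per request. The token table is a small sample built from the bots named above, not a complete or authoritative list, and treating any fetch of /llms.txt as discovery is a simplifying assumption. The sample User-Agent strings are shortened stand-ins, not the exact strings these bots send.

```javascript
// Sketch: roll up AI reader traffic by company and motive.
// Illustrative sample only; real bot names and behavior change often.
const KNOWN_READERS = [
  { token: "GPTBot", company: "OpenAI", motive: "training" },
  { token: "OAI-SearchBot", company: "OpenAI", motive: "retrieval" },
  { token: "ChatGPT-User", company: "OpenAI", motive: "user-triggered" },
  { token: "ClaudeBot", company: "Anthropic", motive: "training" },
  { token: "Claude-User", company: "Anthropic", motive: "user-triggered" },
  { token: "PerplexityBot", company: "Perplexity", motive: "retrieval" },
];

// Returns { company, motive } for a known AI reader, or null otherwise.
function identifyReader(userAgent, path) {
  const match = KNOWN_READERS.find((r) => userAgent.includes(r.token));
  if (!match) return null;
  // Assumption: reading the file you publish for AI counts as discovery.
  const motive = path === "/llms.txt" ? "discovery" : match.motive;
  return { company: match.company, motive };
}

// Count requests per company, broken down by motive.
function summarizeByCompany(logEntries) {
  const summary = {};
  for (const { userAgent, path } of logEntries) {
    const reader = identifyReader(userAgent, path);
    if (!reader) continue; // not a known AI reader
    summary[reader.company] ??= {};
    summary[reader.company][reader.motive] = (summary[reader.company][reader.motive] ?? 0) + 1;
  }
  return summary;
}

const sampleLog = [
  { userAgent: "Mozilla/5.0 (compatible; GPTBot/1.1)", path: "/blog/post" },
  { userAgent: "Mozilla/5.0 (compatible; GPTBot/1.1)", path: "/llms.txt" },
  { userAgent: "Mozilla/5.0 (compatible; ChatGPT-User/1.0)", path: "/pricing" },
  { userAgent: "Mozilla/5.0 (compatible; ClaudeBot/1.0)", path: "/docs" },
  { userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0", path: "/" },
];
// Counts OpenAI training, discovery, and user-triggered reads plus one
// Anthropic training read; the ordinary Chrome visit is skipped.
console.log(summarizeByCompany(sampleLog));
```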

The agents: who's acting

The acting half is smaller, newer, and harder to see. There's no official taxonomy and the categories blur at the edges, but most agents you'll meet interact in one of a few ways — and, handily, how they interact is also what makes them recognizable.

Browser agents

These drive a real browser, rendering pages and clicking around much as a person would. This is the category most people picture: consumer products like ChatGPT Atlas and Perplexity Comet, or an AI assistant living in a Chrome extension. The wrinkle is that, from the outside, they look almost exactly like a human using Chrome, because under the hood they usually are Chrome or its open-source base, Chromium. That's what makes them the trickiest kind to spot.

Programmatic agents

These are driven by code rather than a consumer app: built on automation tools like Playwright or Puppeteer, sometimes wrapped in an AI reasoning layer, sometimes assembled by an individual and run from their own machine. They range from polished cloud services to a script someone hacked together over a weekend. As a group, they're defined by being scripted: code drives the visit, either by steering a (usually headless) browser or by sending HTTP requests straight to your server, rather than a person or a consumer browser product.

MCP agents

The newest and most different kind. Instead of loading and clicking your pages, these talk to a structured interface built for machines — protocols like MCP (the Model Context Protocol) and its web counterpart, or files like llms.txt. They skip the visual page entirely and go straight for the data or the actions. This is an early but fast-rising mode, and a good example of why the list of "kinds of agent" keeps growing: a year ago this barely existed.

Worth noting: these categories aren't walls. A capable agent often mixes modes — using a structured interface when one exists and falling back to driving a browser when it doesn't — so the same agent can show up looking like more than one kind.

So how can you tell?

The short version: each kind of AI visitor leaves its traces in a different place, which is exactly why grouping them by behavior is useful. You look where that kind actually shows up.

| Kind | What it is | Where its evidence lives | The tell |
|---|---|---|---|
| Crawlers & readers | Training, retrieval, user-triggered, and discovery fetchers (GPTBot, OAI-SearchBot, ChatGPT-User…) | Server, the moment the request arrives | A known bot User-Agent or, increasingly, a signed request that says exactly who it is |
| Browser agents | Atlas, Comet, Chrome extensions; they drive a real browser | Client-side, mid-visit | Automation flags, header mismatches, the rhythm of clicks and typing |
| Programmatic agents | Playwright / Puppeteer scripts, or self-built | Server, the moment the request arrives | Data-center IP, odd User-Agent, missing the headers a real browser always sends |
| MCP agents | Talk to a structured interface (MCP, llms.txt) | Your interface logs only | Invisible unless you offer such an interface |

For readers and programmatic agents, the clues arrive with the request itself, before any page even loads: a self-describing label that names a known bot, an unusual User-Agent (the "who am I" string), the absence of the small headers a real browser always sends, or an address that belongs to a data center rather than a home internet connection. A server can check all of this the moment a request comes in. As a rough illustration of the idea (nothing elaborate), it looks something like this:

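// A simplified sketch of the idea, not a real implementation.
// `request` is assumed to be a Node.js-style request, whose header names are lowercase.
function classifyRequest(request) {
  // The User-Agent header is the request's self-description ("who am I").
  const id = request.headers["user-agent"] || "";

  // Well-behaved crawlers name themselves in the User-Agent.
  if (id.includes("GPTBot") || id.includes("ClaudeBot")) {
    return "a known crawler: it identifies itself";
  }
  // HTTP libraries and headless browsers often leave their own names in it.
  if (id.includes("python-requests") || id.includes("Headless")) {
    return "looks programmatic";       // not a normal browser
  }
  // Current browsers send Sec-Fetch-* headers on HTTPS requests; simple scripts usually don't.
  if (!request.headers["sec-fetch-mode"]) {
    return "missing the usual browser headers";
  }
  return "looks like an ordinary visitor; needs a closer look";
}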

The point of the snippet isn't the code — it's the intuition: a lot can be guessed just from how a request introduces itself.
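That snippet checks only how a request describes itself. The other server-side clue, an address in a data center, comes down to matching the client IP against known network ranges. Here is an IPv4-only sketch of CIDR matching; the two ranges are reserved documentation blocks (RFC 5737) standing in for a real list of cloud-provider networks, which you would need to source and keep current yourself.

```javascript
// Sketch: is an IPv4 address inside any of a list of CIDR ranges?
// DATACENTER_RANGES is a placeholder: these are reserved documentation
// ranges (RFC 5737), not real cloud-provider networks.
const DATACENTER_RANGES = ["203.0.113.0/24", "198.51.100.0/24"];

// Convert dotted-quad "a.b.c.d" to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
  return ip.split(".").reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// True if `ip` falls inside `cidr` (e.g. "203.0.113.0/24").
function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  // Network mask with `bits` leading ones. JavaScript masks shift counts
  // to 5 bits, so a /0 mask (shift by 32) has to be special-cased.
  const mask = bits === 0 ? 0 : (~0 << (32 - bits)) >>> 0;
  return ((ipv4ToInt(ip) & mask) >>> 0) === ((ipv4ToInt(base) & mask) >>> 0);
}

function looksLikeDatacenter(ip) {
  return DATACENTER_RANGES.some((cidr) => inCidr(ip, cidr));
}

console.log(looksLikeDatacenter("203.0.113.42")); // prints: true
console.log(looksLikeDatacenter("192.0.2.1")); // prints: false
```

On its own, a data-center address is only a hint (VPNs and corporate proxies live there too), which is why it is best read alongside the header checks.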

For browser agents, that early glance isn't enough, because they introduce themselves exactly like a real browser would. Here you have to watch how the visit behaves once it's underway: telltale automation flags left behind by the tools steering the browser, a form filled out impossibly fast or with suspiciously even timing, the rhythm of clicks and typing. None of these is a smoking gun on its own; it's the combination that paints a picture. This is the genuinely hard case, and it's an ongoing back-and-forth: as agents get better at acting human, the tells get subtler.
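To show how weak signals combine, here is a sketch of a scoring function. In a page you would feed it real inputs, such as `navigator.webdriver` (a standard property that automation tools often leave set to true, though a consumer browser agent may not) and keystroke timestamps; it is written as a pure function so it can be tried anywhere. The function name, the weights, the 1.5-second form threshold, and the 0.1 cutoff on typing-rhythm variation are all invented for illustration, not calibrated values.

```javascript
// Sketch: combine a few client-side signals into a rough verdict.
// Weights and thresholds below are invented for illustration.
//
// In a page (browser only), inputs could be collected like this:
//   const keyTimestamps = [];
//   document.addEventListener("keydown", (event) => keyTimestamps.push(event.timeStamp));
//   const webdriver = navigator.webdriver;
function behaviorScore({ webdriver, keyTimestamps, formFillMs }) {
  let score = 0;

  // 1. An automation flag left on by the tool steering the browser.
  if (webdriver === true) score += 2;

  // 2. A form completed faster than a person plausibly could.
  if (formFillMs !== undefined && formFillMs < 1500) score += 1;

  // 3. Typing rhythm that is too even: low variation between keystrokes.
  if (keyTimestamps.length >= 5) {
    const gaps = keyTimestamps.slice(1).map((t, i) => t - keyTimestamps[i]);
    const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
    const variance = gaps.reduce((a, g) => a + (g - mean) ** 2, 0) / gaps.length;
    // Identical timestamps (mean 0) count as perfectly even.
    const coefficientOfVariation = mean > 0 ? Math.sqrt(variance) / mean : 0;
    if (coefficientOfVariation < 0.1) score += 1;
  }

  // No single weak signal decides it; only a combination crosses the bar.
  return score >= 2 ? "likely automated" : "no strong signal";
}

// Perfectly regular typing plus a sub-second form fill:
console.log(behaviorScore({ webdriver: false, keyTimestamps: [0, 100, 200, 300, 400, 500], formFillMs: 800 })); // prints: likely automated
```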

For MCP agents, there's nothing to watch on the page at all — they never load it. They only show up in the logs of the structured interface they're talking to. If you don't offer such an interface, you simply won't see them; if you do, that's the place to look.
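MCP messages are JSON-RPC 2.0: a client typically opens with an `initialize` request that carries a self-reported `clientInfo`, and invokes actions with `tools/call`. If you run such an interface, a logging hook might look roughly like this sketch. Transport is omitted, the example messages are trimmed to the fields used, and the function name, the tool name `search_products`, and the client name are hypothetical.

```javascript
// Sketch: turn incoming MCP (JSON-RPC 2.0) messages into log records.
// Transport (HTTP, stdio) is left out; `message` is an already-parsed object.
function logMcpMessage(message, log = []) {
  // Ignore anything that isn't a JSON-RPC 2.0 request or notification.
  if (message.jsonrpc !== "2.0" || typeof message.method !== "string") return log;

  const record = { at: new Date().toISOString(), method: message.method };
  if (message.method === "initialize") {
    // Clients describe themselves here; like a User-Agent, it is self-reported.
    record.client = message.params?.clientInfo?.name ?? "unknown";
  } else if (message.method === "tools/call") {
    // The agent skipped your pages and went straight to an action.
    record.tool = message.params?.name;
  }
  log.push(record);
  return log;
}

const log = [];
logMcpMessage({ jsonrpc: "2.0", id: 1, method: "initialize", params: { clientInfo: { name: "example-agent", version: "1.0.0" } } }, log);
logMcpMessage({ jsonrpc: "2.0", id: 2, method: "tools/call", params: { name: "search_products", arguments: { query: "tent" } } }, log);
console.log(log.map((r) => r.client ?? r.tool)); // prints: [ 'example-agent', 'search_products' ]
```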

There's also a cleaner path emerging that sidesteps all the guesswork: some AI traffic is starting to announce itself honestly using cryptographic signatures (an effort called Web Bot Auth, now moving through standards bodies). When a request is signed this way, a site can verify exactly who sent it — no detective work required. It's early, and only cooperative senders participate, but it points at where things may be heading: a web where well-behaved AI simply identifies itself, and the harder detective work is reserved for the rest.
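Under the hood, Web Bot Auth builds on HTTP Message Signatures (RFC 9421) with Ed25519 keys. The sketch below shows only the core idea using Node's built-in crypto module: the sender signs a "signature base" built from parts of the request, and the site rebuilds that base from the request it received and verifies it. It is deliberately simplified: only the `@authority` component is covered, the parameter string is loosely modeled on the draft, the key pair is generated locally instead of being fetched from the sender's published keys, the `keyid` value is a placeholder, and a real verifier also parses the `Signature-Input` and `Signature` headers and checks expiry and replay.

```javascript
const { generateKeyPairSync, sign, verify } = require("node:crypto");

// RFC 9421-style signature base: one line per covered component,
// then the "@signature-params" line describing what was signed.
function signatureBase(authority, params) {
  return `"@authority": ${authority}\n"@signature-params": ${params}`;
}

// Stand-in for the sender's key pair (normally only the public key is published).
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const created = Math.floor(Date.now() / 1000);
// Loosely modeled parameters; keyid is a hypothetical placeholder.
const params = `("@authority");created=${created};expires=${created + 60};keyid="example-key";tag="web-bot-auth"`;

// Sender side: sign the base for the host it is requesting (Ed25519 takes a null digest).
const signature = sign(null, Buffer.from(signatureBase("example.com", params)), privateKey);

// Site side: rebuild the base from the request actually received, then verify.
// (created/expires are not checked in this sketch.)
function verifyRequest(receivedAuthority) {
  const base = Buffer.from(signatureBase(receivedAuthority, params));
  return verify(null, base, publicKey, signature);
}

console.log(verifyRequest("example.com")); // prints: true
console.log(verifyRequest("other.example")); // prints: false (signed data doesn't match)
```

The design point is that verification needs no guesswork: either the signature checks out against the sender's public key or it doesn't.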

Why any of this matters: the agent economy is arriving fast

It's tempting to file all this under "interesting but niche." The numbers say otherwise — this is one of the fastest-moving shifts the web has seen.

In its 2026 benchmark report, the security firm HUMAN analyzed more than a quadrillion interactions and found that traffic from AI agents and agentic browsers grew 7,851% year over year, while automated traffic overall is now growing roughly eight times faster than human traffic.[1] Cloudflare, which handles traffic for about a fifth of all websites, expects bots to outnumber humans on the internet by 2027; its CEO frames the mechanism vividly: a person buying a camera might visit five sites, while an agent doing the same errand might visit five thousand.[2] The multiplier is structural, and it isn't slowing down.

Behind those numbers is an emerging agent economy that the industry's leaders talk about in increasingly literal terms. OpenAI's Sam Altman describes companies treating autonomous agents like a team of "junior employees," with human roles shifting toward assigning tasks and reviewing output.[3] Anthropic's Dario Amodei has gone further, warning that AI could eliminate up to half of entry-level white-collar jobs within one to five years (a forecast he and Altman have both since tempered, a useful reminder of how uncertain the timeline really is).[4] The framing of agents as a kind of labor force rather than a mere tool is now mainstream among the people building them.

And that framing is sprouting real infrastructure. Job marketplaces have appeared where agents are "hired" like freelancers: you describe a task, an agent quotes and delivers, and increasingly nobody on the other side is human.[5] App-store-style directories (Anthropic's Claude Skills, OpenAI's GPT Store, MCP hubs) have become the primary way agents get discovered and distributed.[6] There's even a budding training-and-tooling layer: a wave of Silicon Valley startups building the "environments" where agents are drilled on multi-step tasks, much like a training ground for new hires.[7] And the human job market is reshaping around all this too, with postings mentioning agentic-AI skills jumping nearly 1,000% in a single year (2023–2024); the World Economic Forum has called trust the foundation of this whole emerging economy.[8] In other words: AI isn't just visiting websites; it's becoming an economic actor with its own marketplaces, credentials, and hiring dynamics.

Which brings it back to your own site. You don't need to act on every AI visit: most are benign, and a growing number are doing useful things for real people, like researching a product or completing a purchase. But being able to see them matters, starting with a mundane but real problem. Standard analytics tools quietly count browser agents as human visitors, because they can't tell the difference, and miss most crawlers entirely, because crawlers take the raw HTML and never run the JavaScript tag that most analytics tools rely on to count visits. If a slice of your "traffic" is actually AI reading and acting, your numbers (conversion rates, bounce rates, where visitors come from) are subtly off, and you may be drawing the wrong conclusions about real people.
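A worked example of how much that can skew a headline number. The sessions are invented, the per-session `automated` flag is assumed to come from whatever detection you already use, and `conversionRate` is a hypothetical helper.

```javascript
// Sketch: conversion rate with and without sessions flagged as automated.
function conversionRate(sessions, { excludeAutomated = false } = {}) {
  const counted = excludeAutomated ? sessions.filter((s) => !s.automated) : sessions;
  if (counted.length === 0) return 0;
  const conversions = counted.filter((s) => s.converted).length;
  return conversions / counted.length;
}

// Made-up data: 800 human sessions (40 convert) plus 200 browser-agent
// sessions that analytics counted as human (none convert here).
const sessions = [
  ...Array.from({ length: 800 }, (_, i) => ({ automated: false, converted: i < 40 })),
  ...Array.from({ length: 200 }, () => ({ automated: true, converted: false })),
];

console.log(conversionRate(sessions)); // prints: 0.04 (40 / 1000, what analytics reports)
console.log(conversionRate(sessions, { excludeAutomated: true })); // prints: 0.05 (40 / 800, the human rate)
```

In this invented case, a fifth of the "visitors" being software understates the real human conversion rate by a quarter.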

More broadly, the AI web is still taking shape. The categories will shift, new interaction modes will appear, and today's clever detection trick will be tomorrow's footnote. The lasting takeaway is simpler: a real and fast-growing slice of your visitors is software, not people — crawlers training and answering, agents clicking and acting — and the sites that can see that clearly are the ones in a position to decide what to do about it.

References


  1. HUMAN Security, 2026 State of AI Traffic & Cyberthreat Benchmark Report (March 26, 2026). Reports that AI agent and agentic-browser traffic grew 7,851% year over year and that automated traffic is growing roughly eight times faster than human traffic. https://www.humansecurity.com/learn/blog/ai-traffic-growth-2025-key-findings/

  2. Matthew Prince (Cloudflare CEO), remarks at SXSW, March 2026, as reported by TechCrunch and others; Cloudflare serves roughly 20% of all websites. The "five sites vs. five thousand" framing and the 2027 bots-exceed-humans projection are his. See coverage at https://finance.biggo.com/news/BP7xLp0BJouf4oEh3dpn

  3. Sam Altman (OpenAI CEO), on companies treating AI agents as "junior employees," June 2025, as reported by MarketBeat. https://www.marketbeat.com/articles/openai-ceo-sam-altman-says-ai-agents-are-like-a-team-of-junior-employees-2025-06-03

  4. Dario Amodei (Anthropic CEO), interview with Axios (mid-2025), warning AI could eliminate up to half of entry-level white-collar jobs within one to five years. Both Amodei and Altman later tempered these forecasts — see Fortune, May 2026. https://fortune.com/2026/05/26/sam-altman-dario-amodei-walking-back-ai-jobs-apocalypse-prophecies-ipo/

  5. Example: Moltlaunch, an agent-hiring marketplace launched on Base, Feb 9, 2026, where tasks are quoted and delivered by AI agents rather than humans. https://www.mexc.com/news/744267

  6. digitalapplied, AI Agent Marketplaces 2026: Discovery and Distribution (April 2026), on Claude Skills, the GPT Store, and MCP hubs as primary agent distribution surfaces. https://www.digitalapplied.com/blog/ai-agent-marketplaces-2026-discovery-distribution

  7. TechCrunch, Silicon Valley bets big on 'environments' to train AI agents (Sept 21, 2025), on the wave of startups building reinforcement-learning environments that simulate workspaces for training agents on multi-step tasks. https://techcrunch.com/2025/09/21/silicon-valley-bets-big-on-environments-to-train-ai-agents/

  8. The Interview Guys, Top 10 Agentic AI Jobs in 2026, citing a 986% jump in job postings mentioning agentic-AI skills between 2023 and 2024, and referencing the World Economic Forum's framing of trust as foundational to the agent economy. https://blog.theinterviewguys.com/top-10-agentic-ai-jobs/

Want to know if your robots.txt is configured correctly?

Run a free Protal Audit.

Reports include your full robots.txt analysis, schema.org structured-data validation, llms.txt presence, MCP discovery, and a battery of other AI-readiness checks across 9 categories. Every rule is documented in our public methodology.