PerplexityBot: what the Perplexity crawler is and how to control site crawling

By Tiago Costa. Updated on July 2, 2026

Illustration of a robot with a magnifying glass reading pages and building an answer with numbered citations, representing PerplexityBot.

Definition

PerplexityBot is the crawler of Perplexity, the AI search engine that answers questions while citing sources. In practice, PerplexityBot:

  • visits public pages and indexes them for lookup;
  • helps Perplexity build answers with links to the original sources;
  • identifies itself with the PerplexityBot user-agent in server logs;
  • should respect robots.txt, which is where you allow or block access.

What PerplexityBot is

PerplexityBot is the automated crawler run by Perplexity, a tool that mixes search and artificial intelligence to answer questions in natural language, always with links to the sources. For that cited answer to exist, Perplexity first has to know the content of the web, and that is where the bot comes in.

Like any crawler, PerplexityBot traverses public pages, reads the text and stores it in an index. The difference from a pure training crawler lies in the use: Perplexity relies on the material to find and cite current information when answering, not only to train a model once. That is why fresh, well-structured content matters so much to it.

For anyone who publishes on the web, this changes the logic of the decision. Blocking PerplexityBot protects the content, but it also removes your site from the list of sources Perplexity can cite, and with it the links that bring visits back.

PerplexityBot and Perplexity-User: two agents, two purposes

A detail that confuses many people is that Perplexity runs more than one agent, and each behaves differently. Understanding the difference is essential to write rules that do what you expect:

  • PerplexityBot: the crawler that indexes the web systematically to feed the search engine's index. This is the one you control in robots.txt.
  • Perplexity-User: the agent that visits a specific page when a user asks a question that requires checking that address in real time. Because it acts on a person's request, Perplexity treats this access differently from mass crawling.

This distinction has practical consequences. A rule that blocks PerplexityBot may not affect the agent that fetches on the user's request, which is often the source of misunderstandings about blocks that seem not to work.

Infographic of PerplexityBot's cycle: crawl, index, question, answer with sources and click back to the original page.
How PerplexityBot becomes an answer: from indexing the page to a citation that links back to the source.

What is PerplexityBot's user-agent

In your server logs, Perplexity's crawler appears with a user-agent that contains the word PerplexityBot, in a format similar to PerplexityBot/1.0 accompanied by a Perplexity contact address. The user-triggered agent appears with the Perplexity-User identifier.

Knowing how to read this identifier is the first step to monitor how much Perplexity crawls your site and to confirm whether a hit is really from it. Remember that the user-agent is just text declared by the visitor itself, so it can be copied. Solid confirmation comes from cross-checking the name with the official IP ranges and with the access behavior, not just from the line that shows up in the log.
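
The cross-check described above can be sketched in Python. This is only an illustration under stated assumptions: the helper name classify_hit is hypothetical, the sample user-agent string is made up, and the networks listed are reserved documentation ranges (RFC 5737), not Perplexity's real addresses. They would need to be replaced with the ranges Perplexity publishes.

```python
import ipaddress

# ASSUMPTION: placeholder documentation ranges (RFC 5737), NOT Perplexity's real IPs.
# Replace them with the ranges Perplexity publishes officially.
OFFICIAL_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# Identifiers named in the article; matching is case-insensitive.
AGENT_TOKENS = ("PerplexityBot", "Perplexity-User")


def classify_hit(user_agent: str, remote_ip: str) -> str:
    """Return 'verified:<agent>', 'unverified:<agent>' or 'other' for one log hit."""
    ua = user_agent.lower()
    agent = next((t for t in AGENT_TOKENS if t.lower() in ua), None)
    if agent is None:
        return "other"
    ip = ipaddress.ip_address(remote_ip)
    # The declared name counts as verified only if the IP is in a known range.
    in_range = any(ip in net for net in OFFICIAL_RANGES)
    return f"{'verified' if in_range else 'unverified'}:{agent}"


if __name__ == "__main__":
    # Hypothetical user-agent string, for illustration only.
    sample_ua = "Mozilla/5.0 (compatible; PerplexityBot/1.0; +https://example.com/bot)"
    print(classify_hit(sample_ua, "192.0.2.10"))
    print(classify_hit(sample_ua, "203.0.113.5"))
```

A hit labeled unverified carries the PerplexityBot name but comes from an address outside the listed ranges, which is the case worth inspecting by hand.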

How to allow or block PerplexityBot in robots.txt

The main control point is the robots.txt file at the root of the site. To block PerplexityBot from crawling, use:

  • User-agent: PerplexityBot
  • Disallow: /

To allow it, simply do not block it, or add Allow: / under the same User-agent line. If you also want to stop the user-triggered agent, you need a separate rule for Perplexity-User, bearing in mind that Perplexity argues that fetches made at a person's request work like a browser acting on their behalf.
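
Before publishing a rule, it can help to check how a parser reads it. The sketch below uses Python's standard urllib.robotparser on the blocking rule shown above. The helper name allowed and the example.com URL are assumptions for illustration. The result reflects how this parser interprets the file, not a guarantee of how any crawler behaves.

```python
from urllib.robotparser import RobotFileParser

# The blocking rule from the text, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: PerplexityBot
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text in memory and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/blog/post"  # hypothetical URL
    for agent in ("PerplexityBot", "Perplexity-User", "Googlebot"):
        print(agent, allowed(ROBOTS_TXT, agent, url))
```

With only the PerplexityBot group present, the parser reports Perplexity-User as still allowed, which matches the point that the user-triggered agent needs its own rule.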

Here lies an important warning: robots.txt relies on the bot's goodwill. And in Perplexity's case, that goodwill was called into question, as the next section shows. For content you truly need to protect, robots.txt alone may not be enough.

The controversy over Perplexity's stealth crawling

Not all of Perplexity's crawling happened in the open. In 2025, Cloudflare published an investigation claiming that, when Perplexity's declared bots hit blocks, the company resorted to undeclared crawlers that disguised themselves as an ordinary Chrome browser to access content from sites that had asked not to be crawled. According to Cloudflare, this behavior was observed across tens of thousands of domains and reached millions of requests per day.

Cloudflare reported having created new, undisclosed domains, configured to deny access to all bots, and even so Perplexity was said to have managed to retrieve and display the content of these test sites. In response, Perplexity disputed the accusation, claiming that part of the traffic attributed to it came from a third-party service and that its user-requested searches act like a browser, not like a training scraper.

Regardless of how the debate ends, the lesson for site owners is clear: robots.txt is a guideline, not a physical barrier. If the goal is to actually block access, not merely signal a preference, you need technical backup at the server level or from an application firewall.
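
As one possible shape for that server-level backup, here is a minimal WSGI middleware sketch in Python. The names block_declared_bots, demo_app and status_for are hypothetical, and the approach only refuses requests whose user-agent declares one of the listed names. A crawler presenting an ordinary Chrome user-agent, as described in Cloudflare's report, would pass this check and would need IP or behavior based rules in a firewall.

```python
from typing import Callable, Iterable

# Lowercased identifiers to refuse; extend the tuple as needed.
BLOCKED_TOKENS = ("perplexitybot", "perplexity-user")


def block_declared_bots(app: Callable) -> Callable:
    """Wrap a WSGI app and answer 403 when the User-Agent names a blocked bot."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        ua = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in ua for token in BLOCKED_TOKENS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Anything else is passed through to the real application untouched.
        return app(environ, start_response)

    return middleware


def demo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


def status_for(user_agent: str) -> str:
    """Call the wrapped app once and return the status line it produced."""
    seen = []
    wrapped = block_declared_bots(demo_app)
    body = wrapped(
        {"HTTP_USER_AGENT": user_agent},
        lambda status, headers: seen.append(status),
    )
    list(body)  # consume the response body, as a server would
    return seen[0]


if __name__ == "__main__":
    print(status_for("PerplexityBot/1.0"))
    print(status_for("Mozilla/5.0 Chrome/120.0"))
```

Unlike robots.txt, this refusal is enforced by the server rather than left to the visitor, but it is still only as reliable as the user-agent the visitor chooses to send.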

Illustration of a robot disguised as a browser slipping past a no entry gate and ignoring a robots.txt, representing the stealth crawling attributed to Perplexity.

PerplexityBot and GEO: becoming a cited source

From a GEO (Generative Engine Optimization) standpoint, Perplexity is one of the most interesting targets, precisely because it cites and links the sources of its answers. Each citation is a real chance to appear to the user and to receive a click back, something not every AI assistant offers.

To be a candidate for this kind of AI citation, the path starts by allowing PerplexityBot and following the content best practices for answer engines: answer the question directly at the start, back up claims with data and sources, and organize the text into blocks that are easy to extract. Current, specific content tends to be preferred, since Perplexity focuses on answering with recent information.

As a complementary signal, the llms.txt file is being adopted to indicate to models which content on the site to prioritize. It forces nothing, but it helps communicate organization and intent to those who want to be well represented in AI answers, rather than simply disappearing from them.
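
To make the idea concrete, the sketch below builds a small llms.txt in Python. It assumes the layout described in the public llms.txt proposal (a title line, a short quoted summary, then sections listing links), and the function name build_llms_txt, the site name and the URLs are all hypothetical.

```python
from typing import Dict, List, Tuple

# (link title, URL, optional note)
Link = Tuple[str, str, str]


def build_llms_txt(title: str, summary: str, sections: Dict[str, List[Link]]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, H2 sections of links."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            suffix = f": {note}" if note else ""
            lines.append(f"- [{name}]({url}){suffix}")
        lines.append("")
    # Single trailing newline, no blank lines at the end of the file.
    return "\n".join(lines).rstrip() + "\n"


if __name__ == "__main__":
    # Hypothetical site and pages, for illustration only.
    text = build_llms_txt(
        "Example Blog",
        "Guides about AI crawlers and answer engines.",
        {
            "Guides": [
                ("PerplexityBot guide", "https://example.com/perplexitybot", "what it is and how to control it"),
                ("Robots.txt basics", "https://example.com/robots-txt", ""),
            ]
        },
    )
    print(text)
```

The resulting text would be saved as llms.txt at the root of the site. As the article notes, the file forces nothing; it only signals which pages you consider most useful.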

FAQ

Frequently asked questions

What is PerplexityBot?

PerplexityBot is the crawler of Perplexity, the AI search engine that answers questions while citing sources. It crawls public pages to feed the index Perplexity queries when building answers. It identifies itself with the PerplexityBot user-agent and should respect robots.txt.

How do I block PerplexityBot?

In robots.txt, use User-agent: PerplexityBot and Disallow: / to stop the crawling. For the user-triggered agent, add a rule for Perplexity-User. Since there have been reports of stealth crawling, sensitive content calls for backup at the server level or from a firewall.

Is Perplexity free?

Perplexity has a free version with basic search and answer features, plus a paid plan with more advanced models and higher limits. It helps to distinguish: Perplexity is the product you use; PerplexityBot is the robot that crawls the web to feed that product.

Which is better, ChatGPT or Perplexity?

It depends on the use. Perplexity focuses on answering questions with cited sources and links, which helps you fact-check. ChatGPT is a broader assistant, with strong conversation and writing ability. For research with traceable references, Perplexity tends to be the better fit.

How much does Perplexity Pro cost?

Perplexity Pro is Perplexity's paid plan, priced around 20 US dollars a month (or an equivalent discounted yearly fee). There have been promotions offering free access for a period through partnerships, but prices change, so always check the official pricing.

Keep learning

Related concepts

  • ClaudeBot: ClaudeBot is the crawler operated by Anthropic, the company behind the Claude AI assistant. It travels the public web to collect content that helps train and inform the Claude models. Just as Googlebot does for search, ClaudeBot identifies itself with its own user-agent, respects the robots.txt file and can be allowed or blocked by any site. Deciding what to do with it has become part of the strategy for anyone who does, or does not, want to appear in AI answers.
  • OAI-SearchBot: OAI-SearchBot is the crawler OpenAI uses to feed ChatGPT search, that is, to discover and index pages that can become a cited source in real-time search answers. It is different from GPTBot, which collects content to train the models, and from ChatGPT-User, which acts when the user requests an action. Understanding this split is what lets you appear in ChatGPT search without necessarily allowing your content to be used for training.
  • Answer engine: An answer engine is any search system that returns a direct, already synthesized answer instead of a list of blue links. Rather than making the person click through several results, it reads multiple sources, summarizes them and delivers the ready answer right there. This category includes Google's AI Overviews, AI assistants such as ChatGPT, Perplexity and Gemini, voice assistants and even traditional featured snippets. It is the shift that makes SEO evolve toward becoming a cited source, not just a clicked link.
  • Robots.txt: Robots.txt is a plain text file, saved in the root of a domain, that tells search engine crawlers which parts of a site they can or cannot crawl. It follows the Robots Exclusion Protocol and controls crawling, not indexing, so it is not the right tool to hide a page from search results.