What does turning on a custom robots.txt mean?

Turning on a custom robots.txt means replacing the automatic file the platform generates with an editable version you create. In plugins like Yoast and Rank Math, this option opens an editor where you manually define which areas of the site to allow and which to block.

Where is the robots.txt file on WordPress?

It lives in the root of the domain, at yourdomain.com/robots.txt. On WordPress, the file is virtual by default (generated by the system). To edit it, use an SEO plugin or create a physical robots.txt file in the root folder of the installation via FTP.

What is the main function of the robots.txt file for SEO?

To tell crawlers which parts of the site they can or cannot access, sparing them from crawling irrelevant pages and helping steer the crawl budget toward important content. It controls crawling, not indexing.

What does Googlebot do?

Googlebot is Google's crawler robot. It browses the web following links, reads each site's robots.txt to learn what it may access, and sends the allowed pages to be processed and, when relevant, indexed in the search results.

Robots.txt: what it is and how to use the file on your site

By Tiago Costa. Updated on July 2, 2026

Illustration of a robot in front of a website gate holding a list, with allowed and blocked signs, representing the robots.txt file.

Definition

Robots.txt is a plain text file, placed at the root of a domain (at yourdomain.com/robots.txt), that guides search engine robots on which parts of the site they may crawl. It usually contains:

a User-agent line naming the target robot;
Disallow rules for paths that should not be crawled;
Allow rules that open exceptions;
a Sitemap line pointing to the site map.

What the robots.txt file is

Robots.txt is a simple text file, saved in the root of the domain (at yourdomain.com/robots.txt), that tells search engine robots which parts of the site they can or cannot crawl. It is the front door that every well-behaved crawler checks before browsing the pages.

The file follows the Robots Exclusion Protocol (REP), a standard created in 1994 and adopted by virtually every search engine. The rules are public: anyone can open a site's robots.txt by typing the address in the browser.

It helps to understand its role right away: robots.txt controls crawling, not indexing. In other words, it says where the robot may go, but it is not the right tool to hide a page from search results.

How robots.txt works in practice

When a search engine visits a site, the first thing its robot does is request the file at /robots.txt. If the file exists, the robot reads the instructions and follows them; if it does not, the robot assumes it can crawl everything.

The rules are organized into blocks, each starting with a User-agent (the target robot) followed by Disallow and Allow lines. A typical example:

User-agent: * applies the rule to all robots;
Disallow: /admin/ asks them not to crawl the admin folder;
Allow: /admin/ajax.php opens an exception inside the blocked folder;
Sitemap: https://yourdomain.com/sitemap.xml points to the site map.

It is important to remember that these rules are guidance, not a lock. Legitimate robots like Googlebot obey them, but malicious robots can simply ignore the file.
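To make the Allow exception concrete, here is a minimal sketch of how a crawler that uses longest-match precedence (the behavior described in RFC 9309 and followed by Google) might evaluate the example block. The function name `is_allowed` and the list-of-tuples format are illustrative assumptions, not part of any library, and wildcards are left out on purpose.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs for one User-agent group.
    The longest matching prefix wins; on a tie, Allow wins.
    Wildcards (* and $) are deliberately not handled in this sketch.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is crawlable
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty Disallow blocks nothing
        is_allow = directive.lower() == "allow"
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


# The group from the example, as it would apply under User-agent: *
rules = [
    ("Disallow", "/admin/"),
    ("Allow", "/admin/ajax.php"),
]

print(is_allowed(rules, "/admin/settings"))  # blocked by Disallow: /admin/
print(is_allowed(rules, "/admin/ajax.php"))  # opened by the longer Allow rule
print(is_allowed(rules, "/blog/post"))       # no rule matches
```

Not every parser resolves conflicts this way. Some apply the first matching rule in file order, so it is worth checking how the crawler you care about behaves.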

Infographic: anatomy of a robots.txt file, with the User-agent, Disallow, Allow and Sitemap directives stacked from top to bottom.

Syntax and main directives

Robots.txt accepts a small set of directives. Knowing each one avoids accidental blocks:

Directive	What it does
User-agent	Defines which robot the following rules apply to (use * for all).
Disallow	Marks a path the robot should not crawl.
Allow	Frees a specific path inside a blocked area.
Sitemap	Provides the URL of the XML sitemap to help page discovery.

There are technical limits to respect. According to the Google Search Central documentation, Googlebot processes only the first 500 kibibytes (512,000 bytes) of a robots.txt file and ignores anything after that limit. Google also does not interpret the Crawl-delay directive.
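As a rough illustration of the size limit, the sketch below measures a robots.txt body in bytes and reports how far it exceeds 500 KiB. The constant and function names are hypothetical, and the check assumes the file is served as UTF-8.

```python
GOOGLE_LIMIT_BYTES = 500 * 1024  # 500 KiB = 512,000 bytes


def bytes_over_limit(robots_txt: str) -> int:
    """Return how many bytes of the file fall past the 500 KiB limit."""
    size = len(robots_txt.encode("utf-8"))  # assumes UTF-8 encoding
    return max(0, size - GOOGLE_LIMIT_BYTES)


small_file = "User-agent: *\nDisallow: /admin/\n"
print(bytes_over_limit(small_file))  # a file this small is well within the limit
```

Any rule that falls in the excess bytes would be ignored, so a non-zero result suggests trimming or consolidating rules.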

Robots.txt does not block indexing

This is the costliest misunderstanding about robots.txt. Blocking a page with Disallow stops the robot from reading the content, but does not guarantee the page stays out of Google's index. If other pages link to it, the search engine may index the URL even without crawling it, showing the result with no description.

Google Search Central's own documentation warns that robots.txt is not meant to keep a page out of search results. For that there is the noindex directive, applied on the page itself.

There is one more trap: if you block the page in robots.txt, Google cannot read the noindex tag inside it and the command is never obeyed. The practical rule to remove something from search is the opposite of the intuitive one: let the robot crawl and use noindex. Understanding the difference between crawling and indexing solves most of these cases.
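The trap can be checked mechanically. The sketch below uses Python's standard `urllib.robotparser` and `html.parser` modules to flag a page whose noindex tag sits behind a Disallow rule. The names `RobotsMetaFinder` and `noindex_status` are hypothetical, the user agent string is an assumption, and the robots.txt used here contains only simple prefix rules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether the page carries a robots meta tag with noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def noindex_status(robots_txt, url, html):
    """Describe whether a crawler could actually read the page's noindex."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawlable = parser.can_fetch("Googlebot", url)

    finder = RobotsMetaFinder()
    finder.feed(html)

    if not finder.noindex:
        return "no noindex on the page"
    if crawlable:
        return "noindex can be read"
    return "noindex is hidden behind Disallow"


robots_txt = "User-agent: *\nDisallow: /private/\n"
html = '<html><head><meta name="robots" content="noindex"></head></html>'

print(noindex_status(robots_txt, "https://yourdomain.com/private/page", html))
print(noindex_status(robots_txt, "https://yourdomain.com/blog/page", html))
```

The first call reports the conflict described above; the second shows the working setup, where the page is crawlable and the tag can be read.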

Illustration of a page locked by a padlock that still appears in search results with no description, showing that Disallow is not the same as noindex.

Where robots.txt lives and how to create it on WordPress

Robots.txt always lives in the root of the domain, reachable at yourdomain.com/robots.txt. There is no robots.txt per subfolder: each domain (and subdomain) has its own.
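Because the location is fixed, the robots.txt address for any page can be derived from its URL alone. A small sketch using the standard `urllib.parse` module follows; the function name `robots_txt_url` and the `shop` subdomain are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Keep the scheme and host, replace everything else with /robots.txt."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://yourdomain.com/blog/post?page=2"))
# https://yourdomain.com/robots.txt
print(robots_txt_url("https://shop.yourdomain.com/cart/"))
# https://shop.yourdomain.com/robots.txt
```

The second call shows why a subdomain needs its own file: its host differs, so the derived address differs too.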

On WordPress, the system generates an automatic virtual file when there is no physical one. To take control, you can:

use an SEO plugin like Yoast or Rank Math, which offers a robots.txt editor right in the dashboard;
create a physical file named robots.txt in the root folder of the installation, via FTP or a file manager;
turn on the plugin's custom robots.txt option, which replaces the virtual file with your own rules.

Turning on the custom robots.txt means exactly that: swapping the default file generated by the platform for an editable version, where you manually define what to allow and what to block.
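For the physical-file route, the sketch below writes a small robots.txt into a local folder, ready to upload to the installation root. The function name `write_robots_txt` is hypothetical, and the `/wp-admin/` paths are assumptions modeled on a typical WordPress setup, so adjust them to your own site before using anything like this.

```python
from pathlib import Path


def write_robots_txt(root_dir, sitemap_url):
    """Write a minimal robots.txt into root_dir and return its path."""
    lines = [
        "User-agent: *",
        "Disallow: /wp-admin/",             # assumed admin folder
        "Allow: /wp-admin/admin-ajax.php",  # assumed exception inside it
        "",
        f"Sitemap: {sitemap_url}",
    ]
    target = Path(root_dir) / "robots.txt"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

Calling `write_robots_txt("./upload", "https://yourdomain.com/sitemap.xml")` would create `./upload/robots.txt`, provided the folder already exists. Once a physical file is in the root, it is served in place of the virtual one.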

Common mistakes and best practices

Because it is a small and powerful file, robots.txt causes damage when misconfigured. The most frequent slips:

Blocking the whole site: a Disallow: / left over after development takes every page out of crawling.
Blocking CSS and JavaScript: blocking these files keeps Google from rendering the page properly and can hurt how it evaluates the content.
Trusting robots.txt for privacy: since the file is public, listing sensitive folders in it ends up revealing the path.
Forgetting the sitemap: including the Sitemap line helps the search engine discover your URLs faster.

After any change, test the file in Google Search Console and treat robots.txt as part of your technical SEO routine. Good use also helps steer the crawl budget toward the pages that really matter.
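Some of these slips can be caught before the file goes live. The sketch below is a simple heuristic check, not a full parser: it flags a site-wide Disallow, rules that target CSS or JavaScript files, and a missing Sitemap line. The function name `lint_robots_txt` and the warning texts are hypothetical.

```python
def lint_robots_txt(text):
    """Return a list of warnings for common robots.txt mistakes."""
    warnings = []
    has_sitemap = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "sitemap":
            has_sitemap = True
        elif field == "disallow":
            if value == "/":
                warnings.append("Disallow: / blocks the whole site for this group")
            # strip a trailing $ so patterns like /*.css$ are caught too
            if value.lower().rstrip("$").endswith((".css", ".js")):
                warnings.append(f"Disallow: {value} may block rendering assets")
    if not has_sitemap:
        warnings.append("no Sitemap line found")
    return warnings


leftover = "User-agent: *\nDisallow: /\n"
for warning in lint_robots_txt(leftover):
    print(warning)
```

A check like this complements, rather than replaces, testing the file in Google Search Console.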
