robots.txt checker
Paste a site URL and see its robots.txt analyzed: User-agent groups, blocking rules, Sitemaps and warnings. Right in your browser, no sign up.
How to read and check robots.txt
robots.txt is a plain-text file at the root of a site (yoursite.com/robots.txt) that tells search crawlers which parts of the site they may crawl. A single wrong line in this file can stop crawling for the whole site, so it pays to check it periodically.
How to read robots.txt line by line
The file is organized into groups. Each group starts with a User-agent (the crawler the rules apply to) and lists directives below it. The most common are Disallow (a path the crawler should not crawl) and Allow (an exception that opens a path inside a blocked area).
- User-agent: * applies the rules to every crawler. You can also target a specific bot, such as Googlebot.
- Disallow: /admin/ asks crawlers not to crawl anything inside /admin/.
- Disallow: left empty opens the whole site for that group.
- Sitemap: points to the full URL of your sitemap to help the engine find your pages.
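To make the group structure concrete, here is a minimal Python sketch that splits a robots.txt body into User-agent groups and Sitemap lines. The function name `parse_groups` and the sample file are illustrative assumptions, not the checker's actual implementation, and the sketch ignores directives other than the four described above.

```python
def parse_groups(text):
    """Split robots.txt text into User-agent groups plus a list of Sitemap URLs."""
    groups = []      # each group: {"agents": [...], "rules": [(directive, path), ...]}
    sitemaps = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and surrounding spaces
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group; a User-agent line
            # that follows rules starts a new group.
            if current is None or current["rules"]:
                current = {"agents": [], "rules": []}
                groups.append(current)
            current["agents"].append(value)
        elif field in ("allow", "disallow") and current is not None:
            current["rules"].append((field, value))
        elif field == "sitemap":
            sitemaps.append(value)            # Sitemap lines belong to no group
    return groups, sitemaps


# Hypothetical sample built from the directives discussed above.
SAMPLE = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow:

Sitemap: https://yoursite.com/sitemap.xml
"""

groups, sitemaps = parse_groups(SAMPLE)
for group in groups:
    print(group["agents"], group["rules"])
print(sitemaps)
```

In this sample the first group keeps every crawler out of /admin/, while the second group's empty Disallow leaves the whole site open to Googlebot.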
Main directives and what they do
| Directive | Example | What it does |
|---|---|---|
| User-agent | User-agent: * | Defines which crawler the group of rules applies to |
| Disallow | Disallow: /cart/ | Asks the crawler not to crawl that path |
| Allow | Allow: /blog/ | Opens a path inside a blocked area |
| Sitemap | Sitemap: https://yoursite.com/sitemap.xml | Points to where the site's sitemap lives |
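How Allow "opens a path inside a blocked area" can be sketched in a few lines of Python. The sketch assumes the longest-match rule described in RFC 9309 (the most specific matching path wins, and Allow wins a tie) and deliberately leaves out the `*` and `$` wildcards. The function name `is_allowed` and the rule list are illustrative, not part of any library.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs, e.g. ("disallow", "/cart/").
    The longest matching prefix wins; on a tie, Allow wins.
    Simplification: plain prefix matching only, no wildcard support.
    """
    best_len, allowed = -1, True          # no matching rule means "allowed"
    for directive, prefix in rules:
        if not prefix:                    # an empty Disallow/Allow matches nothing
            continue
        if path.startswith(prefix):
            is_allow = directive == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


# Hypothetical group: block everything, then reopen /blog/.
rules = [("disallow", "/"), ("allow", "/blog/")]
print(is_allowed(rules, "/blog/post"))   # True: "/blog/" is the longer match
print(is_allowed(rules, "/cart/"))       # False: only "/" matches
```

Not every parser follows this rule. Python's own `urllib.robotparser`, for example, applies the first matching line instead, so the order of Allow and Disallow can matter there.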
Common robots.txt mistakes
| Mistake | Effect |
|---|---|
| Disallow: / for User-agent: * | Asks every crawler to stay out of the entire site |
| Missing robots.txt | Google crawls everything by default, with no guidance from you |
| No Sitemap line | You lose an easy hint that speeds up page discovery |
| Blocking CSS and JS | Google may render the page incompletely |
| Sitemap with a relative URL | The Sitemap line must contain the full (absolute) URL, or crawlers may not read it |
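Most of the mistakes in the table can be detected mechanically. The following Python sketch shows one possible way to do it; `find_warnings` and its messages are hypothetical, the CSS/JS check is a rough heuristic that only looks at the end of the path pattern, and a missing robots.txt is not covered because that is detected when fetching the file, not when parsing it.

```python
def find_warnings(text):
    """Return a list of warning strings for a robots.txt body."""
    warnings = []
    agents = []            # User-agent values of the group being read
    in_rules = False       # True once the current group has seen a rule
    sitemaps = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:                       # a new group starts here
                agents, in_rules = [], False
            agents.append(value)
        elif field == "disallow":
            in_rules = True
            if value == "/" and "*" in agents:
                warnings.append("Disallow: / for User-agent: * blocks the whole site")
            # Heuristic: patterns such as /*.css$ or /static/app.js
            if value.lower().rstrip("$").endswith((".css", ".js")):
                warnings.append(f"Disallow: {value} may block CSS or JS needed for rendering")
        elif field == "allow":
            in_rules = True
        elif field == "sitemap":
            sitemaps.append(value)
            if not value.lower().startswith(("http://", "https://")):
                warnings.append(f"Sitemap: {value} is not a full URL")
    if not sitemaps:
        warnings.append("No Sitemap line")
    return warnings


# Hypothetical file that makes two of the mistakes at once.
BAD = "User-agent: *\nDisallow: /\nSitemap: /sitemap.xml\n"
for warning in find_warnings(BAD):
    print(warning)
```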
Blocking crawling and blocking indexing
These are different things. Disallow in robots.txt asks the crawler not to fetch the page. To remove a page from Google's index, use the noindex meta tag in the page's own <head>. Important detail: a page blocked in robots.txt can still show up in search, for example when other pages link to it, because Google cannot fetch the page and so never sees its noindex. To remove it from the index, allow crawling and use noindex.
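The interaction between the two mechanisms can be sketched in Python using the standard library's `html.parser`. The class and function names below are illustrative assumptions; the sketch only looks at `<meta name="robots">` tags and takes the "blocked by robots.txt" status as an input instead of computing it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                self.directives.add(token.strip().lower())


def has_noindex(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def noindex_is_effective(html, blocked_by_robots):
    """A noindex tag can only be read if the crawler may fetch the page."""
    return has_noindex(html) and not blocked_by_robots


# Hypothetical page used for illustration.
PAGE = '<html><head><meta name="robots" content="noindex, follow"></head><body></body></html>'
print(noindex_is_effective(PAGE, blocked_by_robots=True))    # False: tag is never read
print(noindex_is_effective(PAGE, blocked_by_robots=False))   # True
```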
robots.txt questions
Is the checker free?
Yes, it is free and needs no sign-up. Paste the site URL and get the robots.txt analysis instantly.
Where is a site's robots.txt?
Always at the root of the domain, at yoursite.com/robots.txt. The tool derives the origin from the URL you paste and fetches that file automatically.
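Deriving the robots.txt location from any page URL takes only the scheme and host. A minimal Python sketch, assuming the pasted URL is absolute (the function name `robots_url` is hypothetical and this is not necessarily how the tool does it):

```python
from urllib.parse import urlsplit


def robots_url(page_url):
    """Return the robots.txt URL for the origin of `page_url`."""
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {page_url!r}")
    # Path, query string and fragment are discarded; only the origin is kept.
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("https://yoursite.com/blog/post?x=1#top"))
# https://yoursite.com/robots.txt
```

Each subdomain is a separate origin, so a URL on a subdomain resolves to that subdomain's own robots.txt.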
What does Disallow: / mean?
It is the rule that blocks the whole site for that group's crawler. When it appears under User-agent: *, every compliant crawler that has no more specific group of its own stays out of the site, so the tool flags it in red.
Does robots.txt stop indexing on Google?
Not directly. It controls crawling. To remove a page from the index, use the noindex meta tag on the page and keep crawling open so Google can read it.
Do I need to declare the Sitemap in robots.txt?
It is not required, but it helps a lot. The Sitemap line tells the engine where your pages are and speeds up discovery. The tool warns you when it is missing.
Is my data stored?
The check runs on demand and the robots.txt content is not stored.