robots.txt generator
Choose allow all, block all or build custom rules and see the file instantly. Copy or download it and publish it at your site's root. No sign up, right in your browser.
User-agent: *
Disallow:
Everything about robots.txt
The robots.txt generator builds the right file for your site in seconds. Choose allow all, block all or create custom rules, copy the result and publish it at your domain root. Below is a full guide to the syntax, the mistakes that hurt SEO and how to handle AI bots.
What robots.txt is and where it lives
robots.txt is a plain text file that tells search bots which parts of your site they may crawl. Crawlers such as Googlebot fetch it before they crawl the rest of your domain. Its rules guide crawling, but they do not replace a password: sensitive content should always stay behind authentication.
The file must live at the domain root and open at https://yoursite.com/robots.txt. The name is always robots.txt, all lowercase. Each subdomain and each protocol has its own file, so blog.yoursite.com uses a separate robots.txt from yoursite.com.
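As a minimal sketch of the "one file per protocol and host" rule, the helper below derives the robots.txt address that governs any page URL. The function name `robots_url` is hypothetical and exists only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port), replace the path,
    # and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yoursite.com/shop/item?id=3"))
print(robots_url("https://blog.yoursite.com/post/1"))
```

The two calls return different addresses, which is why a rule published on yoursite.com has no effect on blog.yoursite.com.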
robots.txt syntax and directives
A robots.txt file is made of blocks. Each block starts with the User-agent directive, which names the bot the rules apply to, followed by one or more Disallow and Allow lines that close or open paths. The Sitemap line does not belong to any block, so it can appear anywhere in the file, and it points to your sitemap.
| Directive | What it does | Example |
|---|---|---|
| User-agent | Names the bot that the block of rules applies to. The asterisk applies to all bots. | User-agent: * |
| Disallow | Blocks crawling of a path or folder. | Disallow: /admin/ |
| Allow | Opens a path inside a blocked folder. | Allow: /admin/public/ |
| Sitemap | Points to the full site map URL. | Sitemap: https://yoursite.com/sitemap.xml |
| * (wildcard) | Stands for any sequence of characters in a path. | Disallow: /*?color= |
| $ (end of URL) | Marks the exact end of the URL. | Disallow: /*.pdf$ |
| Disallow: / | Blocks the entire site at once. | Disallow: / |
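The directives in the table can be tried out locally with Python's standard-library `urllib.robotparser`. This is a sketch under assumptions: the sample file and the `allowed` helper are illustrative. Python's parser applies the first rule that matches, while Google applies the most specific one, so the Allow line is placed before the Disallow line to make both readings agree.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file built from the example rows in the table.
ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/

Sitemap: https://yoursite.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "*") -> bool:
    """Hypothetical helper: may `agent` crawl `path` on yoursite.com?"""
    return parser.can_fetch(agent, "https://yoursite.com" + path)


print(allowed("/admin/settings"))      # blocked by Disallow: /admin/
print(allowed("/admin/public/page"))   # reopened by Allow: /admin/public/
print(allowed("/blog/"))               # no rule matches, so crawling is allowed
print(parser.site_maps())              # Sitemap URLs found (Python 3.8+)
```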
The * and $ wildcards
The asterisk (*) stands for any sequence of characters, so Disallow: /*.pdf$ blocks every URL that ends in .pdf. The dollar sign ($) marks the end of the URL and keeps you from blocking similar addresses by mistake: without it, /*.pdf would also match /guide.pdf?download=1. Both work on Google and Bing, but not every bot understands wildcards, so use them with care.
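To see how the two wildcards behave, the sketch below translates a robots.txt path pattern into a regular expression. It is an illustration of the matching idea, not the code any search engine runs, and the names `rule_to_regex` and `rule_matches` are hypothetical. It is written by hand because Python's built-in `urllib.robotparser` compares paths by plain prefix and does not interpret these wildcards.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape every character, then turn the escaped asterisk back into ".*".
    body = re.escape(rule_path).replace(r"\*", ".*")
    # A trailing "$" in the rule pins the match to the end of the URL.
    return re.compile(body + (r"\Z" if anchored else ""))


def rule_matches(rule_path: str, url_path: str) -> bool:
    # Rules match from the start of the path, so use match(), not search().
    return rule_to_regex(rule_path).match(url_path) is not None


print(rule_matches("/*.pdf$", "/docs/guide.pdf"))
print(rule_matches("/*.pdf$", "/docs/guide.pdf?download=1"))
print(rule_matches("/*?color=", "/shoes?color=red"))
```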
Where to place the file
Save the result as robots.txt and upload it to the main folder of your server, so it opens at https://yoursite.com/robots.txt. To test it, just type that address in your browser. Google also offers a robots.txt report in Search Console that shows the latest version it read and flags errors.
Blocking crawling vs blocking indexing
This is the most common SEO mistake with robots.txt. Blocking a page in robots.txt stops crawling, but it does not remove the page from Google's index. If other sites link to it, the address can still appear in search with no title or description. To take a page out of results, let it be crawled and use the <meta name='robots' content='noindex'> tag in the HTML. robots.txt controls who gets in; noindex controls what stays in the index.
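When auditing pages you want out of the index, it can help to confirm that the noindex tag is really present in the HTML. The sketch below uses Python's standard `html.parser`. The class `RobotsMetaFinder` and the function `has_noindex` are hypothetical names, and the check only covers the meta tag, not HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # The content attribute is a comma-separated list of directives.
            self.directives.update(
                part.strip().lower() for part in content.split(",")
            )


def has_noindex(html: str) -> bool:
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = "<html><head><meta name='robots' content='noindex'></head></html>"
print(has_noindex(page))
```

A page only leaves the index if it both carries this tag and stays crawlable, since a bot that is blocked in robots.txt never gets to read the tag.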
Common mistakes that hurt SEO
- Blocking the whole site by accident: a leftover Disallow: / line from staging stops Google from crawling any page.
- Blocking CSS and JavaScript: Google needs those files to render the page. Closing /assets/ or /js/ can make it see a broken layout.
- Using robots.txt to hide content: blocked pages still show up in search if they have links. The noindex tag exists for that.
- Getting case wrong: paths are case sensitive, so /Admin and /admin are treated as different addresses.
- Forgetting the Sitemap line: adding the Sitemap helps bots find every URL you want indexed.
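Some of the mistakes in the list above can be caught automatically before publishing. The following is a rough sketch of such a check, assuming a simplified reading of the file format. The function `lint_robots` and the `ASSET_PREFIXES` tuple are hypothetical, and the asset folders are just the two named in the list.

```python
# Folders from the list above that usually hold CSS and JavaScript.
ASSET_PREFIXES = ("/assets/", "/js/")


def lint_robots(text: str) -> list:
    """Return warnings for a few common robots.txt mistakes."""
    warnings = []
    agents = []        # user-agents of the block currently being read
    in_rules = False   # True once the current block has seen a rule line
    has_sitemap = False

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()

        if field == "user-agent":
            if in_rules:                      # a new block starts here
                agents, in_rules = [], False
            agents.append(value)
        elif field in ("allow", "disallow"):
            in_rules = True
            if field == "disallow":
                if value == "/" and "*" in agents:
                    warnings.append("Disallow: / under User-agent: * blocks the whole site")
                if value.startswith(ASSET_PREFIXES):
                    warnings.append(f"Disallow: {value} may block CSS or JavaScript")
        elif field == "sitemap":
            has_sitemap = True

    if not has_sitemap:
        warnings.append("No Sitemap line found")
    return warnings


for warning in lint_robots("User-agent: *\nDisallow: /\n"):
    print(warning)
```

The check deliberately ignores a Disallow: / that sits under a single named bot, because blocking one crawler is a legitimate choice rather than an accident.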
How to control AI bots
The main AI bots are designed to respect robots.txt too. You decide whether your content feeds models like ChatGPT, Claude and Gemini or shows up in their answers. Each company uses its own User-agent, and you block or allow each one just like any other bot.
| Bot | Company | What it does |
|---|---|---|
| GPTBot | OpenAI | Collects pages to train the ChatGPT models. |
| OAI-SearchBot | OpenAI | Indexes content that can appear in ChatGPT search. |
| ChatGPT-User | OpenAI | Visits a page when a user asks for it in ChatGPT. |
| ClaudeBot | Anthropic | Collects content to train Claude. |
| Google-Extended | Google | Controls use of your content in Gemini and Google AI. |
| PerplexityBot | Perplexity | Indexes pages to answer inside Perplexity. |
| CCBot | Common Crawl | Collects pages for a public dataset used by many AI models. |
To block any of them, write a block with the bot's User-agent and a Disallow: / line. For example, User-agent: GPTBot followed by Disallow: / asks OpenAI's bot not to collect any page. Keep in mind this is a request: well-behaved bots obey it, but the block does not work as a technical barrier.
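The pattern described above can be generated and checked with a few lines of Python. This is a sketch, not the generator's own code: `block_bots` is a hypothetical helper, and the choice of GPTBot and CCBot is only an example. The final block keeps every other bot allowed.

```python
from urllib.robotparser import RobotFileParser


def block_bots(user_agents):
    """Build one 'Disallow: /' block per bot, plus an allow-all default block."""
    blocks = [f"User-agent: {agent}\nDisallow: /\n" for agent in user_agents]
    blocks.append("User-agent: *\nDisallow:\n")
    return "\n".join(blocks)


robots_txt = block_bots(["GPTBot", "CCBot"])
print(robots_txt)

# Check the result with the standard-library parser.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://yoursite.com/blog/"))
print(parser.can_fetch("Googlebot", "https://yoursite.com/blog/"))
```

The parser check only confirms what a compliant bot would conclude from the file. It cannot show whether a given crawler actually honors it.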
Common questions about robots.txt
Is the robots.txt generator free?
Yes. It is 100% free, no sign up and no usage limit. The file is built in your browser.
Where do I put the generated file?
At the domain root, so it opens at https://yoursite.com/robots.txt. Just save it as robots.txt and upload it to your server.
How do I block the entire site?
Use the Block all mode, which generates User-agent: * and Disallow: /. This asks bots not to crawl any page.
Do I need a robots.txt for my site?
It is not required. Without one, bots treat the whole site as open to crawling. It helps when you want to block specific areas or point to your sitemap.
Does robots.txt hide a page from search?
Not reliably. It blocks crawling, but the page can still appear if other sites link to it. To remove it, leave the page crawlable and use the noindex meta tag.
How do I stop AI bots like GPTBot from using my content?
Add a block with the bot's User-agent (for example GPTBot) followed by a Disallow: / line. The main AI user agents, such as GPTBot, ClaudeBot and Google-Extended, respect robots.txt, but the file is a request and not a technical barrier.