Generate Your Robots.txt File

Control how search engines crawl your website by creating a custom robots.txt file. Select presets or build your own rules.

What is robots.txt?
The robots.txt file tells search engine crawlers which pages or files they can or can't request from your site. It's placed in the root directory of your website (e.g., https://example.com/robots.txt).
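
For example, a minimal robots.txt that lets every crawler access everything except one folder, and points crawlers at a sitemap, could look like this (the /private/ path and the sitemap URL are placeholders):

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap.xml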

Configuration

The generator lets you start from quick presets, choose which user agents to target (All Bots, Googlebot, Bingbot, Yahoo, DuckDuckGo, Baidu, Yandex), define your own rules, and adjust additional settings.


Frequently Asked Questions

What happens if my site doesn't have a robots.txt file?

If your website doesn't have a robots.txt file, search engines will assume they have permission to crawl all publicly accessible pages. This is equivalent to having a robots.txt file that allows everything:

User-agent: *
Disallow:

While this isn't necessarily bad, having a robots.txt file gives you more control over how search engines interact with your site, and can help prevent them from wasting resources crawling unimportant pages.

Do search engines have to obey robots.txt?

No, robots.txt is not legally binding and relies on voluntary compliance.

Well-behaved search engines (Google, Bing, Yahoo, etc.) will respect your robots.txt directives. However, malicious bots, scrapers, and hackers often ignore robots.txt files completely. Think of robots.txt as a "polite request" rather than a security measure.

Important: Never use robots.txt to hide sensitive information. Malicious actors can read your robots.txt file to find pages you don't want crawled, making it a roadmap to your sensitive content.

Are there any downsides to having a robots.txt file?

Yes, there are a few potential downsides:

  • Public visibility: Your robots.txt file is publicly accessible at yoursite.com/robots.txt. Anyone can see which paths you're blocking, potentially revealing the structure of your site or locations of admin areas.
  • Not a security tool: Blocking a path in robots.txt doesn't prevent people from accessing it directly. It only asks search engines not to crawl it.
  • Accidental blocking: A misconfigured robots.txt can accidentally block important pages from search engines, hurting your SEO.
  • Pages can still be indexed: If other sites link to a blocked page, search engines might still index it (though they won't crawl its content).

Where does the robots.txt file need to be placed?

Your robots.txt file must be placed in the root directory of your website and must be named exactly "robots.txt" (lowercase).

For example:

  • ✅ Correct: https://example.com/robots.txt
  • ❌ Incorrect: https://example.com/pages/robots.txt
  • ❌ Incorrect: https://example.com/Robots.txt
  • ❌ Incorrect: https://example.com/robots.TXT

Each subdomain needs its own robots.txt file if you want different rules (e.g., blog.example.com/robots.txt is separate from example.com/robots.txt).

What is the difference between Disallow and Allow?

Disallow: Tells crawlers NOT to access a specific path.

Disallow: /admin/

Allow: Explicitly permits crawlers to access a path (useful for creating exceptions to broader Disallow rules).

Disallow: /admin/
Allow: /admin/public/

In the example above, all of /admin/ is blocked except /admin/public/ which is explicitly allowed.
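
In a real robots.txt file, Disallow and Allow lines always sit under a User-agent line, so a complete version of the example above (still using /admin/ and /admin/public/ as placeholder paths) would be:

User-agent: *
Disallow: /admin/
Allow: /admin/public/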

Can I use robots.txt to hide sensitive pages such as admin areas?

Absolutely not! This is a critical security mistake.

Using robots.txt to block sensitive areas has several problems:

  • The robots.txt file is publicly readable, so you're advertising where your sensitive content is located
  • Malicious bots don't respect robots.txt and will specifically target blocked areas
  • Direct links to blocked pages still work - robots.txt doesn't prevent access

Correct approach: Use proper authentication, passwords, server configuration, or firewalls to protect sensitive content.

What is a user-agent?

A user-agent is the name/identifier of the bot or crawler. Different search engines use different user-agent names:

  • Googlebot - Google's web crawler
  • Bingbot - Microsoft Bing's crawler
  • Slurp - Yahoo's crawler
  • DuckDuckBot - DuckDuckGo's crawler
  • * (asterisk) - Matches all bots

You can specify different rules for different crawlers. For example, you might want to slow down a particularly aggressive crawler with a crawl-delay while allowing others full access.
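
As a sketch, a file that blocks one particularly troublesome crawler completely while leaving the site open to everyone else might look like this ("BadBot" is a made-up user-agent name used only for illustration):

User-agent: BadBot
Disallow: /

User-agent: *
Disallow: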

What is Crawl-delay and when should I use it?

Crawl-delay specifies the number of seconds a crawler should wait between requests to your server. For example:

Crawl-delay: 10

This tells the bot to wait 10 seconds between each page request.

When to use it:

  • Your server is struggling with bot traffic
  • You want to reduce server load from crawlers
  • A specific bot is being too aggressive

Note: Google ignores Crawl-delay. Instead, use Google Search Console to adjust Googlebot's crawl rate. Bing and other search engines do respect this directive.
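
Because Google skips the directive, Crawl-delay is usually applied only to crawlers that honor it. For example, you might slow down Bingbot alone while leaving other bots unrestricted (the 10-second value is arbitrary):

User-agent: Bingbot
Crawl-delay: 10

User-agent: *
Disallow: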

Can I use wildcards in robots.txt?

Yes! Most modern crawlers support wildcard patterns:

  • * (asterisk) - Matches any sequence of characters
  • $ (dollar sign) - Matches the end of a URL

Examples:

# Block all URLs with "?" (query parameters)
Disallow: /*?

# Block all .pdf files
Disallow: /*.pdf$

# Block all URLs containing "admin"
Disallow: /*admin

Note: Wildcards are supported by Google, Bing, and most modern crawlers, but very old or simple bots may not understand them.
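
Wildcards can also be combined with Allow to carve out exceptions. For instance, to block PDFs everywhere except a hypothetical /downloads/ folder (Google and Bing resolve such conflicts in favor of the more specific, i.e. longer, rule):

User-agent: *
Disallow: /*.pdf$
Allow: /downloads/*.pdf$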

How can I test my robots.txt file?

You can test your robots.txt file in several ways:

  1. Direct access: Visit yoursite.com/robots.txt in a browser to see if it loads correctly
  2. Google Search Console: Use the robots.txt Tester tool (under Legacy tools & reports)
  3. Online validators: Use free robots.txt testing tools to check syntax
  4. Check formatting: Make sure there are no extra spaces, special characters, or encoding issues

Common testing mistakes:

  • Using a text editor that adds hidden characters or smart quotes
  • Saving with the wrong encoding (use UTF-8)
  • Having a typo in the filename (must be exactly "robots.txt")
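
If you want to check rules programmatically, Python's standard-library urllib.robotparser can fetch a live robots.txt and report whether a given URL may be crawled. This is a minimal sketch, assuming example.com as a placeholder domain:

# Check robots.txt rules with Python's built-in parser
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("https://example.com/robots.txt")  # placeholder URL
parser.read()  # downloads and parses the file

# Ask whether a given user-agent may fetch a given URL
print(parser.can_fetch("*", "https://example.com/admin/"))
print(parser.can_fetch("Googlebot", "https://example.com/"))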

Should I include my sitemap in robots.txt?

Yes, it's highly recommended! Adding your sitemap URL to robots.txt helps search engines discover and crawl your site more efficiently.

Sitemap: https://example.com/sitemap.xml

Benefits:

  • Helps search engines find all your important pages
  • Provides a central reference for your site structure
  • Can specify multiple sitemaps if needed
  • No downside - it only helps search engines

You can list multiple sitemaps:

Sitemap: https://example.com/sitemap.xml
Sitemap: https://example.com/sitemap-news.xml
Sitemap: https://example.com/sitemap-images.xml

What are the most common robots.txt mistakes?

Here are the most common and costly robots.txt mistakes:

  1. Blocking your entire site by accident
    # DON'T DO THIS (unless you really want to block everything)
    User-agent: *
    Disallow: /
  2. Blocking CSS and JavaScript - Google needs these to render your pages properly. Don't block /css/ or /js/ folders (see the example after this list).
  3. Using it as a security measure - As mentioned, robots.txt is not for security.
  4. Incorrect file location - Must be in root directory, not in a subdirectory.
  5. Case sensitivity errors - The filename must be lowercase "robots.txt", and the path rules inside the file are matched case-sensitively, so /Admin/ and /admin/ are treated as different paths.
  6. Forgetting to update after site changes - Review your robots.txt when you restructure your site.
  7. Using relative URLs - Sitemap URLs must be absolute (include full domain).
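
For mistake 2, if a folder you need to block also happens to contain stylesheets or scripts, you can add the assets back with Allow rules rather than blocking them along with everything else (the folder names below are placeholders):

User-agent: *
Disallow: /includes/
Allow: /includes/css/
Allow: /includes/js/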