Robots.txt Generator
Generate valid, production-ready robots.txt files from structured rules — no manual editing, no syntax errors. Define per-bot allow/disallow paths, crawl delays, sitemap URLs, and apply one-click presets like "Block AI Crawlers" or "SEO-Friendly". Zero-proxy, instant output.
Zero proxy. Zero scraping. Pure computation. This actor runs entirely on Apify compute — no bandwidth costs, no rate limits, no CAPTCHAs.
What does it do?
The Robots.txt Generator takes structured JSON rules and transforms them into a properly formatted robots.txt file that you can deploy directly to your website root. It:
- 🚀 Applies one-click presets: Block All, Allow All, Block AI Crawlers, SEO-Friendly
- 📋 Accepts custom per-bot rules (user-agent, allow paths, disallow paths, crawl-delay)
- 🗺️ Appends sitemap URLs and optional Host directive
- ✅ Validates rules for conflicts, duplicates, invalid paths, and missing wildcards
- 💾 Saves output to the dataset (JSON) and key-value store (`robots.txt`, `text/plain`) for direct download
- ⚡ Runs in under 5 seconds — no network requests required
Who is it for?
🧑‍💻 Web developers and DevOps engineers
You manage multiple sites and need to generate consistent, validated robots.txt files without error-prone manual editing. Use this actor as part of your deployment pipeline — run it via API, download the output, and deploy automatically.
📊 SEO professionals and consultants
You're optimizing client sites and need to quickly generate robots.txt files that follow SEO best practices: allow search engine crawlers, block scrapers and AI training bots, and include sitemaps — all in one step.
🤖 AI/LLM product owners
You want to protect your content from AI training crawlers (GPTBot, CCBot, anthropic-ai, etc.) without blocking legitimate search engines. The "Block AI Crawlers" preset handles this automatically with an up-to-date list of known AI bots.
🏢 Enterprise teams managing many properties
You're maintaining a set of standard robots.txt policies across a portfolio of websites and need a programmatic, repeatable way to generate them from structured configuration.
Why use it?
| Problem | Without this actor | With Robots.txt Generator |
|---|---|---|
| Syntax errors | Easy to make by hand | Automatically formatted |
| Conflicting rules | Hard to spot manually | Validated with warnings |
| AI bot list | Research bots yourself | 16 AI crawlers pre-loaded |
| Sitemap inclusion | Remember to add manually | Just list URLs |
| Consistency | Copy-paste across sites | Programmatic, repeatable |
| Integration | Manual file upload | API-driven, downloadable |
How much does it cost to generate a robots.txt file?
This actor uses Pay-Per-Event pricing — you only pay for what you generate.
| Event | FREE tier | BRONZE | SILVER | GOLD | PLATINUM | DIAMOND |
|---|---|---|---|---|---|---|
| Actor start | $0.005 | $0.00475 | $0.00425 | $0.00375 | $0.003 | $0.0025 |
| Per robots.txt generated | $0.001 | $0.0009 | $0.0008 | $0.00065 | $0.0005 | $0.0004 |
Typical cost per run: $0.006 (start + 1 file on FREE tier).
💡 Free plan estimate: Apify's free tier gives you $5/month in credit. That's enough for ~800 robots.txt files per month at FREE tier pricing.
This actor is one of the most affordable on the Store — because it uses no proxies, there is no bandwidth markup.
How to use it
Step 1: Choose a preset (optional)
Start with a preset that matches your use case:
- Allow all — open your site to all crawlers
- Block all — protect a staging or private site
- Block AI crawlers — block GPTBot, CCBot, anthropic-ai, and 13 other AI training bots
- SEO-friendly — allow search engines, block scrapers and AI bots, disallow admin areas
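
Presets alone are enough for the simplest runs. For example, this minimal input (using only the `preset` field documented in the input parameters table below) applies the AI-crawler preset and nothing else:

```json
{
  "preset": "block-ai-crawlers"
}
```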
Step 2: Add custom rules (optional)
Append your own per-bot rules on top of the preset. Specify:
{"userAgent": "Googlebot","allow": ["/"],"disallow": ["/admin/", "/private/"],"crawlDelay": 1,"comment": "Googlebot: full access except admin"}
Step 3: Add sitemaps and host (optional)
{"sitemaps": ["https://example.com/sitemap.xml"],"host": "example.com"}
Step 4: Run and download
The actor saves your robots.txt to:
- Dataset — JSON record with content, warnings, and metadata
- Key-value store — raw `text/plain` file (key: `robots.txt`) for direct download
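
If you prefer to pull the file programmatically instead of downloading it from the Console, here is a minimal Node.js sketch using apify-client (the run and store IDs come from the call itself; the key name `robots.txt` is the one listed above):

```javascript
import { writeFileSync } from 'node:fs';
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Run the actor (here with just a preset)
const run = await client.actor('automation-lab/robots-txt-generator').call({
    preset: 'seo-friendly',
});

// Fetch the raw text/plain record from the run's default key-value store
const record = await client
    .keyValueStore(run.defaultKeyValueStoreId)
    .getRecord('robots.txt');

// Save it locally, ready to deploy to your web root
writeFileSync('robots.txt', record.value);
```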
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| preset | string | seo-friendly | Quick preset: none, allow-all, block-all, block-ai-crawlers, seo-friendly |
| rules | array | [] | Custom user-agent rule blocks (user-agent, allow, disallow, crawl-delay, comment) |
| sitemaps | array | [] | Sitemap URLs to append as Sitemap: directives |
| host | string | "" | Optional Host: directive (used by Yandex) |
| includeTimestamp | boolean | true | Add generation timestamp comment |
| includeGeneratorComment | boolean | true | Add Apify generator comment |
| validateRules | boolean | true | Check for conflicts, duplicates, invalid paths |
Rule block schema
Each item in the rules array:
{"userAgent": "Googlebot","allow": ["/public/"],"disallow": ["/admin/", "/login/"],"crawlDelay": 2,"comment": "Optional comment above this block"}
| Field | Type | Required | Description |
|---|---|---|---|
| userAgent | string or string[] | Yes | Bot name(s), e.g. "*", "Googlebot", ["GPTBot", "CCBot"] |
| allow | string[] | No | Paths the bot CAN access |
| disallow | string[] | No | Paths the bot CANNOT access |
| crawlDelay | number | No | Seconds between requests (not supported by Googlebot) |
| comment | string | No | Comment placed above the block |
Output fields
| Field | Type | Description |
|---|---|---|
| robotsTxt | string | The complete robots.txt file content |
| warnings | string[] | Validation warnings (conflicts, duplicates, etc.) |
| ruleBlockCount | number | Total User-agent blocks in the file |
| sitemapCount | number | Total Sitemap: directives |
| lineCount | number | Total lines in the file |
| generatedAt | string | ISO 8601 timestamp |
| preset | string | Preset that was applied |
| success | boolean | Whether generation succeeded |
| error | string | Error message (if failed) |
Example output
Input
{"preset": "block-ai-crawlers","sitemaps": ["https://mysite.com/sitemap.xml"],"includeTimestamp": true}
Output robots.txt
```
# robots.txt generated by Apify Robots.txt Generator
# https://apify.com/automation-lab/robots-txt-generator
# Generated: 2026-04-09T10:00:00.000Z
# Preset: Block AI Crawlers

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# ... (13 more AI bots)

# Allow all other bots
User-agent: *
Allow: /

Sitemap: https://mysite.com/sitemap.xml
```
Tips and best practices
🎯 Rule ordering matters
Bots use the first matching User-agent block. If you have both a specific bot rule (Googlebot) and a wildcard rule (*), Googlebot will use its specific block and ignore the wildcard. Add specific bot blocks BEFORE the wildcard block.
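
For example, in the file below Googlebot follows only its own block (blocked from /admin/ but free elsewhere), while every other bot falls through to the wildcard block and is blocked entirely:

```
User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow: /
```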
📌 The wildcard block is your default
Always include a User-agent: * block as your default rule. Without it, bots that aren't explicitly listed have no rules — this can cause unpredictable crawling behavior.
⏱️ Crawl-delay is not universal
Crawl-delay is supported by Bingbot, Yandex, and many lesser-known bots, but Googlebot ignores it. To slow down Googlebot, use Google Search Console's crawl rate settings instead.
🚫 Don't block CSS and JS from Googlebot
Blocking /css/, /js/, or /assets/ prevents Googlebot from rendering your pages correctly, which can hurt rankings. Use the SEO-friendly preset which leaves these open.
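
If you write custom rules instead of using the preset, keep asset paths crawlable — a sketch of a safe pattern (the paths are illustrative):

```
User-agent: Googlebot
Allow: /css/
Allow: /js/
Allow: /assets/
Disallow: /admin/
```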
✅ Validate before deploying
The actor's built-in validator catches common mistakes: overlapping allow/disallow patterns, duplicate user-agents, paths not starting with /, and missing wildcard blocks.
🔄 Regenerate when your site structure changes
Update and re-run whenever you add admin panels, API endpoints, or new content sections that should be crawled differently.
Presets reference
Allow All
```
User-agent: *
Allow: /
```
Use for: Public sites, CDNs, open documentation.
Block All
```
User-agent: *
Disallow: /
```
Use for: Staging environments, private intranets, sites under construction.
Block AI Crawlers
Blocks all known AI training crawlers: GPTBot (OpenAI), CCBot (Common Crawl), anthropic-ai (Anthropic), Claude-Web, Google-Extended (Gemini training), cohere-ai, PerplexityBot, Bytespider (ByteDance), Diffbot, FacebookBot, ImagesiftBot, Omgilibot, YouBot, Applebot-Extended, ChatGPT-User, PetalBot.
All other bots (including Googlebot, Bingbot) are allowed. Use for: Publishers protecting training data rights, paywalled content sites.
SEO-Friendly
- Allows all search engine bots with `Allow: /`
- Blocks 10 bad/spam bots (AhrefsBot, SemrushBot, MJ12bot, etc.)
- Blocks 16 AI crawlers
- Disallows common admin paths (`/admin/`, `/wp-admin/`, `/login/`, `/cart/`, `/checkout/`)
- Applies `Crawl-delay: 1` to all bots
Use for: Production websites, e-commerce sites, blogs, SaaS marketing pages.
Integrations
🔗 Zapier / Make automation
Trigger this actor on a schedule and automatically deploy the output to your web server:
- Set up a scheduled Apify run (or cron trigger)
- In Zapier/Make, trigger when the run completes
- Fetch the `robots.txt` record from the key-value store URL (see the example below)
- Upload to your server via FTP, S3, or API
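
The fetch step is a single HTTP GET against the record URL of the run's default key-value store. An equivalent curl sketch (STORE_ID is a placeholder for the store ID shown on the run's Storage tab):

```bash
curl -o robots.txt \
  "https://api.apify.com/v2/key-value-stores/STORE_ID/records/robots.txt?token=YOUR_API_TOKEN"
```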
☁️ AWS S3 deployment pipeline
Use the Apify API to run this actor as part of your CI/CD pipeline, then upload the output directly to S3:
```bash
# Run the actor with your config via the Apify CLI, then fetch robots.txt from
# the run's key-value store (see the record URL above) and push it to S3
apify call automation-lab/robots-txt-generator --input-file=robots-config.json
aws s3 cp robots.txt s3://mybucket/robots.txt --content-type text/plain
```
🌐 Cloudflare Workers / CDN edge
Generate your robots.txt via this actor and serve it directly from a KV store URL in Cloudflare Workers, enabling instant updates without redeployment.
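
A minimal Worker sketch of that pattern (the record URL, token handling, and route are assumptions for illustration, not part of this actor):

```javascript
// Cloudflare Worker (module syntax): serve /robots.txt straight from an Apify key-value store.
// STORE_ID is a placeholder; non-public stores also need ?token=... on the record URL.
const RECORD_URL =
  'https://api.apify.com/v2/key-value-stores/STORE_ID/records/robots.txt';

export default {
  async fetch(request) {
    const { pathname } = new URL(request.url);
    if (pathname !== '/robots.txt') {
      return new Response('Not found', { status: 404 });
    }
    const upstream = await fetch(RECORD_URL);
    return new Response(await upstream.text(), {
      headers: { 'content-type': 'text/plain; charset=utf-8' },
    });
  },
};
```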
📊 Spreadsheet-driven multi-site management
Store your robots.txt configuration in Google Sheets (one row per site). Use Make.com to read each row and trigger this actor, then deploy each generated file to the corresponding site.
API usage
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/robots-txt-generator').call({
    preset: 'seo-friendly',
    rules: [
        {
            userAgent: 'Googlebot',
            allow: ['/'],
            disallow: ['/admin/'],
        },
    ],
    sitemaps: ['https://example.com/sitemap.xml'],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].robotsTxt);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("automation-lab/robots-txt-generator").call(run_input={
    "preset": "block-ai-crawlers",
    "sitemaps": ["https://example.com/sitemap.xml"],
    "validateRules": True,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
print(items[0]["robotsTxt"])
```
cURL
```bash
# Start the actor
curl -X POST "https://api.apify.com/v2/acts/automation-lab~robots-txt-generator/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"preset": "seo-friendly", "sitemaps": ["https://example.com/sitemap.xml"]}'

# Get results (replace DATASET_ID with run output)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN"
```
MCP — use with Claude, Cursor, and VS Code
You can use this actor as an MCP tool in Claude Desktop, Claude Code, Cursor, and VS Code to generate robots.txt files directly from AI chat.
Claude Code (terminal)
```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/robots-txt-generator"
```
Claude Desktop / Cursor / VS Code
Add to your MCP config file:
{"mcpServers": {"apify": {"type": "http","url": "https://mcp.apify.com?tools=automation-lab/robots-txt-generator","headers": {"Authorization": "Bearer YOUR_APIFY_TOKEN"}}}}
Example prompts
"Generate a robots.txt that blocks GPTBot and CCBot but allows Googlebot and Bingbot. Add my sitemap at https://example.com/sitemap.xml."
"Create a robots.txt using the SEO-friendly preset for a WordPress site, but also disallow /wp-content/uploads/."
"Generate a robots.txt that blocks all bots from /staging/ and /draft/ paths, but allows Googlebot everywhere else."
Legal and compliance
Robots.txt is a voluntary protocol (the Robots Exclusion Standard). Compliant bots like Googlebot and Bingbot honor these directives. However, some scrapers and AI crawlers may ignore your robots.txt.
For legally binding protection against crawling, you must also:
- Add Terms of Service that explicitly prohibit scraping
- Implement technical access controls (rate limiting, authentication)
- Consider emerging AI scraping liability developments in your jurisdiction
This actor generates robots.txt files that comply with the RFC 9309 specification. The SEO-friendly and block-AI-crawlers presets include directives targeting bots known to the developer as of April 2026 — the list may not be exhaustive.
FAQ
Q: What's the difference between Allow and Disallow?
Disallow tells bots not to crawl a path. Allow explicitly permits access to a path that would otherwise be blocked by a broader Disallow rule. When paths conflict, the more specific (longer) rule wins in Googlebot — other bots may vary.
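
For example, under these rules Googlebot may fetch /private/press/index.html (the longer Allow pattern wins) but nothing else under /private/:

```
User-agent: Googlebot
Disallow: /private/
Allow: /private/press/
```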
Q: Does Googlebot respect Crawl-delay?
No. Googlebot ignores the Crawl-delay directive. To control Googlebot's crawl rate, use Google Search Console's crawl rate settings. Other bots (Bingbot, Yandex) do respect Crawl-delay.
Q: How do I block a bot from only some pages?
Create a rule block for that specific bot with Disallow: paths, BEFORE the User-agent: * block. Bots use the first matching rule block.
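
For example, to keep GPTBot out of /drafts/ while every other bot keeps full access:

```
User-agent: GPTBot
Disallow: /drafts/

User-agent: *
Allow: /
```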
Q: Why am I getting "User-agent appears in 2 rule blocks" warnings?
When using a preset AND custom rules that include the same user-agent (e.g., you add a * rule on top of the SEO-friendly preset which already has one), the bot will use the FIRST matching block and ignore the second. Either merge your rules into one block or remove the duplicate.
Q: Will my custom rules override the preset?
Custom rules are APPENDED after preset blocks. Since bots use the first matching block, if your custom block has the same user-agent as a preset block, the preset block wins. To override a preset rule, set preset: "none" and write all rules yourself.
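
A sketch of a fully custom input that bypasses presets entirely (paths are illustrative; field names follow the rule block schema above):

```json
{
  "preset": "none",
  "rules": [
    {
      "userAgent": "*",
      "allow": ["/"],
      "disallow": ["/admin/"]
    }
  ],
  "sitemaps": ["https://example.com/sitemap.xml"]
}
```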
Q: The robots.txt is saved in the key-value store — how do I download it?
Go to your Apify run, open the Key-Value Store, find the key robots.txt, and click "Download" or use the direct URL. The content-type is text/plain so browsers display it correctly.
Q: Can I run this actor on a schedule to keep my robots.txt up to date?
Yes — use Apify Schedules (free with any plan) to run this actor automatically. Combine with a Zapier/Make integration to auto-deploy the output to your server.
Related actors
- Broken Link Checker — Find broken links on your site after updating robots.txt
- Canonical URL Checker — Verify canonical tags across your site
- Duplicate Content Checker — Detect duplicate pages before tuning crawl rules
- DNS Propagation Checker — Verify DNS settings alongside your SEO config
- Ads.txt Checker — Validate your ads.txt file for publisher compliance