Robots.txt Generator

Generate valid robots.txt files from structured rules. Apply presets (block AI bots, SEO-friendly), add custom per-bot rules, sitemaps, and crawl-delay. Zero-proxy, instant output.

Developer: Stas Persiianenko (Maintained by Community)
Pricing: Pay per event
Last modified: 4 days ago

Generate valid, production-ready robots.txt files from structured rules — no manual editing, no syntax errors. Define per-bot allow/disallow paths, crawl delays, sitemap URLs, and apply one-click presets like "Block AI Crawlers" or "SEO-Friendly". Zero-proxy, instant output.

Zero proxy. Zero scraping. Pure computation. This actor runs entirely on Apify compute — no bandwidth costs, no rate limits, no CAPTCHAs.


What does it do?

The Robots.txt Generator takes structured JSON rules and transforms them into a properly formatted robots.txt file that you can deploy directly to your website root. It:

  • 🚀 Applies one-click presets: Block All, Allow All, Block AI Crawlers, SEO-Friendly
  • 📋 Accepts custom per-bot rules (user-agent, allow paths, disallow paths, crawl-delay)
  • 🗺️ Appends sitemap URLs and optional Host directive
  • ✅ Validates rules for conflicts, duplicates, invalid paths, and missing wildcards
  • 💾 Saves output to dataset (JSON) and key-value store (robots.txt, text/plain) for direct download
  • ⚡ Runs in under 5 seconds — no network requests required
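Putting the pieces together, a minimal input that exercises these features might look like this (values are illustrative, drawn from the parameters documented below):

```json
{
  "preset": "block-ai-crawlers",
  "rules": [
    { "userAgent": "Googlebot", "allow": ["/"], "disallow": ["/admin/"] }
  ],
  "sitemaps": ["https://example.com/sitemap.xml"],
  "host": "example.com",
  "validateRules": true
}
```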

Who is it for?

🧑‍💻 Web developers and DevOps engineers

You manage multiple sites and need to generate consistent, validated robots.txt files without error-prone manual editing. Use this actor as part of your deployment pipeline — run it via API, download the output, and deploy automatically.

📊 SEO professionals and consultants

You're optimizing client sites and need to quickly generate robots.txt files that follow SEO best practices: allow search engine crawlers, block scrapers and AI training bots, and include sitemaps — all in one step.

🤖 AI/LLM product owners

You want to protect your content from AI training crawlers (GPTBot, CCBot, anthropic-ai, etc.) without blocking legitimate search engines. The "Block AI Crawlers" preset handles this automatically with an up-to-date list of known AI bots.

🏢 Enterprise teams managing many properties

You're maintaining a set of standard robots.txt policies across a portfolio of websites and need a programmatic, repeatable way to generate them from structured configuration.


Why use it?

| Problem | Without this actor | With Robots.txt Generator |
| --- | --- | --- |
| Syntax errors | Easy to make by hand | Automatically formatted |
| Conflicting rules | Hard to spot manually | Validated with warnings |
| AI bot list | Research bots yourself | 16 AI crawlers pre-loaded |
| Sitemap inclusion | Remember to add manually | Just list URLs |
| Consistency | Copy-paste across sites | Programmatic, repeatable |
| Integration | Manual file upload | API-driven, downloadable |

How much does it cost to generate a robots.txt file?

This actor uses Pay-Per-Event pricing — you only pay for what you generate.

| Event | FREE tier | BRONZE | SILVER | GOLD | PLATINUM | DIAMOND |
| --- | --- | --- | --- | --- | --- | --- |
| Actor start | $0.005 | $0.00475 | $0.00425 | $0.00375 | $0.003 | $0.0025 |
| Per robots.txt generated | $0.001 | $0.0009 | $0.0008 | $0.00065 | $0.0005 | $0.0004 |

Typical cost per run: $0.006 (start + 1 file on FREE tier).

💡 Free plan estimate: Apify's free tier gives you $5/month in credit. That's enough for ~800 robots.txt files per month at FREE tier pricing.
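The arithmetic behind that estimate, as a quick sanity check (FREE-tier prices from the table above):

```python
# Back-of-envelope check of the free-plan estimate.
ACTOR_START = 0.005      # $ per run (FREE tier)
PER_FILE = 0.001         # $ per robots.txt generated (FREE tier)
FREE_CREDIT = 5.00       # $ monthly Apify free credit

cost_per_run = ACTOR_START + PER_FILE           # one file per run
runs_per_month = int(FREE_CREDIT // cost_per_run)
print(f"${cost_per_run:.3f} per run, ~{runs_per_month} files/month")
```

That works out to 833 runs per month, which the "~800" figure above rounds down conservatively.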

This actor is one of the most affordable on the Store — zero proxy usage means there is no bandwidth markup.


How to use it

Step 1: Choose a preset (optional)

Start with a preset that matches your use case:

  • Allow all — open your site to all crawlers
  • Block all — protect a staging or private site
  • Block AI crawlers — block GPTBot, CCBot, anthropic-ai, and 13 other AI training bots
  • SEO-friendly — allow search engines, block scrapers and AI bots, disallow admin areas

Step 2: Add custom rules (optional)

Append your own per-bot rules on top of the preset. Specify:

{
  "userAgent": "Googlebot",
  "allow": ["/"],
  "disallow": ["/admin/", "/private/"],
  "crawlDelay": 1,
  "comment": "Googlebot: full access except admin"
}

Step 3: Add sitemaps and host (optional)

{
  "sitemaps": ["https://example.com/sitemap.xml"],
  "host": "example.com"
}

Step 4: Run and download

The actor saves your robots.txt to:

  • Dataset — JSON record with content, warnings, and metadata
  • Key-value store — raw text/plain file (key: robots.txt) for direct download

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| preset | string | seo-friendly | Quick preset: none, allow-all, block-all, block-ai-crawlers, seo-friendly |
| rules | array | [] | Custom user-agent rule blocks (user-agent, allow, disallow, crawl-delay, comment) |
| sitemaps | array | [] | Sitemap URLs to append as Sitemap: directives |
| host | string | "" | Optional Host: directive (used by Yandex) |
| includeTimestamp | boolean | true | Add generation timestamp comment |
| includeGeneratorComment | boolean | true | Add Apify generator comment |
| validateRules | boolean | true | Check for conflicts, duplicates, invalid paths |

Rule block schema

Each item in the rules array:

{
  "userAgent": "Googlebot",
  "allow": ["/public/"],
  "disallow": ["/admin/", "/login/"],
  "crawlDelay": 2,
  "comment": "Optional comment above this block"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| userAgent | string or string[] | Yes | Bot name(s), e.g. "*", "Googlebot", ["GPTBot", "CCBot"] |
| allow | string[] | No | Paths the bot CAN access |
| disallow | string[] | No | Paths the bot CANNOT access |
| crawlDelay | number | No | Seconds between requests (not supported by Googlebot) |
| comment | string | No | Comment placed above the block |

Output fields

| Field | Type | Description |
| --- | --- | --- |
| robotsTxt | string | The complete robots.txt file content |
| warnings | string[] | Validation warnings (conflicts, duplicates, etc.) |
| ruleBlockCount | number | Total User-agent blocks in the file |
| sitemapCount | number | Total Sitemap: directives |
| lineCount | number | Total lines in the file |
| generatedAt | string | ISO 8601 timestamp |
| preset | string | Preset that was applied |
| success | boolean | Whether generation succeeded |
| error | string | Error message (if failed) |

Example output

Input

{
  "preset": "block-ai-crawlers",
  "sitemaps": ["https://mysite.com/sitemap.xml"],
  "includeTimestamp": true
}

Output robots.txt

# robots.txt generated by Apify Robots.txt Generator
# https://apify.com/automation-lab/robots-txt-generator
# Generated: 2026-04-09T10:00:00.000Z
# Preset: Block AI Crawlers

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# ... (13 more AI bots)

# Allow all other bots
User-agent: *
Allow: /

Sitemap: https://mysite.com/sitemap.xml
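For intuition, the transformation from structured rule blocks to robots.txt text can be sketched roughly like this (a simplified illustration, not the actor's actual code):

```python
# Sketch: serialize rule blocks (as described in the schema above) into
# robots.txt text. Comments, multi-agent blocks, and crawl delays included.
def render_robots_txt(rules, sitemaps=()):
    lines = []
    for rule in rules:
        if rule.get("comment"):
            lines.append(f"# {rule['comment']}")
        agents = rule["userAgent"]
        for ua in ([agents] if isinstance(agents, str) else agents):
            lines.append(f"User-agent: {ua}")
        for path in rule.get("allow", []):
            lines.append(f"Allow: {path}")
        for path in rule.get("disallow", []):
            lines.append(f"Disallow: {path}")
        if "crawlDelay" in rule:
            lines.append(f"Crawl-delay: {rule['crawlDelay']}")
        lines.append("")  # blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines)

print(render_robots_txt(
    [{"userAgent": "GPTBot", "disallow": ["/"]}],
    ["https://mysite.com/sitemap.xml"],
))
```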

Tips and best practices

🎯 Rule ordering matters

RFC 9309-compliant crawlers (Googlebot, Bingbot) pick the most specific matching User-agent group no matter where it appears in the file, while many simpler bots read top-down and stop at the first match. The safe pattern covers both: if you have a specific bot rule (Googlebot) and a wildcard rule (*), Googlebot uses its specific block and ignores the wildcard — so add specific bot blocks BEFORE the wildcard block, ensuring first-match parsers behave the same as spec-compliant ones.
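Spec-compliant group selection can be sketched like this (an illustration of the RFC 9309 rule, not any real bot's parser):

```python
# Sketch: an RFC 9309-style crawler picks the group whose User-agent
# token is the longest (most specific) match for its own name;
# "*" is the fallback. Order only matters to naive first-match parsers.
def pick_group(bot_name, groups):
    bot = bot_name.lower()
    matches = [ua for ua in groups if ua != "*" and ua.lower() in bot]
    if matches:
        return max(matches, key=len)   # most specific match wins
    return "*" if "*" in groups else None

groups = {"*": ["Allow: /"], "Googlebot": ["Disallow: /admin/"]}
print(pick_group("Googlebot-Image", groups))  # → Googlebot
print(pick_group("Bingbot", groups))          # → *
```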

📌 The wildcard block is your default

Always include a User-agent: * block as your default rule. Without it, bots that aren't explicitly listed have no rules — this can cause unpredictable crawling behavior.

⏱️ Crawl-delay is not universal

Crawl-delay is supported by Bingbot, Yandex, and many lesser-known bots, but Googlebot ignores it. To slow down Googlebot, use Google Search Console's crawl rate settings instead.

🚫 Don't block CSS and JS from Googlebot

Blocking /css/, /js/, or /assets/ prevents Googlebot from rendering your pages correctly, which can hurt rankings. Use the SEO-friendly preset, which leaves these paths open.

✅ Validate before deploying

The actor's built-in validator catches common mistakes: overlapping allow/disallow patterns, duplicate user-agents, paths not starting with /, and missing wildcard blocks.
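The checks might look something like this (a sketch of the checks described above, not the actor's implementation):

```python
# Sketch: validation passes over the rules array — paths must start
# with "/", user-agents shouldn't repeat across blocks, and a wildcard
# block should exist as the default.
def validate_rules(rules):
    warnings = []
    seen = {}
    for rule in rules:
        agents = rule["userAgent"]
        for ua in ([agents] if isinstance(agents, str) else agents):
            seen[ua] = seen.get(ua, 0) + 1
        for path in rule.get("allow", []) + rule.get("disallow", []):
            if not path.startswith("/"):
                warnings.append(f"Path does not start with '/': {path}")
    for ua, count in seen.items():
        if count > 1:
            warnings.append(f"User-agent appears in {count} rule blocks: {ua}")
    if "*" not in seen:
        warnings.append("No wildcard (*) block — unlisted bots get no rules")
    return warnings
```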

🔄 Regenerate when your site structure changes

Update and re-run whenever you add admin panels, API endpoints, or new content sections that should be crawled differently.


Presets reference

Allow All

User-agent: *
Allow: /

Use for: Public sites, CDNs, open documentation.

Block All

User-agent: *
Disallow: /

Use for: Staging environments, private intranets, sites under construction.

Block AI Crawlers

Blocks all known AI training crawlers: GPTBot (OpenAI), CCBot (Common Crawl), anthropic-ai (Anthropic), Claude-Web, Google-Extended (Gemini training), cohere-ai, PerplexityBot, Bytespider (ByteDance), Diffbot, FacebookBot, ImagesiftBot, Omgilibot, YouBot, Applebot-Extended, ChatGPT-User, PetalBot.

All other bots (including Googlebot, Bingbot) are allowed. Use for: Publishers protecting training data rights, paywalled content sites.

SEO-Friendly

  • Allows all search engine bots with Allow: /
  • Blocks 10 bad/spam bots (AhrefsBot, SemrushBot, MJ12bot, etc.)
  • Blocks 16 AI crawlers
  • Disallows common admin paths (/admin/, /wp-admin/, /login/, /cart/, /checkout/)
  • Applies Crawl-delay: 1 to all bots

Use for: Production websites, e-commerce sites, blogs, SaaS marketing pages.


Integrations

🔗 Zapier / Make automation

Trigger this actor on a schedule and automatically deploy the output to your web server:

  1. Set up a scheduled Apify run (or cron trigger)
  2. In Zapier/Make, trigger when the run completes
  3. Fetch the robots.txt from the key-value store URL
  4. Upload to your server via FTP, S3, or API

☁️ AWS S3 deployment pipeline

Use the Apify API to run this actor as part of your CI/CD pipeline, then upload the output directly to S3:

# Run the actor and wait for it to finish (Apify CLI)
apify call automation-lab/robots-txt-generator --input-file=robots-config.json
# Download the generated file from the run's default key-value store
# (replace STORE_ID with the run's defaultKeyValueStoreId), then upload
curl -o robots.txt "https://api.apify.com/v2/key-value-stores/STORE_ID/records/robots.txt?token=YOUR_API_TOKEN"
aws s3 cp robots.txt s3://mybucket/robots.txt --content-type text/plain

🌐 Cloudflare Workers / CDN edge

Generate your robots.txt via this actor and serve it directly from a KV store URL in Cloudflare Workers, enabling instant updates without redeployment.

📊 Spreadsheet-driven multi-site management

Store your robots.txt configuration in Google Sheets (one row per site). Use Make.com to read each row and trigger this actor, then deploy each generated file to the corresponding site.


API usage

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/robots-txt-generator').call({
  preset: 'seo-friendly',
  rules: [
    {
      userAgent: 'Googlebot',
      allow: ['/'],
      disallow: ['/admin/'],
    },
  ],
  sitemaps: ['https://example.com/sitemap.xml'],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].robotsTxt);

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("automation-lab/robots-txt-generator").call(run_input={
    "preset": "block-ai-crawlers",
    "sitemaps": ["https://example.com/sitemap.xml"],
    "validateRules": True,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
print(items[0]["robotsTxt"])

cURL

# Start the actor
curl -X POST "https://api.apify.com/v2/acts/automation-lab~robots-txt-generator/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "preset": "seo-friendly",
    "sitemaps": ["https://example.com/sitemap.xml"]
  }'

# Get results (replace DATASET_ID with the defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN"

MCP — use with Claude, Cursor, and VS Code

You can use this actor as an MCP tool in Claude Desktop, Claude Code, Cursor, and VS Code to generate robots.txt files directly from AI chat.

Claude Code (terminal)

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/robots-txt-generator"

Claude Desktop / Cursor / VS Code

Add to your MCP config file:

{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/robots-txt-generator",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Example prompts

"Generate a robots.txt that blocks GPTBot and CCBot but allows Googlebot and Bingbot. Add my sitemap at https://example.com/sitemap.xml."

"Create a robots.txt using the SEO-friendly preset for a WordPress site, but also disallow /wp-content/uploads/."

"Generate a robots.txt that blocks all bots from /staging/ and /draft/ paths, but allows Googlebot everywhere else."


Robots.txt is a voluntary protocol (the Robots Exclusion Standard). Compliant bots like Googlebot and Bingbot honor these directives. However, some scrapers and AI crawlers may ignore your robots.txt.

Robots.txt alone carries no legal force. For stronger protection against unwanted crawling, you should also:

  • Add Terms of Service that explicitly prohibit scraping
  • Implement technical access controls (rate limiting, authentication)
  • Consider emerging AI scraping liability developments in your jurisdiction

This actor generates robots.txt files that comply with the RFC 9309 specification. The SEO-friendly and block-AI-crawlers presets include directives targeting bots known to the developer as of April 2026 — the list may not be exhaustive.


FAQ

Q: What's the difference between Allow and Disallow? Disallow tells bots not to crawl a path. Allow explicitly permits access to a path that would otherwise be blocked by a broader Disallow rule. When paths conflict, the more specific (longer) rule wins in Googlebot — other bots may vary.

Q: Does Googlebot respect Crawl-delay? No. Googlebot ignores the Crawl-delay directive. To control Googlebot's crawl rate, use Google Search Console's crawl rate settings. Other bots (Bingbot, Yandex) do respect Crawl-delay.

Q: How do I block a bot from only some pages? Create a rule block for that specific bot with Disallow: paths, placed BEFORE the User-agent: * block. Spec-compliant bots pick the most specific matching block regardless of order; simpler bots use the first match — this ordering satisfies both.

Q: Why am I getting "User-agent appears in 2 rule blocks" warnings? When a preset AND your custom rules include the same user-agent (e.g., you add a * rule on top of the SEO-friendly preset, which already has one), behavior becomes unpredictable: Googlebot merges duplicate groups, while many simpler bots read only the first. Either merge your rules into one block or remove the duplicate.

Q: Will my custom rules override the preset? Custom rules are APPENDED after preset blocks. If your custom block repeats a preset block's user-agent, the outcome depends on the parser: first-match bots keep the preset block, while Googlebot merges both groups. To reliably override a preset rule, set preset: "none" and write all rules yourself.

Q: The robots.txt is saved in the key-value store — how do I download it? Go to your Apify run, open the Key-Value Store, find the key robots.txt, and click "Download" or use the direct URL. The content-type is text/plain so browsers display it correctly.

Q: Can I run this actor on a schedule to keep my robots.txt up to date? Yes — use Apify Schedules (free with any plan) to run this actor automatically. Combine with a Zapier/Make integration to auto-deploy the output to your server.