Robots.txt Generator

Generate valid robots.txt files from structured rules. Apply presets (block AI bots, SEO-friendly), add custom per-bot rules, sitemaps, and crawl-delay. Zero-proxy, instant output.

Developer: Stas Persiianenko (Maintained by Community)
Pricing: Pay per event
Last modified: 4 days ago

Generate valid, production-ready robots.txt files from structured rules — no manual editing, no syntax errors. Define per-bot allow/disallow paths, crawl delays, sitemap URLs, and apply one-click presets like "Block AI Crawlers" or "SEO-Friendly". Zero-proxy, instant output.

Zero proxy. Zero scraping. Pure computation. This actor runs entirely on Apify compute — no bandwidth costs, no rate limits, no CAPTCHAs.


What does it do?

The Robots.txt Generator takes structured JSON rules and transforms them into a properly formatted robots.txt file that you can deploy directly to your website root. It:

  • 🚀 Applies one-click presets: Block All, Allow All, Block AI Crawlers, SEO-Friendly
  • 📋 Accepts custom per-bot rules (user-agent, allow paths, disallow paths, crawl-delay)
  • 🗺️ Appends sitemap URLs and optional Host directive
  • ✅ Validates rules for conflicts, duplicates, invalid paths, and missing wildcards
  • 💾 Saves output to dataset (JSON) and key-value store (robots.txt, text/plain) for direct download
  • ⚡ Runs in under 5 seconds — no network requests required
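Putting the pieces together, a minimal input that exercises these features might look like this (values are illustrative, drawn from the parameters documented below):

```json
{
  "preset": "block-ai-crawlers",
  "rules": [
    { "userAgent": "Googlebot", "allow": ["/"], "disallow": ["/admin/"] }
  ],
  "sitemaps": ["https://example.com/sitemap.xml"],
  "host": "example.com",
  "validateRules": true
}
```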

Who is it for?

🧑‍💻 Web developers and DevOps engineers

You manage multiple sites and need to generate consistent, validated robots.txt files without error-prone manual editing. Use this actor as part of your deployment pipeline — run it via API, download the output, and deploy automatically.

📊 SEO professionals and consultants

You're optimizing client sites and need to quickly generate robots.txt files that follow SEO best practices: allow search engine crawlers, block scrapers and AI training bots, and include sitemaps — all in one step.

🤖 AI/LLM product owners

You want to protect your content from AI training crawlers (GPTBot, CCBot, anthropic-ai, etc.) without blocking legitimate search engines. The "Block AI Crawlers" preset handles this automatically with an up-to-date list of known AI bots.

🏢 Enterprise teams managing many properties

You're maintaining a set of standard robots.txt policies across a portfolio of websites and need a programmatic, repeatable way to generate them from structured configuration.


Why use it?

| Problem | Without this actor | With Robots.txt Generator |
| --- | --- | --- |
| Syntax errors | Easy to make by hand | Automatically formatted |
| Conflicting rules | Hard to spot manually | Validated with warnings |
| AI bot list | Research bots yourself | 16 AI crawlers pre-loaded |
| Sitemap inclusion | Remember to add manually | Just list URLs |
| Consistency | Copy-paste across sites | Programmatic, repeatable |
| Integration | Manual file upload | API-driven, downloadable |

How much does it cost to generate a robots.txt file?

This actor uses Pay-Per-Event pricing — you only pay for what you generate.

| Event | FREE tier | BRONZE | SILVER | GOLD | PLATINUM | DIAMOND |
| --- | --- | --- | --- | --- | --- | --- |
| Actor start | $0.005 | $0.00475 | $0.00425 | $0.00375 | $0.003 | $0.0025 |
| Per robots.txt generated | $0.001 | $0.0009 | $0.0008 | $0.00065 | $0.0005 | $0.0004 |

Typical cost per run: $0.006 (start + 1 file on FREE tier).

💡 Free plan estimate: Apify's free tier gives you $5/month in credit. That's enough for ~800 robots.txt files per month at FREE tier pricing.
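The arithmetic behind that estimate, as a quick sanity check (FREE-tier prices from the table above):

```python
# Back-of-envelope check of the free-plan estimate.
ACTOR_START = 0.005      # $ per run (FREE tier)
PER_FILE = 0.001         # $ per robots.txt generated (FREE tier)
FREE_CREDIT = 5.00       # $ monthly Apify free credit

cost_per_run = ACTOR_START + PER_FILE           # one file per run
runs_per_month = int(FREE_CREDIT // cost_per_run)
print(f"${cost_per_run:.3f} per run, ~{runs_per_month} files/month")
```

That works out to 833 runs per month, which the "~800" figure above rounds down conservatively.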

This actor is one of the most affordable on the Store — zero proxy usage means there is no bandwidth markup.


How to use it

Step 1: Choose a preset (optional)

Start with a preset that matches your use case:

  • Allow all — open your site to all crawlers
  • Block all — protect a staging or private site
  • Block AI crawlers — block GPTBot, CCBot, anthropic-ai, and 13 other AI training bots
  • SEO-friendly — allow search engines, block scrapers and AI bots, disallow admin areas

Step 2: Add custom rules (optional)

Append your own per-bot rules on top of the preset. Specify:

{
  "userAgent": "Googlebot",
  "allow": ["/"],
  "disallow": ["/admin/", "/private/"],
  "crawlDelay": 1,
  "comment": "Googlebot: full access except admin"
}

Step 3: Add sitemaps and host (optional)

{
  "sitemaps": ["https://example.com/sitemap.xml"],
  "host": "example.com"
}

Step 4: Run and download

The actor saves your robots.txt to:

  • Dataset — JSON record with content, warnings, and metadata
  • Key-value store — raw text/plain file (key: robots.txt) for direct download

Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| preset | string | seo-friendly | Quick preset: none, allow-all, block-all, block-ai-crawlers, seo-friendly |
| rules | array | [] | Custom user-agent rule blocks (user-agent, allow, disallow, crawl-delay, comment) |
| sitemaps | array | [] | Sitemap URLs to append as Sitemap: directives |
| host | string | "" | Optional Host: directive (used by Yandex) |
| includeTimestamp | boolean | true | Add generation timestamp comment |
| includeGeneratorComment | boolean | true | Add Apify generator comment |
| validateRules | boolean | true | Check for conflicts, duplicates, invalid paths |

Rule block schema

Each item in the rules array:

{
  "userAgent": "Googlebot",
  "allow": ["/public/"],
  "disallow": ["/admin/", "/login/"],
  "crawlDelay": 2,
  "comment": "Optional comment above this block"
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| userAgent | string or string[] | Yes | Bot name(s), e.g. "*", "Googlebot", ["GPTBot", "CCBot"] |
| allow | string[] | No | Paths the bot CAN access |
| disallow | string[] | No | Paths the bot CANNOT access |
| crawlDelay | number | No | Seconds between requests (not supported by Googlebot) |
| comment | string | No | Comment placed above the block |

Output fields

| Field | Type | Description |
| --- | --- | --- |
| robotsTxt | string | The complete robots.txt file content |
| warnings | string[] | Validation warnings (conflicts, duplicates, etc.) |
| ruleBlockCount | number | Total User-agent blocks in the file |
| sitemapCount | number | Total Sitemap: directives |
| lineCount | number | Total lines in the file |
| generatedAt | string | ISO 8601 timestamp |
| preset | string | Preset that was applied |
| success | boolean | Whether generation succeeded |
| error | string | Error message (if failed) |

Example output

Input

{
  "preset": "block-ai-crawlers",
  "sitemaps": ["https://mysite.com/sitemap.xml"],
  "includeTimestamp": true
}

Output robots.txt

# robots.txt generated by Apify Robots.txt Generator
# https://apify.com/automation-lab/robots-txt-generator
# Generated: 2026-04-09T10:00:00.000Z
# Preset: Block AI Crawlers

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# ... (13 more AI bots)

# Allow all other bots
User-agent: *
Allow: /

Sitemap: https://mysite.com/sitemap.xml
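For intuition, the transformation from structured rule blocks to robots.txt text can be sketched roughly like this (a simplified illustration, not the actor's actual code):

```python
# Sketch: serialize rule blocks (as described in the schema above) into
# robots.txt text. Comments, multi-agent blocks, and crawl delays included.
def render_robots_txt(rules, sitemaps=()):
    lines = []
    for rule in rules:
        if rule.get("comment"):
            lines.append(f"# {rule['comment']}")
        agents = rule["userAgent"]
        for ua in ([agents] if isinstance(agents, str) else agents):
            lines.append(f"User-agent: {ua}")
        for path in rule.get("allow", []):
            lines.append(f"Allow: {path}")
        for path in rule.get("disallow", []):
            lines.append(f"Disallow: {path}")
        if "crawlDelay" in rule:
            lines.append(f"Crawl-delay: {rule['crawlDelay']}")
        lines.append("")  # blank line separates groups
    for url in sitemaps:
        lines.append(f"Sitemap: {url}")
    return "\n".join(lines)

print(render_robots_txt(
    [{"userAgent": "GPTBot", "disallow": ["/"]}],
    ["https://mysite.com/sitemap.xml"],
))
```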

Tips and best practices

🎯 Rule ordering matters

RFC 9309-compliant crawlers (Googlebot, Bingbot) pick the most specific matching User-agent group no matter where it appears in the file, while many simpler bots read top-down and stop at the first match. The safe pattern covers both: if you have a specific bot rule (Googlebot) and a wildcard rule (*), Googlebot uses its specific block and ignores the wildcard — so add specific bot blocks BEFORE the wildcard block, ensuring first-match parsers behave the same as spec-compliant ones.
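Spec-compliant group selection can be sketched like this (an illustration of the RFC 9309 rule, not any real bot's parser):

```python
# Sketch: an RFC 9309-style crawler picks the group whose User-agent
# token is the longest (most specific) match for its own name;
# "*" is the fallback. Order only matters to naive first-match parsers.
def pick_group(bot_name, groups):
    bot = bot_name.lower()
    matches = [ua for ua in groups if ua != "*" and ua.lower() in bot]
    if matches:
        return max(matches, key=len)   # most specific match wins
    return "*" if "*" in groups else None

groups = {"*": ["Allow: /"], "Googlebot": ["Disallow: /admin/"]}
print(pick_group("Googlebot-Image", groups))  # → Googlebot
print(pick_group("Bingbot", groups))          # → *
```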

📌 The wildcard block is your default

Always include a User-agent: * block as your default rule. Without it, bots that aren't explicitly listed have no rules — this can cause unpredictable crawling behavior.

⏱️ Crawl-delay is not universal

Crawl-delay is supported by Bingbot, Yandex, and many lesser-known bots, but Googlebot ignores it. To slow down Googlebot, use Google Search Console's crawl rate settings instead.

🚫 Don't block CSS and JS from Googlebot

Blocking /css/, /js/, or /assets/ prevents Googlebot from rendering your pages correctly, which can hurt rankings. Use the SEO-friendly preset, which leaves these paths open.

✅ Validate before deploying

The actor's built-in validator catches common mistakes: overlapping allow/disallow patterns, duplicate user-agents, paths not starting with /, and missing wildcard blocks.
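The checks might look something like this (a sketch of the checks described above, not the actor's implementation):

```python
# Sketch: validation passes over the rules array — paths must start
# with "/", user-agents shouldn't repeat across blocks, and a wildcard
# block should exist as the default.
def validate_rules(rules):
    warnings = []
    seen = {}
    for rule in rules:
        agents = rule["userAgent"]
        for ua in ([agents] if isinstance(agents, str) else agents):
            seen[ua] = seen.get(ua, 0) + 1
        for path in rule.get("allow", []) + rule.get("disallow", []):
            if not path.startswith("/"):
                warnings.append(f"Path does not start with '/': {path}")
    for ua, count in seen.items():
        if count > 1:
            warnings.append(f"User-agent appears in {count} rule blocks: {ua}")
    if "*" not in seen:
        warnings.append("No wildcard (*) block — unlisted bots get no rules")
    return warnings
```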

🔄 Regenerate when your site structure changes

Update and re-run whenever you add admin panels, API endpoints, or new content sections that should be crawled differently.


Presets reference

Allow All

User-agent: *
Allow: /

Use for: Public sites, CDNs, open documentation.

Block All

User-agent: *
Disallow: /

Use for: Staging environments, private intranets, sites under construction.

Block AI Crawlers

Blocks all known AI training crawlers: GPTBot (OpenAI), CCBot (Common Crawl), anthropic-ai (Anthropic), Claude-Web, Google-Extended (Gemini training), cohere-ai, PerplexityBot, Bytespider (ByteDance), Diffbot, FacebookBot, ImagesiftBot, Omgilibot, YouBot, Applebot-Extended, ChatGPT-User, PetalBot.

All other bots (including Googlebot, Bingbot) are allowed. Use for: Publishers protecting training data rights, paywalled content sites.

SEO-Friendly

  • Allows all search engine bots with Allow: /
  • Blocks 10 bad/spam bots (AhrefsBot, SemrushBot, MJ12bot, etc.)
  • Blocks 16 AI crawlers
  • Disallows common admin paths (/admin/, /wp-admin/, /login/, /cart/, /checkout/)
  • Applies Crawl-delay: 1 to all bots

Use for: Production websites, e-commerce sites, blogs, SaaS marketing pages.


Integrations

🔗 Zapier / Make automation

Trigger this actor on a schedule and automatically deploy the output to your web server:

  1. Set up a scheduled Apify run (or cron trigger)
  2. In Zapier/Make, trigger when the run completes
  3. Fetch the robots.txt from the key-value store URL
  4. Upload to your server via FTP, S3, or API

☁️ AWS S3 deployment pipeline

Use the Apify API to run this actor as part of your CI/CD pipeline, then upload the output directly to S3:

# Run the actor and wait for it to finish (Apify CLI)
apify call automation-lab/robots-txt-generator --input-file=robots-config.json
# Download the generated file from the run's default key-value store
# (replace STORE_ID with the run's defaultKeyValueStoreId), then upload
curl -o robots.txt "https://api.apify.com/v2/key-value-stores/STORE_ID/records/robots.txt?token=YOUR_API_TOKEN"
aws s3 cp robots.txt s3://mybucket/robots.txt --content-type text/plain

🌐 Cloudflare Workers / CDN edge

Generate your robots.txt via this actor and serve it directly from a KV store URL in Cloudflare Workers, enabling instant updates without redeployment.

📊 Spreadsheet-driven multi-site management

Store your robots.txt configuration in Google Sheets (one row per site). Use Make.com to read each row and trigger this actor, then deploy each generated file to the corresponding site.


API usage

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/robots-txt-generator').call({
  preset: 'seo-friendly',
  rules: [
    {
      userAgent: 'Googlebot',
      allow: ['/'],
      disallow: ['/admin/'],
    },
  ],
  sitemaps: ['https://example.com/sitemap.xml'],
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0].robotsTxt);

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("automation-lab/robots-txt-generator").call(run_input={
    "preset": "block-ai-crawlers",
    "sitemaps": ["https://example.com/sitemap.xml"],
    "validateRules": True,
})

items = client.dataset(run["defaultDatasetId"]).list_items().items
print(items[0]["robotsTxt"])

cURL

# Start the actor
curl -X POST "https://api.apify.com/v2/acts/automation-lab~robots-txt-generator/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "preset": "seo-friendly",
    "sitemaps": ["https://example.com/sitemap.xml"]
  }'

# Get results (replace DATASET_ID with the defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN"

MCP — use with Claude, Cursor, and VS Code

You can use this actor as an MCP tool in Claude Desktop, Claude Code, Cursor, and VS Code to generate robots.txt files directly from AI chat.

Claude Code (terminal)

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/robots-txt-generator"

Claude Desktop / Cursor / VS Code

Add to your MCP config file:

{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/robots-txt-generator",
      "headers": {
        "Authorization": "Bearer YOUR_APIFY_TOKEN"
      }
    }
  }
}

Example prompts

"Generate a robots.txt that blocks GPTBot and CCBot but allows Googlebot and Bingbot. Add my sitemap at https://example.com/sitemap.xml."

"Create a robots.txt using the SEO-friendly preset for a WordPress site, but also disallow /wp-content/uploads/."

"Generate a robots.txt that blocks all bots from /staging/ and /draft/ paths, but allows Googlebot everywhere else."


Robots.txt is a voluntary protocol (the Robots Exclusion Standard). Compliant bots like Googlebot and Bingbot honor these directives. However, some scrapers and AI crawlers may ignore your robots.txt.

Robots.txt alone carries no legal force. For stronger protection against unwanted crawling, you should also:

  • Add Terms of Service that explicitly prohibit scraping
  • Implement technical access controls (rate limiting, authentication)
  • Consider emerging AI scraping liability developments in your jurisdiction

This actor generates robots.txt files that comply with the RFC 9309 specification. The SEO-friendly and block-AI-crawlers presets include directives targeting bots known to the developer as of April 2026 — the list may not be exhaustive.


FAQ

Q: What's the difference between Allow and Disallow? Disallow tells bots not to crawl a path. Allow explicitly permits access to a path that would otherwise be blocked by a broader Disallow rule. When paths conflict, the more specific (longer) rule wins in Googlebot — other bots may vary.

Q: Does Googlebot respect Crawl-delay? No. Googlebot ignores the Crawl-delay directive. To control Googlebot's crawl rate, use Google Search Console's crawl rate settings. Other bots (Bingbot, Yandex) do respect Crawl-delay.

Q: How do I block a bot from only some pages? Create a rule block for that specific bot with Disallow: paths, placed BEFORE the User-agent: * block. Spec-compliant bots pick the most specific matching block regardless of order; simpler bots use the first match — this ordering satisfies both.

Q: Why am I getting "User-agent appears in 2 rule blocks" warnings? When a preset AND your custom rules include the same user-agent (e.g., you add a * rule on top of the SEO-friendly preset, which already has one), behavior becomes unpredictable: Googlebot merges duplicate groups, while many simpler bots read only the first. Either merge your rules into one block or remove the duplicate.

Q: Will my custom rules override the preset? Custom rules are APPENDED after preset blocks. If your custom block repeats a preset block's user-agent, the outcome depends on the parser: first-match bots keep the preset block, while Googlebot merges both groups. To reliably override a preset rule, set preset: "none" and write all rules yourself.

Q: The robots.txt is saved in the key-value store — how do I download it? Go to your Apify run, open the Key-Value Store, find the key robots.txt, and click "Download" or use the direct URL. The content-type is text/plain so browsers display it correctly.

Q: Can I run this actor on a schedule to keep my robots.txt up to date? Yes — use Apify Schedules (free with any plan) to run this actor automatically. Combine with a Zapier/Make integration to auto-deploy the output to your server.