Robots.txt Validator (SEO + Crawling Rules Checker)
Validate robots.txt for one or more websites.
This Actor:
- Fetches /robots.txt for each unique host derived from startUrls
- Parses directive groups (User-agent, Allow, Disallow, Crawl-delay) and extracts Sitemap URLs
- Reports common errors/warnings (invalid lines, unknown directives, rules before User-agent, invalid sitemap URLs, etc.)
- Optionally tests a list of URLs against the selected User-Agent (see the sketch after this list)
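For intuition about how URL testing against a robots.txt group works, here is a minimal standalone sketch using Python's standard-library urllib.robotparser. It is not the Actor's implementation, and the sample robots.txt content is made up, but it shows how Disallow rules, Crawl-delay, and Sitemap lines are interpreted for a chosen user agent:

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Crawl-delay: 5
Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(SAMPLE_ROBOTS_TXT.splitlines())

# Test URLs against the group matched for the "*" user agent.
print(rp.can_fetch("*", "https://example.com/admin/settings"))  # False (disallowed)
print(rp.can_fetch("*", "https://example.com/blog/post"))       # True  (allowed)
print(rp.crawl_delay("*"))                                      # 5
print(rp.site_maps())  # ['https://example.com/sitemap.xml'] (Python 3.8+)
```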
Typical use cases
- SEO audits: verify Sitemap: entries and robots configuration
- QA checks: catch malformed directives before a production release
- Crawl planning: see whether important URLs are blocked for a given bot
Input
- startUrls (required): any URLs on the target site(s)
- userAgent (default *): used to choose the best matching group
- testUrls (optional): URLs to evaluate as allowed/disallowed for the chosen userAgent
- requestTimeoutSecs (default 15)
- maxRobotsTxtBytes (default 500000)
- fallbackToHttp (default true)
- saveRawRobotsTxt (default false): stores robots-<hostname>.txt in the key-value store
- proxyConfiguration (optional)
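To see how these options fit together, here is a hedged example of starting a run programmatically with the apify-client Python package; the parameter names come from the list above, while the token, URLs, and concrete values are placeholders you would replace:

```python
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")  # placeholder API token

run_input = {
    "startUrls": [{"url": "https://example.com/"}],
    "userAgent": "Googlebot",          # group-matching user agent (default is "*")
    "testUrls": [
        "https://example.com/admin/",
        "https://example.com/blog/",
    ],
    "requestTimeoutSecs": 15,
    "maxRobotsTxtBytes": 500000,
    "fallbackToHttp": True,
    "saveRawRobotsTxt": True,          # also stores robots-<hostname>.txt
    "proxyConfiguration": {"useApifyProxy": False},
}

run = client.actor("scrappy_garden/robots-txt-validator").call(run_input=run_input)
print(run["defaultDatasetId"], run["defaultKeyValueStoreId"])
```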
Output
Dataset items (one per host)
Each item includes:
- hostname, robotsTxtUrl, statusCode, hasRobotsTxt, contentType, bytes, sha256
- selectedGroupUserAgents, crawlDelaySeconds, sitemapUrls
- errors[] and warnings[] (with code, message, line)
- testedUrls[] (if provided)
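A small sketch of reading those per-host items, again assuming the apify-client Python package and a placeholder dataset ID taken from a finished run:

```python
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")
dataset = client.dataset("<DEFAULT_DATASET_ID>")  # placeholder ID from the run

for item in dataset.iterate_items():
    print(f"{item['hostname']}: status={item.get('statusCode')}, "
          f"sitemaps={len(item.get('sitemapUrls') or [])}")
    for err in item.get("errors", []):
        print(f"  ERROR line {err.get('line')}: [{err.get('code')}] {err.get('message')}")
    for warn in item.get("warnings", []):
        print(f"  WARN  line {warn.get('line')}: [{warn.get('code')}] {warn.get('message')}")
```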
Key-value store
- REPORT (JSON): full per-host report array
- SUMMARY (JSON): run summary and counts
- robots-<hostname>.txt (text, optional): raw robots.txt
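And a comparable sketch for the key-value store records (same assumptions: the apify-client Python package and placeholder IDs; the robots-<hostname>.txt key is only present when saveRawRobotsTxt is enabled, and the exact hostname in the key depends on your run):

```python
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")
store = client.key_value_store("<DEFAULT_KEY_VALUE_STORE_ID>")  # placeholder ID from the run

report = store.get_record("REPORT")    # {"key": ..., "value": [...], ...}
summary = store.get_record("SUMMARY")
print(summary["value"])                # run summary and counts

# Raw robots.txt, keyed by hostname (placeholder key shown), if it was saved.
raw = store.get_record("robots-example.com.txt")
if raw is not None:
    print(raw["value"][:200])
```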
Notes
- If /robots.txt returns 404, it is treated as allow-all (with a warning)
- This Actor is designed for validation and QA checks (not a full crawler)
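To make the 404 and HTTP-fallback behavior concrete, here is a standalone sketch (using the requests library, not the Actor's own code) of how such a fetch could treat a missing robots.txt as allow-all:

```python
import requests

def fetch_robots_txt(hostname, timeout=15, fallback_to_http=True):
    """Fetch robots.txt over HTTPS, optionally falling back to plain HTTP."""
    schemes = ["https", "http"] if fallback_to_http else ["https"]
    for scheme in schemes:
        url = f"{scheme}://{hostname}/robots.txt"
        try:
            resp = requests.get(url, timeout=timeout)
        except requests.RequestException:
            continue  # network error: try the next scheme
        if resp.status_code == 404:
            return url, None  # no robots.txt: treat as allow-all (worth a warning)
        if resp.ok:
            return url, resp.text
    return None, None

url, body = fetch_robots_txt("example.com")
print(url, "allow-all (no robots.txt)" if body is None else f"{len(body)} bytes")
```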
SEO keywords
robots.txt validator, robots.txt checker, validate robots.txt, robots rules tester, sitemap directive checker, crawl-delay validator, allow disallow rules
Quick start
Store page: https://apify.com/scrappy_garden/robots-txt-validator
Paste this into Input and click Run:
{"startUrls": [{"url": "https://example.com/"}],"proxyConfiguration": {"useApifyProxy": false}}
Outputs (what you get)
- Dataset: items typically include fields like hostname, robotsTxtUrl, statusCode, hasRobotsTxt, crawlDelaySeconds, sitemapUrls, errors, warnings.
- Key-value store: REPORT, SUMMARY
Tips (trust + predictable results)
- Start with 1–3 URLs to validate behavior, then scale up.
- If a target blocks requests, enable Proxy and/or slow down concurrency in Input.
- Use the SUMMARY / REPORT keys (when present) for automation pipelines and monitoring; a sketch follows this list.
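For example, a pipeline gate built on the REPORT key might look like the following sketch (assuming the apify-client Python package; the exit-code convention is a choice for your pipeline, not something the Actor defines):

```python
import sys
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")
record = client.key_value_store("<DEFAULT_KEY_VALUE_STORE_ID>").get_record("REPORT")

hosts_with_errors = [
    host["hostname"]
    for host in (record["value"] if record else [])
    if host.get("errors")
]

if hosts_with_errors:
    print("robots.txt errors on:", ", ".join(hosts_with_errors))
    sys.exit(1)  # fail the CI/monitoring job
else:
    print("robots.txt OK on all checked hosts")
```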
Related actors
- sitemap-generator (https://apify.com/scrappy_garden/sitemap-generator)
- canonical-url-checker (https://apify.com/scrappy_garden/canonical-url-checker)
- broken-link-checker (https://apify.com/scrappy_garden/broken-link-checker)
- security-headers-checker (https://apify.com/scrappy_garden/security-headers-checker)
Search keywords
robots txt validator, robots.txt validator - check rules, sitemaps & crawl directives, website audit, seo, robots.txt