Robots.txt Validator - Check Rules, Sitemaps & Crawl Directives

Validate robots.txt for one or more websites: fetches /robots.txt per host, parses directive groups (User-agent/Allow/Disallow/Crawl-delay/Sitemap), reports common errors and warnings, and can test URLs against the chosen User-Agent.

Robots.txt Validator (SEO + Crawling Rules Checker)

Validate robots.txt for one or more websites.

This Actor:

  • Fetches /robots.txt for each unique host derived from startUrls
  • Parses directive groups (User-agent, Allow, Disallow, Crawl-delay) and extracts Sitemap URLs
  • Reports common errors/warnings (invalid lines, unknown directives, rules before User-agent, invalid sitemap URLs, etc.)
  • Optionally tests a list of URLs against the selected User-Agent
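
For reference, a minimal robots.txt illustrating the directives this Actor parses (the hostname and paths are placeholders):

User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

User-agent: Googlebot
Disallow:

Sitemap: https://example.com/sitemap.xml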

Typical use cases

  • SEO audits: verify Sitemap: entries and robots configuration
  • QA checks: catch malformed directives before a production release
  • Crawl planning: see whether important URLs are blocked for a given bot

Input

  • startUrls (required): any URLs on the target site(s)
  • userAgent (default *): used to choose the best matching group
  • testUrls (optional): URLs to evaluate as allowed/disallowed for the chosen userAgent
  • requestTimeoutSecs (default 15): per-request timeout in seconds
  • maxRobotsTxtBytes (default 500000): maximum robots.txt size to download, in bytes
  • fallbackToHttp (default true): fall back to http:// if the https:// request fails
  • saveRawRobotsTxt (default false): stores robots-<hostname>.txt in the key-value store
  • proxyConfiguration (optional)
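
A fuller input example combining these options (the URLs and the Googlebot userAgent are illustrative; the numeric values are the defaults listed above):

{
  "startUrls": [
    { "url": "https://example.com/" }
  ],
  "userAgent": "Googlebot",
  "testUrls": [
    "https://example.com/",
    "https://example.com/admin/"
  ],
  "requestTimeoutSecs": 15,
  "maxRobotsTxtBytes": 500000,
  "fallbackToHttp": true,
  "saveRawRobotsTxt": false
}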

Output

Dataset items (one per host)

Each item includes:

  • hostname, robotsTxtUrl, statusCode, hasRobotsTxt, contentType, bytes, sha256
  • selectedGroupUserAgents, crawlDelaySeconds, sitemapUrls
  • errors[] and warnings[] (with code, message, line)
  • testedUrls[] (if provided)
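
An illustrative dataset item, assuming the field list above; the nesting and values are placeholders, not real output:

{
  "hostname": "example.com",
  "robotsTxtUrl": "https://example.com/robots.txt",
  "statusCode": 200,
  "hasRobotsTxt": true,
  "contentType": "text/plain",
  "bytes": 1234,
  "sha256": "…",
  "selectedGroupUserAgents": ["*"],
  "crawlDelaySeconds": 10,
  "sitemapUrls": ["https://example.com/sitemap.xml"],
  "errors": [],
  "warnings": [
    { "code": "…", "message": "…", "line": 12 }
  ],
  "testedUrls": []
}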

Key-value store

  • REPORT (JSON): full per-host report array
  • SUMMARY (JSON): run summary and counts
  • robots-<hostname>.txt (text, optional): raw robots.txt

Notes

  • If /robots.txt returns 404, it is treated as allow-all (with a warning)
  • This Actor is designed for validation and QA checks (not a full crawler)

SEO keywords

robots.txt validator, robots.txt checker, validate robots.txt, robots rules tester, sitemap directive checker, crawl-delay validator, allow disallow rules

Quick start

Store page: https://apify.com/scrappy_garden/robots-txt-validator

Paste this into Input and click Run:

{
  "startUrls": [
    { "url": "https://example.com/" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Outputs (what you get)

  • Dataset: one item per host, with fields such as hostname, robotsTxtUrl, statusCode, hasRobotsTxt, crawlDelaySeconds, sitemapUrls, errors, and warnings.
  • Key-value store: REPORT, SUMMARY

Tips (for reliable, predictable results)

  • Start with 1–3 URLs to validate behavior, then scale up.
  • If a target blocks requests, enable proxyConfiguration and/or reduce concurrency in the Input.
  • Use the SUMMARY / REPORT keys (when present) for automation pipelines and monitoring; see the sketch below.
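
A minimal sketch of such a pipeline using the Apify Python client. The token is a placeholder, the Actor ID is taken from the store URL above, and the record keys and field names assume the defaults documented in the Output section:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start a validation run and wait for it to finish.
run = client.actor("scrappy_garden/robots-txt-validator").call(
    run_input={
        "startUrls": [{"url": "https://example.com/"}],
        "proxyConfiguration": {"useApifyProxy": False},
    }
)

# Read the SUMMARY and REPORT records from the run's default key-value store.
kvs = client.key_value_store(run["defaultKeyValueStoreId"])
summary = kvs.get_record("SUMMARY")
report = kvs.get_record("REPORT")
if summary:
    print("Summary:", summary["value"])
if report:
    print("Hosts in report:", len(report["value"]))

# Walk the per-host dataset items, e.g. to alert when a host has errors.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("errors"):
        print(item["hostname"], "has", len(item["errors"]), "error(s)")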
