Robots.txt Validator - Check Rules, Sitemaps & Crawl Directives

Validate robots.txt for one or more websites: fetches /robots.txt per host, parses directive groups (User-agent/Allow/Disallow/Crawl-delay/Sitemap), reports common errors and warnings, and can test URLs against the chosen User-Agent.

Robots.txt Validator (SEO + Crawling Rules Checker)

Validate robots.txt for one or more websites.

This Actor:

  • Fetches /robots.txt for each unique host derived from startUrls
  • Parses directive groups (User-agent, Allow, Disallow, Crawl-delay) and extracts Sitemap URLs
  • Reports common errors/warnings (invalid lines, unknown directives, rules before User-agent, invalid sitemap URLs, etc.)
  • Optionally tests a list of URLs against the selected User-Agent
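
For reference, a minimal robots.txt illustrating the directives this Actor parses (the hostname and paths are placeholders):

User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

User-agent: Googlebot
Disallow:

Sitemap: https://example.com/sitemap.xml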

Typical use cases

  • SEO audits: verify Sitemap: entries and robots configuration
  • QA checks: catch malformed directives before a production release
  • Crawl planning: see whether important URLs are blocked for a given bot

Input

  • startUrls (required): any URLs on the target site(s)
  • userAgent (default *): used to choose the best matching group
  • testUrls (optional): URLs to evaluate as allowed/disallowed for the chosen userAgent
  • requestTimeoutSecs (default 15): per-request timeout in seconds
  • maxRobotsTxtBytes (default 500000): maximum robots.txt size to download, in bytes
  • fallbackToHttp (default true): fall back to http:// if the https:// request fails
  • saveRawRobotsTxt (default false): stores robots-<hostname>.txt in the key-value store
  • proxyConfiguration (optional)
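
A fuller input example combining these options (the URLs and the Googlebot userAgent are illustrative; the numeric values are the defaults listed above):

{
  "startUrls": [
    { "url": "https://example.com/" }
  ],
  "userAgent": "Googlebot",
  "testUrls": [
    "https://example.com/",
    "https://example.com/admin/"
  ],
  "requestTimeoutSecs": 15,
  "maxRobotsTxtBytes": 500000,
  "fallbackToHttp": true,
  "saveRawRobotsTxt": false
}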

Output

Dataset items (one per host)

Each item includes:

  • hostname, robotsTxtUrl, statusCode, hasRobotsTxt, contentType, bytes, sha256
  • selectedGroupUserAgents, crawlDelaySeconds, sitemapUrls
  • errors[] and warnings[] (with code, message, line)
  • testedUrls[] (if provided)
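
An illustrative dataset item, assuming the field list above; the nesting and values are placeholders, not real output:

{
  "hostname": "example.com",
  "robotsTxtUrl": "https://example.com/robots.txt",
  "statusCode": 200,
  "hasRobotsTxt": true,
  "contentType": "text/plain",
  "bytes": 1234,
  "sha256": "…",
  "selectedGroupUserAgents": ["*"],
  "crawlDelaySeconds": 10,
  "sitemapUrls": ["https://example.com/sitemap.xml"],
  "errors": [],
  "warnings": [
    { "code": "…", "message": "…", "line": 12 }
  ],
  "testedUrls": []
}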

Key-value store

  • REPORT (JSON): full per-host report array
  • SUMMARY (JSON): run summary and counts
  • robots-<hostname>.txt (text, optional): raw robots.txt

Notes

  • If /robots.txt returns 404, it is treated as allow-all (with a warning)
  • This Actor is designed for validation and QA checks (not a full crawler)

SEO keywords

robots.txt validator, robots.txt checker, validate robots.txt, robots rules tester, sitemap directive checker, crawl-delay validator, allow disallow rules

Quick start

Store page: https://apify.com/scrappy_garden/robots-txt-validator

Paste this into Input and click Run:

{
  "startUrls": [
    { "url": "https://example.com/" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

Outputs (what you get)

  • Dataset: one item per host, with fields such as hostname, robotsTxtUrl, statusCode, hasRobotsTxt, crawlDelaySeconds, sitemapUrls, errors, and warnings.
  • Key-value store: REPORT, SUMMARY

Tips (for reliable, predictable results)

  • Start with 1–3 URLs to validate behavior, then scale up.
  • If a target blocks requests, enable proxyConfiguration and/or reduce concurrency in the Input.
  • Use the SUMMARY / REPORT keys (when present) for automation pipelines and monitoring; see the sketch below.
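
A minimal sketch of such a pipeline using the Apify Python client. The token is a placeholder, the Actor ID is taken from the store URL above, and the record keys and field names assume the defaults documented in the Output section:

from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Start a validation run and wait for it to finish.
run = client.actor("scrappy_garden/robots-txt-validator").call(
    run_input={
        "startUrls": [{"url": "https://example.com/"}],
        "proxyConfiguration": {"useApifyProxy": False},
    }
)

# Read the SUMMARY and REPORT records from the run's default key-value store.
kvs = client.key_value_store(run["defaultKeyValueStoreId"])
summary = kvs.get_record("SUMMARY")
report = kvs.get_record("REPORT")
if summary:
    print("Summary:", summary["value"])
if report:
    print("Hosts in report:", len(report["value"]))

# Walk the per-host dataset items, e.g. to alert when a host has errors.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("errors"):
        print(item["hostname"], "has", len(item["errors"]), "error(s)")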
