
Robots.txt Auditor & Sitemap Finder


Scan robots.txt files in bulk to extract sitemap URLs and verify crawler directives for technical SEO compliance.

Pricing

from $1.00 / 1,000 domains audited

Rating

0.0 (0 reviews)

Developer

Andok (Maintained by Community)

Actor stats

0 bookmarked · 2 total users · 1 monthly active user · last modified 21 days ago

Robots.txt Auditor

Audit robots.txt files across hundreds of domains to catch crawl-blocking mistakes that silently hurt SEO. A single misconfigured Disallow rule can deindex entire site sections — this actor fetches, parses, and reports on every robots.txt in bulk. Run it against your own sites or competitor domains to extract sitemap declarations, user-agent rules, and crawl directives in one pass.

Features

  • Bulk auditing — process hundreds of domains in a single run with configurable concurrency
  • Sitemap discovery — extracts all Sitemap: directives declared in each robots.txt
  • User-agent analysis — identifies every crawler-specific rule block in the file
  • Status reporting — captures HTTP status codes, file size, and fetch errors
  • Flexible input — accepts full URLs or bare domains (auto-resolves to /robots.txt)
  • Error resilience — reports failures per domain without stopping the run
  • Timestamp tracking — records when each domain was checked for audit trails
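
The sitemap and user-agent extraction described above can be sketched in a few lines. This is an illustrative parse only, not the actor's actual implementation: it collects Sitemap: directives and unique User-agent values from a robots.txt body.

```python
def parse_robots(body: str) -> dict:
    """Collect Sitemap: directives and unique User-agent values."""
    sitemaps, user_agents = [], []
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "sitemap" and value:
            sitemaps.append(value)
        elif key == "user-agent" and value and value not in user_agents:
            user_agents.append(value)
    return {
        "sitemapCount": len(sitemaps),
        "sitemaps": sitemaps,
        "userAgents": user_agents,
    }

example = """\
User-agent: *
Disallow: /admin

User-agent: Googlebot
Allow: /

Sitemap: https://example.com/sitemap.xml
"""
result = parse_robots(example)
```

Directive keys are matched case-insensitively, mirroring how robots.txt parsers treat `Sitemap:` and `sitemap:` the same.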

Input

Field          | Type    | Required | Default | Description
urls           | array   | Yes      | -       | List of URLs or domains to audit (e.g. example.com or https://example.com)
url            | string  | No       | -       | Single URL for backward compatibility; merged into urls if both are provided
timeoutSeconds | integer | No       | 15      | HTTP timeout in seconds for each robots.txt fetch
concurrency    | integer | No       | 10      | Number of domains to process in parallel (1-50)

Input Example

{
  "urls": ["https://crawlee.dev", "https://apify.com", "https://example.com"],
  "timeoutSeconds": 15,
  "concurrency": 10
}
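
An input like this can be passed to the actor programmatically with the Apify Python client. A minimal sketch, assuming the actor ID is "andok/robots-txt-auditor" (check the store page for the real ID) and that you have an Apify API token:

```python
# Build the run input exactly as in the example above.
run_input = {
    "urls": ["https://crawlee.dev", "https://apify.com", "https://example.com"],
    "timeoutSeconds": 15,
    "concurrency": 10,
}

# With the Apify Python client (pip install apify-client).
# The actor ID below is an assumption -- replace it with the one on the store page.
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_API_TOKEN>")
# run = client.actor("andok/robots-txt-auditor").call(run_input=run_input)
# for item in client.dataset(run["defaultDatasetId"]).iterate_items():
#     print(item["inputUrl"], item["status"], item["sitemapCount"])
```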

Output

Each domain produces one dataset item with the robots.txt status, discovered sitemaps, and user-agent blocks.

  • inputUrl (string) — the original URL or domain you provided
  • robotsUrl (string | null) — the resolved robots.txt URL
  • status (number | null) — HTTP status code (200, 404, etc.)
  • contentLength (number) — file size in bytes
  • sitemapCount (number) — number of Sitemap: directives found
  • sitemaps (string[]) — list of sitemap URLs declared in the file
  • userAgents (string[]) — list of unique User-agent values
  • error (string | null) — error message if the fetch failed
  • checkedAt (string) — ISO timestamp of when the check ran

Output Example

{
  "inputUrl": "https://crawlee.dev",
  "robotsUrl": "https://crawlee.dev/robots.txt",
  "status": 200,
  "contentLength": 342,
  "sitemapCount": 2,
  "sitemaps": [
    "https://crawlee.dev/sitemap.xml",
    "https://crawlee.dev/sitemap-blog.xml"
  ],
  "userAgents": ["*", "Googlebot", "AhrefsBot"],
  "error": null,
  "checkedAt": "2025-11-20T14:30:00.000Z"
}
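
Dataset items shaped like this are easy to post-process. A small sketch (not part of the actor) that flags domains likely to need attention, such as a failed fetch, a missing robots.txt, or no declared sitemap:

```python
def flag_issues(items: list[dict]) -> list[str]:
    """Return human-readable warnings for problematic dataset items."""
    issues = []
    for item in items:
        url = item["inputUrl"]
        if item.get("error"):
            issues.append(f"{url}: fetch failed ({item['error']})")
        elif item.get("status") != 200:
            issues.append(f"{url}: robots.txt returned HTTP {item['status']}")
        elif item.get("sitemapCount", 0) == 0:
            issues.append(f"{url}: no Sitemap: directive declared")
    return issues

# Sample items in the output shape documented above.
items = [
    {"inputUrl": "https://crawlee.dev", "status": 200, "sitemapCount": 2, "error": None},
    {"inputUrl": "https://example.com", "status": 404, "sitemapCount": 0, "error": None},
]
problems = flag_issues(items)
```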

Pricing

Event          | Cost
Domain Audited | $0.001 per domain

You are charged per domain audited. Platform usage fees apply separately.
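
The per-event cost is simple to estimate up front. A tiny helper, using the $0.001-per-domain price above (platform usage fees are billed separately and not included):

```python
def audit_cost(domains: int, price_per_domain: float = 0.001) -> float:
    """Estimated event cost in USD; excludes separate platform usage fees."""
    return domains * price_per_domain

cost = audit_cost(1000)  # 1,000 domains at $0.001 each is $1.00
```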

Use Cases

  • SEO audits — check whether robots.txt accidentally blocks important pages or crawlers
  • Sitemap discovery — extract all declared sitemap URLs across a portfolio of domains
  • Competitor intelligence — see which crawlers competitors specifically block or allow
  • Migration validation — verify robots.txt is correctly configured after a domain migration
  • Agency reporting — audit robots.txt across all client domains in a single scheduled run

Related Actors

Actor                     | What it adds
XML Sitemap URL Extractor | Extract all URLs from the sitemaps discovered in robots.txt
Broken Links Checker      | Crawl your site to find broken links that robots.txt might be masking
Tech Stack Analyzer       | Detect the CMS and frameworks behind the domains you audit