Robots.txt Auditor & Sitemap Finder
Scan robots.txt files in bulk to extract sitemap URLs and verify crawler directives for technical SEO compliance.
Pricing
from $1.00 / 1,000 domains audited
Developer
Andok
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 21 days ago
Robots.txt Auditor
Audit robots.txt files across hundreds of domains to catch crawl-blocking mistakes that silently hurt SEO. A single misconfigured Disallow rule can deindex entire site sections — this actor fetches, parses, and reports on every robots.txt in bulk. Run it against your own sites or competitor domains to extract sitemap declarations, user-agent rules, and crawl directives in one pass.
Features
- Bulk auditing — process hundreds of domains in a single run with configurable concurrency
- Sitemap discovery — extracts all `Sitemap:` directives declared in each robots.txt
- User-agent analysis — identifies every crawler-specific rule block in the file
- Status reporting — captures HTTP status codes, file size, and fetch errors
- Flexible input — accepts full URLs or bare domains (auto-resolves to `/robots.txt`)
- Error resilience — reports failures per domain without stopping the run
- Timestamp tracking — records when each domain was checked for audit trails
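To illustrate the sitemap and user-agent extraction described above, here is a minimal sketch of the kind of per-file parsing involved. The `parse_robots` helper is hypothetical, not the actor's actual source:

```python
def parse_robots(text: str) -> dict:
    """Extract Sitemap: directives and unique User-agent values from robots.txt text."""
    sitemaps, agents = [], []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip().lower(), value.strip()
        if key == "sitemap" and value:
            sitemaps.append(value)
        elif key == "user-agent" and value and value not in agents:
            agents.append(value)
    return {"sitemaps": sitemaps, "userAgents": agents}
```

Field names are case-insensitive per the Robots Exclusion Protocol, which is why the sketch lowercases the key before matching.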
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | array | Yes | — | List of URLs or domains to audit (e.g. `example.com` or `https://example.com`) |
| `url` | string | No | — | Single URL for backward compatibility. Merged into `urls` if both are provided. |
| `timeoutSeconds` | integer | No | 15 | HTTP timeout in seconds for each robots.txt fetch |
| `concurrency` | integer | No | 10 | Number of domains to process in parallel (1-50) |
Input Example
```json
{
  "urls": ["https://crawlee.dev", "https://apify.com", "https://example.com"],
  "timeoutSeconds": 15,
  "concurrency": 10
}
```
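Since the input accepts both full URLs and bare domains, the normalization step can be sketched roughly as follows (a hypothetical helper mirroring the documented "auto-resolves to `/robots.txt`" behavior, not the actor's actual code):

```python
from urllib.parse import urlparse

def resolve_robots_url(raw: str) -> str:
    """Turn a bare domain or any page URL into that host's robots.txt URL."""
    if "://" not in raw:
        raw = "https://" + raw  # assume https for bare domains
    parsed = urlparse(raw)
    return f"{parsed.scheme}://{parsed.netloc}/robots.txt"
```

Any path on the input URL is discarded, since robots.txt only ever lives at the host root.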
Output
Each domain produces one dataset item with the robots.txt status, discovered sitemaps, and user-agent blocks.
- `inputUrl` (string) — the original URL or domain you provided
- `robotsUrl` (string | null) — the resolved robots.txt URL
- `status` (number | null) — HTTP status code (200, 404, etc.)
- `contentLength` (number) — file size in bytes
- `sitemapCount` (number) — number of `Sitemap:` directives found
- `sitemaps` (string[]) — list of sitemap URLs declared in the file
- `userAgents` (string[]) — list of unique `User-agent` values
- `error` (string | null) — error message if the fetch failed
- `checkedAt` (string) — ISO timestamp of when the check ran
Output Example
```json
{
  "inputUrl": "https://crawlee.dev",
  "robotsUrl": "https://crawlee.dev/robots.txt",
  "status": 200,
  "contentLength": 342,
  "sitemapCount": 2,
  "sitemaps": [
    "https://crawlee.dev/sitemap.xml",
    "https://crawlee.dev/sitemap-blog.xml"
  ],
  "userAgents": ["*", "Googlebot", "AhrefsBot"],
  "error": null,
  "checkedAt": "2025-11-20T14:30:00.000Z"
}
```
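A common way to consume these dataset items is to flag domains that need attention, e.g. an unreachable robots.txt or one that declares no sitemap. A small sketch, assuming items shaped like the output above (the `flag_problems` helper is illustrative only):

```python
def flag_problems(items: list[dict]) -> list[tuple[str, str]]:
    """Return (inputUrl, reason) pairs for items worth a manual look."""
    flags = []
    for item in items:
        if item["error"] or item["status"] != 200:
            flags.append((item["inputUrl"], "robots.txt not fetchable"))
        elif item["sitemapCount"] == 0:
            flags.append((item["inputUrl"], "no Sitemap: directive declared"))
    return flags
```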
Pricing
| Event | Cost |
|---|---|
| Domain Audited | $0.001 per domain |
You are charged per domain audited. Platform usage fees apply separately.
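The per-domain pricing makes run costs easy to estimate (platform usage fees, which vary per account, are not included in this sketch):

```python
PRICE_PER_DOMAIN = 0.001  # $1.00 per 1,000 domains audited

def estimated_cost(domain_count: int) -> float:
    """Actor cost in USD for auditing the given number of domains."""
    return round(domain_count * PRICE_PER_DOMAIN, 2)
```

For example, auditing a 250-domain portfolio costs $0.25 in actor charges.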
Use Cases
- SEO audits — check whether robots.txt accidentally blocks important pages or crawlers
- Sitemap discovery — extract all declared sitemap URLs across a portfolio of domains
- Competitor intelligence — see which crawlers competitors specifically block or allow
- Migration validation — verify robots.txt is correctly configured after a domain migration
- Agency reporting — audit robots.txt across all client domains in a single scheduled run
Related Actors
| Actor | What it adds |
|---|---|
| XML Sitemap URL Extractor | Extract all URLs from the sitemaps discovered in robots.txt |
| Broken Links Checker | Crawl your site to find broken links that robots.txt might be masking |
| Tech Stack Analyzer | Detect the CMS and frameworks behind the domains you audit |