Robots.txt & Sitemap Analyzer
Developer: Stas Persiianenko
Fetch and analyze robots.txt and sitemap.xml for any website. Returns crawl rules (allow/disallow paths), sitemap locations, URL counts, and crawl-delay directives.
What does Robots.txt & Sitemap Analyzer do?
This actor fetches and parses robots.txt and sitemap.xml files for any list of websites. It extracts crawl directives (user-agent rules, allowed/disallowed paths, crawl-delay), discovers sitemap URLs, and counts the number of pages listed in each sitemap. Use it for SEO audits, competitive analysis, or monitoring crawl policies at scale.
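Conceptually, the directive extraction works like this. The sketch below is a simplified illustration (not the actor's actual code) that uses the same field names as the actor's output schema:

```python
def parse_robots_txt(text):
    """Extract user-agent rules, allow/disallow paths, crawl-delay,
    and declared sitemap URLs from robots.txt content."""
    rules, sitemap_urls = [], []
    current = None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            current = {"userAgent": value, "allow": [], "disallow": [], "crawlDelay": None}
            rules.append(current)
        elif field == "allow" and current:
            current["allow"].append(value)
        elif field == "disallow" and current:
            current["disallow"].append(value)
        elif field == "crawl-delay" and current:
            current["crawlDelay"] = float(value)
        elif field == "sitemap":
            sitemap_urls.append(value)
    return {"rules": rules, "sitemapUrls": sitemap_urls}

sample = "User-agent: *\nAllow: /\nDisallow: /api/\nSitemap: https://apify.com/sitemap.xml"
print(parse_robots_txt(sample))
```

Note that a real parser also handles edge cases this sketch skips, such as multiple consecutive `User-agent` lines sharing one rule group.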
Use cases
- SEO specialists -- audit robots.txt rules and sitemap coverage across client websites
- Competitive analysts -- compare crawl policies and indexed page counts across competitors
- Site reliability engineers -- monitor changes in crawl rules or sitemap size over time
- Web scraping engineers -- estimate site size and check allowed paths before building a crawler
- Technical SEO consultants -- find missing sitemaps or misconfigured crawl directives during site audits
Why use Robots.txt & Sitemap Analyzer?
- Bulk analysis -- check hundreds of websites in a single run
- Robots.txt parsing -- extracts all user-agent rules, allow/disallow paths, and crawl-delay values
- Sitemap discovery -- finds sitemaps declared in robots.txt and falls back to /sitemap.xml
- Sitemap parsing -- distinguishes sitemap indexes from URL sets and counts pages
- Raw text included -- returns the full robots.txt text for custom analysis
- Fast and lightweight -- HTTP-only with no browser needed, so results come back quickly
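The distinction between a sitemap index and a URL set comes down to the XML root element. A minimal sketch of that classification (using Python's standard library; the `type` and `urlCount` names mirror the actor's output fields, but this is not the actor's implementation):

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def classify_sitemap(xml_text, max_urls=1000):
    """Distinguish a sitemap index from a urlset and count its <loc> entries,
    capped at max_urls (cf. the maxSitemapUrls input parameter)."""
    root = ET.fromstring(xml_text)
    kind = "sitemapindex" if root.tag == f"{NS}sitemapindex" else "urlset"
    locs = [el.text for el in root.iter(f"{NS}loc")][:max_urls]
    return {"type": kind, "urlCount": len(locs), "locs": locs}

sample = (
    '<?xml version="1.0"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><loc>https://apify.com/</loc></url>'
    '<url><loc>https://apify.com/store</loc></url>'
    '</urlset>'
)
print(classify_sitemap(sample))
```

For a `sitemapindex`, the `<loc>` entries point at child sitemaps rather than pages, which is why an analyzer must recurse into them to get a true page count.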
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | string[] | Yes | -- | List of websites to analyze. Domain names without a protocol are auto-prefixed with https:// |
| parseSitemaps | boolean | No | true | Fetch and parse sitemap.xml files to count URLs |
| maxSitemapUrls | integer | No | 1000 | Maximum number of URLs to count per sitemap file (100-50,000) |
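The auto-prefixing behavior described for `urls` can be sketched as follows (an illustration of the documented behavior, not the actor's code):

```python
def normalize_url(raw):
    """Domains entered without a protocol are auto-prefixed with https://."""
    raw = raw.strip()
    if not raw.startswith(("http://", "https://")):
        return "https://" + raw
    return raw

print([normalize_url(u) for u in ["apify.com", "http://example.com"]])
```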
Example input
```json
{
  "urls": ["apify.com", "google.com", "github.com"],
  "parseSitemaps": true,
  "maxSitemapUrls": 1000
}
```
Output example
```json
{
  "url": "https://apify.com",
  "robotsTxt": {
    "exists": true,
    "rules": [
      {
        "userAgent": "*",
        "allow": ["/"],
        "disallow": ["/api/", "/admin/"],
        "crawlDelay": null
      }
    ],
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "rawText": "User-agent: *\nAllow: /\nDisallow: /api/\nDisallow: /admin/\nSitemap: https://apify.com/sitemap.xml"
  },
  "sitemaps": [
    {
      "url": "https://apify.com/sitemap.xml",
      "urlCount": 542,
      "type": "urlset",
      "fetchError": null
    }
  ],
  "totalSitemapUrls": 542,
  "checkTimeMs": 1234,
  "error": null,
  "checkedAt": "2026-03-01T12:00:00.000Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| url | string | The analyzed website URL |
| robotsTxt.exists | boolean | Whether robots.txt was found |
| robotsTxt.rules | array | Parsed user-agent rules with allow/disallow paths |
| robotsTxt.sitemapUrls | string[] | Sitemap URLs declared in robots.txt |
| robotsTxt.rawText | string | Full robots.txt content |
| sitemaps | array | Parsed sitemap info with URL counts |
| totalSitemapUrls | number | Total URLs found across all sitemaps |
| checkTimeMs | number | Analysis time in milliseconds |
| error | string | Error message if analysis failed |
| checkedAt | string | ISO timestamp of the check |
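A common way to post-process dataset items with these fields is to flag sites that are missing robots.txt or report an empty sitemap. A minimal sketch (the `items` list here is hypothetical sample data shaped like the output above):

```python
# Hypothetical dataset items shaped like the actor's output schema.
items = [
    {"url": "https://a.com", "robotsTxt": {"exists": True}, "totalSitemapUrls": 542, "error": None},
    {"url": "https://b.com", "robotsTxt": {"exists": False}, "totalSitemapUrls": 0, "error": None},
]

# Flag sites with no robots.txt or no discoverable sitemap URLs.
flagged = [i["url"] for i in items
           if not i["robotsTxt"]["exists"] or i["totalSitemapUrls"] == 0]
print(flagged)  # ['https://b.com']
```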
How much does it cost?
The actor uses Apify's pay-per-event pricing. You only pay for what you use.
| Event | Price | Description |
|---|---|---|
| Start | $0.035 | One-time per run |
| Site analyzed | $0.001 | Per website analyzed |
Example costs:
- 5 websites: $0.035 + 5 x $0.001 = $0.04
- 100 websites: $0.035 + 100 x $0.001 = $0.135
- 1,000 websites: $0.035 + 1,000 x $0.001 = $1.035
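The examples above follow directly from the pricing table; a small helper makes the arithmetic reusable:

```python
def estimate_cost(n_sites, start_fee=0.035, per_site=0.001):
    """Estimated run cost from the pay-per-event pricing table:
    one Start event plus one Site-analyzed event per website."""
    return start_fee + n_sites * per_site

for n in (5, 100, 1000):
    print(n, round(estimate_cost(n), 3))
```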
Using the Apify API
You can start Robots.txt & Sitemap Analyzer programmatically from your own applications using the Apify API. The following examples show how to run the actor and retrieve results in both Node.js and Python.
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('automation-lab/robots-sitemap-analyzer').call({
    urls: ['apify.com', 'google.com'],
    parseSitemaps: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

run = client.actor('automation-lab/robots-sitemap-analyzer').call(run_input={
    'urls': ['apify.com', 'google.com'],
    'parseSitemaps': True,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```
Integrations
Robots.txt & Sitemap Analyzer works with all major automation platforms available on Apify. Export results to Google Sheets to build a crawl policy dashboard across all your monitored sites. Use Zapier or Make to schedule weekly checks and get notified when robots.txt rules change. Send alerts to Slack when a sitemap disappears or URL count drops significantly. Pipe results into n8n workflows for custom processing, or set up webhooks to trigger downstream actions as soon as a run finishes.
Tips and best practices
- Use domain names without protocol -- the actor auto-prefixes `https://`, so you can just enter `apify.com` instead of `https://apify.com`
- Increase `maxSitemapUrls` for large sites -- the default of 1,000 is fast but may undercount sites with tens of thousands of pages; increase it to 50,000 for accurate counts
- Set `parseSitemaps` to `false` if you only need robots.txt rules -- this speeds up the run by skipping sitemap fetching
- Schedule regular runs to detect when competitors change their crawl policies or sitemap structure
- Combine with Sitemap URL Extractor to first analyze sitemaps here, then extract the full URL list from the sitemaps that matter
FAQ
What happens if a website has no robots.txt?
The result will show `robotsTxt.exists: false` and empty rules. The actor will still try to find and parse sitemap.xml at the default location.
Does the actor follow sitemap index files?
Yes. When a sitemap is a sitemap index (containing links to other sitemaps), the actor follows the child sitemaps and counts URLs across all of them.
Can I analyze sites that require authentication?
No. The actor uses plain HTTP requests and cannot handle login-protected robots.txt or sitemap files. It works with publicly accessible files only.