Robots.txt & Sitemap Analyzer

Pricing: Pay per event

Developer: Stas Persiianenko (Maintained by Community)
Fetch and analyze robots.txt and sitemap.xml for any website. Returns crawl rules (allow/disallow paths), sitemap locations, URL counts, and crawl-delay directives.

What does Robots.txt & Sitemap Analyzer do?

This actor fetches and parses robots.txt and sitemap.xml files for any list of websites. It extracts crawl directives (user-agent rules, allowed/disallowed paths, crawl-delay), discovers sitemap URLs, and counts the number of pages listed in each sitemap. Use it for SEO audits, competitive analysis, or monitoring crawl policies at scale.

Use cases

  • SEO specialists -- audit robots.txt rules and sitemap coverage across client websites
  • Competitive analysts -- compare crawl policies and indexed page counts across competitors
  • Site reliability engineers -- monitor changes in crawl rules or sitemap size over time
  • Web scraping engineers -- estimate site size and check allowed paths before building a crawler
  • Technical SEO consultants -- find missing sitemaps or misconfigured crawl directives during site audits

Why use Robots.txt & Sitemap Analyzer?

  • Bulk analysis -- check hundreds of websites in a single run
  • Robots.txt parsing -- extracts all user-agent rules, allow/disallow paths, and crawl-delay values
  • Sitemap discovery -- finds sitemaps declared in robots.txt and falls back to /sitemap.xml
  • Sitemap parsing -- distinguishes sitemap indexes from URL sets and counts pages
  • Raw text included -- returns the full robots.txt text for custom analysis
  • Fast and lightweight -- HTTP-only with no browser needed, so results come back quickly
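As a rough illustration of the directives involved, Python's standard-library urllib.robotparser reads the same fields the actor extracts (user-agent rules, allow/disallow paths, crawl-delay, sitemap declarations). This is a generic sketch, not the actor's internal code:

```python
from urllib import robotparser

# A sample robots.txt body with the directive types the actor parses.
ROBOTS_TXT = """\
User-agent: *
Disallow: /api/
Disallow: /admin/
Allow: /
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

print(rp.can_fetch("*", "https://example.com/api/data"))  # False: /api/ is disallowed
print(rp.can_fetch("*", "https://example.com/blog"))      # True: covered by Allow: /
print(rp.crawl_delay("*"))                                # 2
print(rp.site_maps())                                     # ['https://example.com/sitemap.xml']
```

Note that robotparser applies rules in file order, which is why the Disallow lines come before Allow: / in this sample.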

Input parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | string[] | Yes | -- | List of websites to analyze. Domain names without protocol are auto-prefixed with https:// |
| parseSitemaps | boolean | No | true | Fetch and parse sitemap.xml files to count URLs |
| maxSitemapUrls | integer | No | 1000 | Maximum number of URLs to count per sitemap file (100-50,000) |
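The auto-prefixing behavior mentioned for the urls parameter can be pictured with a small helper. The function name normalize_url is hypothetical, used here only to illustrate the idea of leaving explicit schemes untouched and prefixing bare domains:

```python
def normalize_url(site: str) -> str:
    """Hypothetical sketch of the auto-prefixing described above:
    bare domains get https://, existing schemes are left as-is."""
    site = site.strip()
    if site.startswith(("http://", "https://")):
        return site
    return "https://" + site

print(normalize_url("apify.com"))           # https://apify.com
print(normalize_url("https://github.com"))  # https://github.com
```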

Example input

```json
{
  "urls": ["apify.com", "google.com", "github.com"],
  "parseSitemaps": true,
  "maxSitemapUrls": 1000
}
```

Output example

```json
{
  "url": "https://apify.com",
  "robotsTxt": {
    "exists": true,
    "rules": [
      {
        "userAgent": "*",
        "allow": ["/"],
        "disallow": ["/api/", "/admin/"],
        "crawlDelay": null
      }
    ],
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "rawText": "User-agent: *\nAllow: /\nDisallow: /api/\nDisallow: /admin/\nSitemap: https://apify.com/sitemap.xml"
  },
  "sitemaps": [
    {
      "url": "https://apify.com/sitemap.xml",
      "urlCount": 542,
      "type": "urlset",
      "fetchError": null
    }
  ],
  "totalSitemapUrls": 542,
  "checkTimeMs": 1234,
  "error": null,
  "checkedAt": "2026-03-01T12:00:00.000Z"
}
```

Output fields

| Field | Type | Description |
| --- | --- | --- |
| url | string | The analyzed website URL |
| robotsTxt.exists | boolean | Whether robots.txt was found |
| robotsTxt.rules | array | Parsed user-agent rules with allow/disallow paths |
| robotsTxt.sitemapUrls | string[] | Sitemap URLs declared in robots.txt |
| robotsTxt.rawText | string | Full robots.txt content |
| sitemaps | array | Parsed sitemap info with URL counts |
| totalSitemapUrls | number | Total URLs found across all sitemaps |
| checkTimeMs | number | Analysis time in milliseconds |
| error | string | Error message if analysis failed |
| checkedAt | string | ISO timestamp of the check |
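Once a run finishes, these per-site records are easy to post-process. The sketch below flags sites with a failed check, a missing robots.txt, or no sitemap URLs; the needs_attention helper is hypothetical, and the inline items list stands in for records retrieved from the dataset:

```python
# Stand-in records shaped like the output example above; in practice
# these would come from the run's dataset.
items = [
    {"url": "https://apify.com", "robotsTxt": {"exists": True},
     "totalSitemapUrls": 542, "error": None},
    {"url": "https://example.org", "robotsTxt": {"exists": False},
     "totalSitemapUrls": 0, "error": None},
]

def needs_attention(item: dict) -> bool:
    """Flag a site whose check failed, whose robots.txt is missing,
    or whose sitemaps yielded no URLs."""
    return (bool(item.get("error"))
            or not item["robotsTxt"]["exists"]
            or item["totalSitemapUrls"] == 0)

flagged = [item["url"] for item in items if needs_attention(item)]
print(flagged)  # ['https://example.org']
```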

How much does it cost to analyze robots.txt and sitemaps?

The actor uses Apify's pay-per-event pricing. You only pay for what you use.

| Event | Price | Description |
| --- | --- | --- |
| Start | $0.035 | One-time per run |
| Site analyzed | $0.001 | Per website analyzed |

Example costs:

  • 5 websites: $0.035 + 5 x $0.001 = $0.04
  • 100 websites: $0.035 + 100 x $0.001 = $0.135
  • 1,000 websites: $0.035 + 1,000 x $0.001 = $1.035
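The cost examples above follow a simple linear formula, sketched here as a small calculator (run_cost is an illustrative helper, not part of the actor):

```python
START_FEE = 0.035  # one-time "Start" event per run
PER_SITE = 0.001   # "Site analyzed" event per website

def run_cost(sites: int) -> float:
    """Total pay-per-event cost for one run analyzing `sites` websites."""
    return round(START_FEE + sites * PER_SITE, 3)

for n in (5, 100, 1000):
    print(n, run_cost(n))  # 5 -> 0.04, 100 -> 0.135, 1000 -> 1.035
```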

How to analyze robots.txt and sitemaps

  1. Go to the Robots.txt & Sitemap Analyzer page on Apify
  2. Enter one or more website URLs in the URLs field (e.g., apify.com, google.com)
  3. Choose whether to enable sitemap parsing (enabled by default)
  4. Set the Max Sitemap URLs limit if needed (default is 1,000)
  5. Click Start and wait for results
  6. Download the analysis as JSON, CSV, or Excel

Using the Apify API

You can start Robots.txt & Sitemap Analyzer programmatically from your own applications using the Apify API. The following examples show how to run the actor and retrieve results in both Node.js and Python.

Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('automation-lab/robots-sitemap-analyzer').call({
    urls: ['apify.com', 'google.com'],
    parseSitemaps: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

Python

```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

run = client.actor('automation-lab/robots-sitemap-analyzer').call(run_input={
    'urls': ['apify.com', 'google.com'],
    'parseSitemaps': True,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```

cURL

```shell
curl "https://api.apify.com/v2/acts/automation-lab~robots-sitemap-analyzer/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"urls": ["apify.com", "google.com"], "parseSitemaps": true}'
```

Use with AI agents via MCP

Robots.txt & Sitemap Analyzer is available as a tool for AI assistants via the Model Context Protocol (MCP).

Setup for Claude Code

```shell
claude mcp add --transport http apify "https://mcp.apify.com"
```

Setup for Claude Desktop, Cursor, or VS Code

```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}
```

Example prompts

  • "Analyze robots.txt for example.com"
  • "Check the sitemap structure for our website"

Learn more in the Apify MCP documentation.

Integrations

Robots.txt & Sitemap Analyzer works with all major automation platforms available on Apify. Export results to Google Sheets to build a crawl policy dashboard across all your monitored sites. Use Zapier or Make to schedule weekly checks and get notified when robots.txt rules change. Send alerts to Slack when a sitemap disappears or URL count drops significantly. Pipe results into n8n workflows for custom processing, or set up webhooks to trigger downstream actions as soon as a run finishes.

Tips and best practices

  • Use domain names without protocol -- the actor auto-prefixes https:// so you can just enter apify.com instead of https://apify.com
  • Increase maxSitemapUrls for large sites -- the default of 1,000 is fast but may undercount sites with tens of thousands of pages; increase to 50,000 for accurate counts
  • Set parseSitemaps to false if you only need robots.txt rules -- this speeds up the run by skipping sitemap fetching
  • Schedule regular runs to detect when competitors change their crawl policies or sitemap structure
  • Combine with Sitemap URL Extractor to first analyze sitemaps here, then extract the full URL list from the sitemaps that matter
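For the scheduled-run tip above, change detection can be as simple as diffing the rawText field between two runs. A minimal sketch with the standard-library difflib (the sample robots.txt strings stand in for values stored from consecutive runs):

```python
import difflib

# Stand-in rawText values from two scheduled runs of the actor.
previous = "User-agent: *\nDisallow: /api/\n"
current = "User-agent: *\nDisallow: /api/\nDisallow: /beta/\n"

# Unified diff of the two policies; an empty diff means no change.
diff = list(difflib.unified_diff(
    previous.splitlines(), current.splitlines(),
    fromfile="previous", tofile="current", lineterm=""))
changed = bool(diff)

print(changed)
print("\n".join(diff))
```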

Legality

This tool analyzes publicly accessible web content. Automated analysis of public web resources is standard practice in SEO and web development. Always respect robots.txt directives and rate limits when analyzing third-party websites. For personal data processing, ensure compliance with applicable privacy regulations.

FAQ

What happens if a website has no robots.txt? The result will show robotsTxt.exists: false and empty rules. The actor will still try to find and parse sitemap.xml at the default location.

Does the actor follow sitemap index files? Yes. When a sitemap is a sitemap index (containing links to other sitemaps), the actor follows the child sitemaps and counts URLs across all of them.
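The sitemapindex/urlset distinction can be seen in the XML root element. A hedged sketch with the standard-library XML parser (classify_sitemap is an illustrative helper, not the actor's internal code):

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def classify_sitemap(xml_text: str):
    """Return the sitemap kind ('sitemapindex' or 'urlset') and the
    <loc> URLs it contains: child sitemaps for an index, pages for a urlset."""
    root = ET.fromstring(xml_text)
    kind = root.tag.removeprefix(NS)
    locs = [loc.text.strip() for loc in root.iter(NS + "loc")]
    return kind, locs

index = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>
</sitemapindex>"""
print(classify_sitemap(index))  # ('sitemapindex', ['https://example.com/sitemap-1.xml'])
```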

The sitemap URL count is lower than expected. Why? By default, the actor counts up to 1,000 URLs per sitemap file (maxSitemapUrls). If a sitemap has more URLs, the count will be capped at the limit. Increase maxSitemapUrls to 50,000 for large sites to get an accurate count. Also, some sites use sitemap indexes with many child sitemaps -- the actor follows these, but very large sitemap trees may take longer to process.

The actor found no sitemaps but my site has one. What happened? The actor looks for sitemaps declared in robots.txt first, then falls back to checking /sitemap.xml. If your sitemap is at a non-standard location (e.g., /sitemap_index.xml or /sitemaps/main.xml) and is not listed in robots.txt, the actor will not find it. Add a Sitemap: directive to your robots.txt pointing to your sitemap location.

Can I analyze sites that require authentication? No. The actor uses plain HTTP requests and cannot handle login-protected robots.txt or sitemap files. It works with publicly accessible files only.
