
Robots.txt & Sitemap Analyzer

Pricing: Pay per event



Rating: 0.0 (0 reviews)

Developer: Stas Persiianenko (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified a day ago


Fetch and analyze robots.txt and sitemap.xml for any website. Returns crawl rules (allow/disallow paths), sitemap locations, URL counts, and crawl-delay directives.

What does Robots.txt & Sitemap Analyzer do?

This actor fetches and parses robots.txt and sitemap.xml files for any list of websites. It extracts crawl directives (user-agent rules, allowed/disallowed paths, crawl-delay), discovers sitemap URLs, and counts the number of pages listed in each sitemap. Use it for SEO audits, competitive analysis, or monitoring crawl policies at scale.
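To see what these directives mean in practice, here is an illustrative sketch (not the actor's code) using Python's standard-library `urllib.robotparser` on a robots.txt like the one in the output example below:

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt text of the kind the actor fetches and parses
robots_txt = """User-agent: *
Disallow: /api/
Disallow: /admin/
Allow: /
Crawl-delay: 2
Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Evaluate the allow/disallow rules for the wildcard user agent
print(parser.can_fetch("*", "https://example.com/blog"))    # allowed path
print(parser.can_fetch("*", "https://example.com/api/v1"))  # disallowed path
print(parser.crawl_delay("*"))                              # crawl-delay value
print(parser.site_maps())                                   # declared sitemaps
```

Note that `urllib.robotparser` applies rules in file order (first match wins), so the `Disallow` lines are listed before `Allow: /` here; the actor reports the raw rules themselves, so you can apply whatever matching semantics you need.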

Use cases

  • SEO specialists -- audit robots.txt rules and sitemap coverage across client websites
  • Competitive analysts -- compare crawl policies and indexed page counts across competitors
  • Site reliability engineers -- monitor changes in crawl rules or sitemap size over time
  • Web scraping engineers -- estimate site size and check allowed paths before building a crawler
  • Technical SEO consultants -- find missing sitemaps or misconfigured crawl directives during site audits

Why use Robots.txt & Sitemap Analyzer?

  • Bulk analysis -- check hundreds of websites in a single run
  • Robots.txt parsing -- extracts all user-agent rules, allow/disallow paths, and crawl-delay values
  • Sitemap discovery -- finds sitemaps declared in robots.txt and falls back to /sitemap.xml
  • Sitemap parsing -- distinguishes sitemap indexes from URL sets and counts pages
  • Raw text included -- returns the full robots.txt text for custom analysis
  • Fast and lightweight -- HTTP-only with no browser needed, so results come back quickly

Input parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| urls | string[] | Yes | -- | List of websites to analyze. Domain names without a protocol are auto-prefixed with https:// |
| parseSitemaps | boolean | No | true | Fetch and parse sitemap.xml files to count URLs |
| maxSitemapUrls | integer | No | 1000 | Maximum number of URLs to count per sitemap file (100-50,000) |
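The auto-prefixing behavior on the `urls` parameter amounts to something like this (hypothetical helper, shown for illustration):

```python
def normalize_url(raw: str) -> str:
    """Prefix https:// when the input has no protocol, as the actor does."""
    raw = raw.strip()
    if not raw.startswith(("http://", "https://")):
        raw = "https://" + raw
    return raw

[normalize_url(u) for u in ["apify.com", "https://github.com"]]
```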

Example input

{
  "urls": ["apify.com", "google.com", "github.com"],
  "parseSitemaps": true,
  "maxSitemapUrls": 1000
}

Output example

{
  "url": "https://apify.com",
  "robotsTxt": {
    "exists": true,
    "rules": [
      {
        "userAgent": "*",
        "allow": ["/"],
        "disallow": ["/api/", "/admin/"],
        "crawlDelay": null
      }
    ],
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "rawText": "User-agent: *\nAllow: /\nDisallow: /api/\nDisallow: /admin/\nSitemap: https://apify.com/sitemap.xml"
  },
  "sitemaps": [
    {
      "url": "https://apify.com/sitemap.xml",
      "urlCount": 542,
      "type": "urlset",
      "fetchError": null
    }
  ],
  "totalSitemapUrls": 542,
  "checkTimeMs": 1234,
  "error": null,
  "checkedAt": "2026-03-01T12:00:00.000Z"
}

Output fields

| Field | Type | Description |
|---|---|---|
| url | string | The analyzed website URL |
| robotsTxt.exists | boolean | Whether robots.txt was found |
| robotsTxt.rules | array | Parsed user-agent rules with allow/disallow paths |
| robotsTxt.sitemapUrls | string[] | Sitemap URLs declared in robots.txt |
| robotsTxt.rawText | string | Full robots.txt content |
| sitemaps | array | Parsed sitemap info with URL counts |
| totalSitemapUrls | number | Total URLs found across all sitemaps |
| checkTimeMs | number | Analysis time in milliseconds |
| error | string | Error message if analysis failed |
| checkedAt | string | ISO timestamp of the check |
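As an example of consuming these fields downstream, here is a sketch (with made-up sample items) that flags sites whose wildcard rule disallows the entire site:

```python
def fully_blocked(item: dict) -> bool:
    """True when a `*` rule in robotsTxt.rules disallows the root path."""
    for rule in item.get("robotsTxt", {}).get("rules", []):
        if rule["userAgent"] == "*" and "/" in rule.get("disallow", []):
            return True
    return False

# Sample dataset items shaped like the output documented above
items = [
    {"url": "https://open.example",
     "robotsTxt": {"rules": [{"userAgent": "*", "disallow": ["/api/"]}]}},
    {"url": "https://closed.example",
     "robotsTxt": {"rules": [{"userAgent": "*", "disallow": ["/"]}]}},
]
blocked = [i["url"] for i in items if fully_blocked(i)]
```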

How much does it cost?

The actor uses Apify's pay-per-event pricing. You only pay for what you use.

| Event | Price | Description |
|---|---|---|
| Start | $0.035 | One-time charge per run |
| Site analyzed | $0.001 | Per website analyzed |

Example costs:

  • 5 websites: $0.035 + 5 x $0.001 = $0.04
  • 100 websites: $0.035 + 100 x $0.001 = $0.135
  • 1,000 websites: $0.035 + 1,000 x $0.001 = $1.035
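The pricing above reduces to a simple formula you can use to budget larger runs (prices taken from the table; check the actor's pricing page for current values):

```python
def run_cost(n_sites: int, start: float = 0.035, per_site: float = 0.001) -> float:
    """Estimated cost in USD: one Start event plus one event per site."""
    return round(start + n_sites * per_site, 3)

run_cost(5), run_cost(100), run_cost(1000)
```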

Using the Apify API

You can start Robots.txt & Sitemap Analyzer programmatically from your own applications using the Apify API. The following examples show how to run the actor and retrieve results in both Node.js and Python.

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Start the actor and wait for the run to finish
const run = await client.actor('automation-lab/robots-sitemap-analyzer').call({
    urls: ['apify.com', 'google.com'],
    parseSitemaps: true,
});

// Fetch the results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

# Start the actor and wait for the run to finish
run = client.actor('automation-lab/robots-sitemap-analyzer').call(run_input={
    'urls': ['apify.com', 'google.com'],
    'parseSitemaps': True,
})

# Fetch the results from the run's default dataset
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

Integrations

Robots.txt & Sitemap Analyzer works with all major automation platforms available on Apify. Export results to Google Sheets to build a crawl policy dashboard across all your monitored sites. Use Zapier or Make to schedule weekly checks and get notified when robots.txt rules change. Send alerts to Slack when a sitemap disappears or URL count drops significantly. Pipe results into n8n workflows for custom processing, or set up webhooks to trigger downstream actions as soon as a run finishes.
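One way to implement the "notify when robots.txt rules change" idea is to fingerprint the `rawText` field and compare it between two scheduled runs (a minimal sketch; the alert delivery via Slack, Zapier, or Make is left out):

```python
import hashlib

def robots_fingerprint(item: dict) -> str:
    """Stable hash of a site's robots.txt content for change detection."""
    raw = (item.get("robotsTxt") or {}).get("rawText") or ""
    return hashlib.sha256(raw.encode()).hexdigest()

# Sample items from two consecutive runs (illustrative data)
previous = {"url": "https://apify.com",
            "robotsTxt": {"rawText": "User-agent: *\nAllow: /"}}
current = {"url": "https://apify.com",
           "robotsTxt": {"rawText": "User-agent: *\nDisallow: /"}}

changed = robots_fingerprint(previous) != robots_fingerprint(current)
```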

Tips and best practices

  • Use domain names without protocol -- the actor auto-prefixes https:// so you can just enter apify.com instead of https://apify.com
  • Increase maxSitemapUrls for large sites -- the default of 1,000 is fast but may undercount sites with tens of thousands of pages; increase to 50,000 for accurate counts
  • Set parseSitemaps to false if you only need robots.txt rules -- this speeds up the run by skipping sitemap fetching
  • Schedule regular runs to detect when competitors change their crawl policies or sitemap structure
  • Combine with Sitemap URL Extractor to first analyze sitemaps here, then extract the full URL list from the sitemaps that matter

FAQ

What happens if a website has no robots.txt? The result will show robotsTxt.exists: false and empty rules. The actor will still try to find and parse sitemap.xml at the default location.

Does the actor follow sitemap index files? Yes. When a sitemap is a sitemap index (containing links to other sitemaps), the actor follows the child sitemaps and counts URLs across all of them.
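The distinction between a sitemap index and a URL set comes down to the XML root element, which a parser can branch on. A sketch (namespaces omitted for brevity; real sitemaps use the sitemaps.org namespace):

```python
import xml.etree.ElementTree as ET

SITEMAP_INDEX = """<sitemapindex>
  <sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-2.xml</loc></sitemap>
</sitemapindex>"""

URLSET = """<urlset>
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/pricing</loc></url>
</urlset>"""

def classify(xml_text: str):
    """Return ('index', child sitemap URLs) or ('urlset', page URLs)."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    locs = [loc.text for loc in root.iter("loc")]
    return kind, locs

classify(SITEMAP_INDEX)  # the actor would then fetch each child sitemap
classify(URLSET)         # these URLs count toward urlCount
```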

Can I analyze sites that require authentication? No. The actor uses plain HTTP requests and cannot handle login-protected robots.txt or sitemap files. It works with publicly accessible files only.