Robots.txt & Sitemap Analyzer
Pricing: pay per event
Developer: Stas Persiianenko
Fetch and analyze robots.txt and sitemap.xml for any website. Returns crawl rules (allow/disallow paths), sitemap locations, URL counts, and crawl-delay directives.
What does Robots.txt & Sitemap Analyzer do?
This actor fetches and parses robots.txt and sitemap.xml files for any list of websites. It extracts crawl directives (user-agent rules, allowed/disallowed paths, crawl-delay), discovers sitemap URLs, and counts the number of pages listed in each sitemap. Use it for SEO audits, competitive analysis, or monitoring crawl policies at scale.
Use cases
- SEO specialists -- audit robots.txt rules and sitemap coverage across client websites
- Competitive analysts -- compare crawl policies and indexed page counts across competitors
- Site reliability engineers -- monitor changes in crawl rules or sitemap size over time
- Web scraping engineers -- estimate site size and check allowed paths before building a crawler
- Technical SEO consultants -- find missing sitemaps or misconfigured crawl directives during site audits
Why use Robots.txt & Sitemap Analyzer?
- Bulk analysis -- check hundreds of websites in a single run
- Robots.txt parsing -- extracts all user-agent rules, allow/disallow paths, and crawl-delay values
- Sitemap discovery -- finds sitemaps declared in robots.txt and falls back to /sitemap.xml
- Sitemap parsing -- distinguishes sitemap indexes from URL sets and counts pages
- Raw text included -- returns the full robots.txt text for custom analysis
- Fast and lightweight -- HTTP-only with no browser needed, so results come back quickly
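The sitemap-discovery fallback described above can be sketched in a few lines: sitemaps declared in robots.txt are preferred, and `/sitemap.xml` is tried only when none are declared. This is an illustrative sketch of the logic, not the actor's actual source; the function name `discover_sitemaps` is hypothetical.

```python
def discover_sitemaps(robots_txt, base_url):
    """Collect sitemap URLs declared in robots.txt; fall back to the
    conventional /sitemap.xml location when none are declared."""
    declared = []
    for line in (robots_txt or "").splitlines():
        # Split on the FIRST colon only, so the URL's own "https:" survives
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            declared.append(value.strip())
    return declared or [base_url.rstrip("/") + "/sitemap.xml"]
```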
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | string[] | Yes | -- | List of websites to analyze. Domain names without a protocol are auto-prefixed with `https://` |
| `parseSitemaps` | boolean | No | `true` | Fetch and parse sitemap.xml files to count URLs |
| `maxSitemapUrls` | integer | No | `1000` | Maximum number of URLs to count per sitemap file (100-50,000) |
Example input
```json
{
  "urls": ["apify.com", "google.com", "github.com"],
  "parseSitemaps": true,
  "maxSitemapUrls": 1000
}
```
Output example
```json
{
  "url": "https://apify.com",
  "robotsTxt": {
    "exists": true,
    "rules": [
      {
        "userAgent": "*",
        "allow": ["/"],
        "disallow": ["/api/", "/admin/"],
        "crawlDelay": null
      }
    ],
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "rawText": "User-agent: *\nAllow: /\nDisallow: /api/\nDisallow: /admin/\nSitemap: https://apify.com/sitemap.xml"
  },
  "sitemaps": [
    {
      "url": "https://apify.com/sitemap.xml",
      "urlCount": 542,
      "type": "urlset",
      "fetchError": null
    }
  ],
  "totalSitemapUrls": 542,
  "checkTimeMs": 1234,
  "error": null,
  "checkedAt": "2026-03-01T12:00:00.000Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `url` | string | The analyzed website URL |
| `robotsTxt.exists` | boolean | Whether robots.txt was found |
| `robotsTxt.rules` | array | Parsed user-agent rules with allow/disallow paths |
| `robotsTxt.sitemapUrls` | string[] | Sitemap URLs declared in robots.txt |
| `robotsTxt.rawText` | string | Full robots.txt content |
| `sitemaps` | array | Parsed sitemap info with URL counts |
| `totalSitemapUrls` | number | Total URLs found across all sitemaps |
| `checkTimeMs` | number | Analysis time in milliseconds |
| `error` | string | Error message if analysis failed |
| `checkedAt` | string | ISO timestamp of the check |
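Once downloaded, dataset items with this shape are easy to post-process. The sketch below reduces a list of result items to a quick audit summary (sites with no robots.txt and sites where no sitemap URLs were found); the `summarize` helper is illustrative, not part of the actor.

```python
def summarize(items):
    """Reduce analyzer output items to a quick audit summary:
    sites missing robots.txt and sites where no sitemap URLs were found."""
    no_robots = [i["url"] for i in items if not i["robotsTxt"]["exists"]]
    no_sitemap = [i["url"] for i in items if i["totalSitemapUrls"] == 0]
    return {"missingRobots": no_robots, "missingSitemaps": no_sitemap}
```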
How much does it cost to analyze robots.txt and sitemaps?
The actor uses Apify's pay-per-event pricing. You only pay for what you use.
| Event | Price | Description |
|---|---|---|
| Start | $0.035 | One-time per run |
| Site analyzed | $0.001 | Per website analyzed |
Example costs:
- 5 websites: $0.035 + 5 x $0.001 = $0.04
- 100 websites: $0.035 + 100 x $0.001 = $0.135
- 1,000 websites: $0.035 + 1,000 x $0.001 = $1.035
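The cost formula behind these examples is a flat start fee plus a per-site fee, which you can reproduce directly:

```python
START_FEE = 0.035   # one-time "Start" event, charged once per run
PER_SITE = 0.001    # "Site analyzed" event, charged per website

def run_cost(n_sites):
    """Estimated cost in USD of one run analyzing n_sites websites."""
    return START_FEE + n_sites * PER_SITE
```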
How to analyze robots.txt and sitemaps
- Go to the Robots.txt & Sitemap Analyzer page on Apify
- Enter one or more website URLs in the URLs field (e.g., `apify.com`, `google.com`)
- Choose whether to enable sitemap parsing (enabled by default)
- Set the Max Sitemap URLs limit if needed (default is 1,000)
- Click Start and wait for results
- Download the analysis as JSON, CSV, or Excel
Using the Apify API
You can start Robots.txt & Sitemap Analyzer programmatically from your own applications using the Apify API. The following examples show how to run the actor and retrieve results in both Node.js and Python.
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('automation-lab/robots-sitemap-analyzer').call({
    urls: ['apify.com', 'google.com'],
    parseSitemaps: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')

run = client.actor('automation-lab/robots-sitemap-analyzer').call(run_input={
    'urls': ['apify.com', 'google.com'],
    'parseSitemaps': True,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```
cURL
```shell
curl "https://api.apify.com/v2/acts/automation-lab~robots-sitemap-analyzer/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"urls": ["apify.com", "google.com"], "parseSitemaps": true}'
```
Use with AI agents via MCP
Robots.txt & Sitemap Analyzer is available as a tool for AI assistants via the Model Context Protocol (MCP).
Setup for Claude Code
```shell
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/robots-sitemap-analyzer"
```
Setup for Claude Desktop, Cursor, or VS Code
```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com?tools=automation-lab/robots-sitemap-analyzer"
    }
  }
}
```
Example prompts
- "Analyze robots.txt for example.com"
- "Check the sitemap structure for our website"
Learn more in the Apify MCP documentation.
Integrations
Robots.txt & Sitemap Analyzer works with all major automation platforms available on Apify. Export results to Google Sheets to build a crawl policy dashboard across all your monitored sites. Use Zapier or Make to schedule weekly checks and get notified when robots.txt rules change. Send alerts to Slack when a sitemap disappears or URL count drops significantly. Pipe results into n8n workflows for custom processing, or set up webhooks to trigger downstream actions as soon as a run finishes.
Tips and best practices
- Use domain names without protocol -- the actor auto-prefixes `https://`, so you can just enter `apify.com` instead of `https://apify.com`
- Increase `maxSitemapUrls` for large sites -- the default of 1,000 is fast but may undercount sites with tens of thousands of pages; increase to 50,000 for accurate counts
- Set `parseSitemaps` to `false` if you only need robots.txt rules -- this speeds up the run by skipping sitemap fetching
- Schedule regular runs to detect when competitors change their crawl policies or sitemap structure
- Combine with Sitemap URL Extractor to first analyze sitemaps here, then extract the full URL list from the sitemaps that matter
Legality
This tool analyzes publicly accessible web content. Automated analysis of public web resources is standard practice in SEO and web development. Always respect robots.txt directives and rate limits when analyzing third-party websites. For personal data processing, ensure compliance with applicable privacy regulations.
FAQ
What happens if a website has no robots.txt?
The result will show robotsTxt.exists: false and empty rules. The actor will still try to find and parse sitemap.xml at the default location.
Does the actor follow sitemap index files?
Yes. When a sitemap is a sitemap index (containing links to other sitemaps), the actor follows the child sitemaps and counts URLs across all of them.
The sitemap URL count is lower than expected. Why?
By default, the actor counts up to 1,000 URLs per sitemap file (maxSitemapUrls). If a sitemap has more URLs, the count will be capped at the limit. Increase maxSitemapUrls to 50,000 for large sites to get an accurate count. Also, some sites use sitemap indexes with many child sitemaps -- the actor follows these, but very large sitemap trees may take longer to process.
The actor found no sitemaps but my site has one. What happened?
The actor looks for sitemaps declared in robots.txt first, then falls back to checking /sitemap.xml. If your sitemap is at a non-standard location (e.g., /sitemap_index.xml or /sitemaps/main.xml) and is not listed in robots.txt, the actor will not find it. Add a Sitemap: directive to your robots.txt pointing to your sitemap location.
Can I analyze sites that require authentication?
No. The actor uses plain HTTP requests and cannot handle login-protected robots.txt or sitemap files. It works with publicly accessible files only.
How do I check if a website's robots.txt is blocking search engines?
To check if a website's robots.txt is blocking Googlebot or other search engine crawlers, fetch the robots.txt file and look for Disallow directives under the User-agent: * or User-agent: Googlebot blocks. Robots.txt & Sitemap Analyzer returns all parsed rules in structured JSON, so you can instantly see which paths are blocked for each user agent.
A common misconfiguration is Disallow: / under User-agent: * — this blocks all crawlers from indexing the entire site. Another is forgetting to remove a staging-environment robots.txt rule (Disallow: /) after launching a site, which has accidentally de-indexed thousands of sites over the years. With bulk analysis, you can audit hundreds of client or competitor sites in a single run to catch these issues.
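If you want to verify a specific user agent against the `rawText` field yourself, Python's standard library already ships a robots.txt parser. This sketch (the `is_blocked_for` wrapper is illustrative) checks whether a given robots.txt text blocks a crawler from a URL:

```python
from urllib import robotparser

def is_blocked_for(robots_txt, user_agent, url):
    """Return True if the given robots.txt text blocks user_agent from url.
    Pass the rawText field from the analyzer's output as robots_txt."""
    rp = robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() takes an iterable of lines
    return not rp.can_fetch(user_agent, url)
```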
How do I find all sitemaps for a website?
Websites can have multiple sitemaps: a main sitemap.xml, a sitemap index that links to sub-sitemaps (one per content type or section), and news or video sitemaps. To find them all you need to:
- Check robots.txt for `Sitemap:` directives — this is where most well-configured sites list their sitemap locations.
- Try the default location `/sitemap.xml` as a fallback.
- Follow any sitemap index files to discover child sitemaps.
Robots.txt & Sitemap Analyzer automates all three steps. It reads the Sitemap: entries in robots.txt, fetches each sitemap, detects whether it is a sitemap index or URL set, and recursively follows child sitemaps. The output shows the URL count for each discovered sitemap and a combined totalSitemapUrls figure.
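The index-versus-urlset distinction comes down to the root element of the sitemap XML (`<sitemapindex>` vs `<urlset>`, per the sitemaps.org protocol). A minimal sketch of that classification step, using only the standard library; the `classify_sitemap` helper is illustrative, not the actor's code:

```python
import xml.etree.ElementTree as ET

def classify_sitemap(xml_text):
    """Return (kind, locations) for a sitemap document, where kind is
    'sitemapindex' (locations are child sitemap URLs) or 'urlset'
    (locations are page URLs). Tag matching ignores the XML namespace."""
    root = ET.fromstring(xml_text)
    kind = root.tag.rsplit("}", 1)[-1]  # strip "{namespace}" prefix, if any
    locs = [el.text.strip() for el in root.iter()
            if el.tag.endswith("loc") and el.text]
    return kind, locs
```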
How can I compare competitor website sizes using sitemaps?
Sitemap URL counts are a useful proxy for website size and content volume. A competitor with 50,000 URLs in their sitemap likely has a much larger content moat than one with 500. You can use this data to:
- Benchmark your own indexed page count against competitors in your niche.
- Identify competitors who are aggressively publishing new content (sitemap grows fast over repeated checks).
- Spot thin-content sites that have a large URL count relative to their traffic — a signal of low content quality.
To compare competitors, input a list of their domains, enable sitemap parsing, and set maxSitemapUrls to 50,000 for accuracy. Schedule the run weekly to track how competitor content volumes change over time.
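With the results in hand, ranking competitors by content volume is a one-liner over `totalSitemapUrls`. The `rank_by_size` helper below is an illustrative sketch that also drops failed analyses so error rows don't skew the comparison:

```python
def rank_by_size(items):
    """Sort analyzer results by sitemap URL count, largest first,
    skipping sites where the analysis errored out."""
    ok = [i for i in items if i.get("error") is None]
    return sorted(ok, key=lambda i: i["totalSitemapUrls"], reverse=True)
```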
What is the difference between robots.txt and a sitemap?
robots.txt is an instruction file for crawlers — it tells search engines which parts of a site they are allowed to visit and how fast they may crawl. It does not affect which pages are indexed if those pages are linked from elsewhere; it only controls crawler access.
A sitemap (typically sitemap.xml) is a directory of URLs the site owner wants search engines to discover and index. Sitemaps help with crawl efficiency, especially for large sites or pages with few inbound links. Including a URL in a sitemap does not guarantee indexing, but it does signal to Google that the page should be crawled.
Together, robots.txt and sitemaps form the foundation of a site's crawl configuration. Robots.txt & Sitemap Analyzer reads both files in a single request, giving you a complete picture of how a site manages crawler access and content discovery.
Can I monitor robots.txt changes automatically?
Yes. One of the most powerful use cases for Robots.txt & Sitemap Analyzer is scheduled monitoring. A competitor changing their robots.txt to block a previously crawlable section often signals a strategic content change — they may be hiding a new product category, blocking price comparison crawlers, or restructuring their site.
To monitor changes automatically:
- Schedule a daily or weekly run via Apify Schedules.
- Export results to Google Sheets to build a change log.
- Use a Zapier or Make integration to send a Slack alert whenever the `rawText` content or `sitemapUrls` list changes compared to a previous run.
This gives you early warning when competitors alter their crawl policies — without any manual checking.
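A simple way to implement the comparison step is to fingerprint each site's crawl config and diff fingerprints between runs. This is a sketch under the output schema shown earlier; the `robots_fingerprint` helper is illustrative:

```python
import hashlib

def robots_fingerprint(item):
    """Stable fingerprint of a site's crawl config: a SHA-256 hash of the
    robots.txt text plus the sorted sitemap URL list. Store it per run
    and alert when the value changes."""
    raw = item["robotsTxt"].get("rawText") or ""
    sitemaps = ",".join(sorted(item["robotsTxt"].get("sitemapUrls", [])))
    return hashlib.sha256((raw + "\n" + sitemaps).encode("utf-8")).hexdigest()
```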
Other SEO tools
- Redirect Chain Analyzer — trace HTTP redirect chains and detect issues
- Broken Link Checker — find broken links across your website
- HTTP Status Checker — check HTTP status codes for a list of URLs
- Sitemap URL Extractor — extract all URLs from XML sitemaps
- Website Health Report — comprehensive website health audit