
Robots.txt Auditor & Sitemap Finder


Scan robots.txt files in bulk to extract sitemap URLs and verify crawler directives for technical SEO compliance.

Pricing

from $1.00 / 1,000 domains audited

Rating

0.0 (0 reviews)

Developer

Andok (Maintained by Community)

Actor stats

0 bookmarked · 2 total users · 1 monthly active user · last modified 21 days ago

Robots.txt Auditor

Audit robots.txt files across hundreds of domains to catch crawl-blocking mistakes that silently hurt SEO. A single misconfigured Disallow rule can deindex entire site sections — this actor fetches, parses, and reports on every robots.txt in bulk. Run it against your own sites or competitor domains to extract sitemap declarations, user-agent rules, and crawl directives in one pass.

Features

  • Bulk auditing — process hundreds of domains in a single run with configurable concurrency
  • Sitemap discovery — extracts all Sitemap: directives declared in each robots.txt
  • User-agent analysis — identifies every crawler-specific rule block in the file
  • Status reporting — captures HTTP status codes, file size, and fetch errors
  • Flexible input — accepts full URLs or bare domains (auto-resolves to /robots.txt)
  • Error resilience — reports failures per domain without stopping the run
  • Timestamp tracking — records when each domain was checked for audit trails
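
The sitemap and user-agent extraction described above can be sketched in a few lines. This is an illustrative parse only, not the actor's actual implementation: it collects Sitemap: directives and unique User-agent values from a robots.txt body.

```python
def parse_robots(body: str) -> dict:
    """Collect Sitemap: directives and unique User-agent values."""
    sitemaps, user_agents = [], []
    for raw in body.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if key == "sitemap" and value:
            sitemaps.append(value)
        elif key == "user-agent" and value and value not in user_agents:
            user_agents.append(value)
    return {
        "sitemapCount": len(sitemaps),
        "sitemaps": sitemaps,
        "userAgents": user_agents,
    }

example = """\
User-agent: *
Disallow: /admin

User-agent: Googlebot
Allow: /

Sitemap: https://example.com/sitemap.xml
"""
result = parse_robots(example)
```

Directive keys are matched case-insensitively, mirroring how robots.txt parsers treat `Sitemap:` and `sitemap:` the same.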

Input

Field          | Type    | Required | Default | Description
urls           | array   | Yes      | -       | List of URLs or domains to audit (e.g. example.com or https://example.com)
url            | string  | No       | -       | Single URL for backward compatibility; merged into urls if both are provided
timeoutSeconds | integer | No       | 15      | HTTP timeout in seconds for each robots.txt fetch
concurrency    | integer | No       | 10      | Number of domains to process in parallel (1-50)

Input Example

{
  "urls": ["https://crawlee.dev", "https://apify.com", "https://example.com"],
  "timeoutSeconds": 15,
  "concurrency": 10
}
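
An input like this can be passed to the actor programmatically with the Apify Python client. A minimal sketch, assuming the actor ID is "andok/robots-txt-auditor" (check the store page for the real ID) and that you have an Apify API token:

```python
# Build the run input exactly as in the example above.
run_input = {
    "urls": ["https://crawlee.dev", "https://apify.com", "https://example.com"],
    "timeoutSeconds": 15,
    "concurrency": 10,
}

# With the Apify Python client (pip install apify-client).
# The actor ID below is an assumption -- replace it with the one on the store page.
# from apify_client import ApifyClient
# client = ApifyClient("<YOUR_API_TOKEN>")
# run = client.actor("andok/robots-txt-auditor").call(run_input=run_input)
# for item in client.dataset(run["defaultDatasetId"]).iterate_items():
#     print(item["inputUrl"], item["status"], item["sitemapCount"])
```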

Output

Each domain produces one dataset item with the robots.txt status, discovered sitemaps, and user-agent blocks.

  • inputUrl (string) — the original URL or domain you provided
  • robotsUrl (string | null) — the resolved robots.txt URL
  • status (number | null) — HTTP status code (200, 404, etc.)
  • contentLength (number) — file size in bytes
  • sitemapCount (number) — number of Sitemap: directives found
  • sitemaps (string[]) — list of sitemap URLs declared in the file
  • userAgents (string[]) — list of unique User-agent values
  • error (string | null) — error message if the fetch failed
  • checkedAt (string) — ISO timestamp of when the check ran

Output Example

{
  "inputUrl": "https://crawlee.dev",
  "robotsUrl": "https://crawlee.dev/robots.txt",
  "status": 200,
  "contentLength": 342,
  "sitemapCount": 2,
  "sitemaps": [
    "https://crawlee.dev/sitemap.xml",
    "https://crawlee.dev/sitemap-blog.xml"
  ],
  "userAgents": ["*", "Googlebot", "AhrefsBot"],
  "error": null,
  "checkedAt": "2025-11-20T14:30:00.000Z"
}
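
Dataset items shaped like this are easy to post-process. A small sketch (not part of the actor) that flags domains likely to need attention, such as a failed fetch, a missing robots.txt, or no declared sitemap:

```python
def flag_issues(items: list[dict]) -> list[str]:
    """Return human-readable warnings for problematic dataset items."""
    issues = []
    for item in items:
        url = item["inputUrl"]
        if item.get("error"):
            issues.append(f"{url}: fetch failed ({item['error']})")
        elif item.get("status") != 200:
            issues.append(f"{url}: robots.txt returned HTTP {item['status']}")
        elif item.get("sitemapCount", 0) == 0:
            issues.append(f"{url}: no Sitemap: directive declared")
    return issues

# Sample items in the output shape documented above.
items = [
    {"inputUrl": "https://crawlee.dev", "status": 200, "sitemapCount": 2, "error": None},
    {"inputUrl": "https://example.com", "status": 404, "sitemapCount": 0, "error": None},
]
problems = flag_issues(items)
```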

Pricing

Event          | Cost
Domain Audited | $0.001 per domain

You are charged per domain audited. Platform usage fees apply separately.
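
The per-event cost is simple to estimate up front. A tiny helper, using the $0.001-per-domain price above (platform usage fees are billed separately and not included):

```python
def audit_cost(domains: int, price_per_domain: float = 0.001) -> float:
    """Estimated event cost in USD; excludes separate platform usage fees."""
    return domains * price_per_domain

cost = audit_cost(1000)  # 1,000 domains at $0.001 each is $1.00
```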

Use Cases

  • SEO audits — check whether robots.txt accidentally blocks important pages or crawlers
  • Sitemap discovery — extract all declared sitemap URLs across a portfolio of domains
  • Competitor intelligence — see which crawlers competitors specifically block or allow
  • Migration validation — verify robots.txt is correctly configured after a domain migration
  • Agency reporting — audit robots.txt across all client domains in a single scheduled run

Related Actors

Actor                     | What it adds
XML Sitemap URL Extractor | Extract all URLs from the sitemaps discovered in robots.txt
Broken Links Checker      | Crawl your site to find broken links that robots.txt might be masking
Tech Stack Analyzer       | Detect the CMS and frameworks behind the domains you audit