# Robots.txt Analyzer: Parse and Validate Crawl Rules for Any Website
Robots.txt Analyzer fetches and parses robots.txt files for any website. Give it one domain or a list of hundreds and get back every directive: blocked paths, allowed paths, crawl delays, and sitemap URLs, organized by user agent. Most tools check robots.txt for a single site; this actor handles bulk analysis, so you can audit hundreds of domains in a single run.
## Use cases
- SEO auditing: check which pages Googlebot or Bingbot can access before pushing new content live
- Technical SEO review: audit robots.txt across dozens of client domains without opening each file manually
- Bot access testing: verify whether a specific URL path is blocked for any crawler before deployment
- Competitive analysis: compare robots.txt configurations across competitor domains to see what they protect from indexing
- Site monitoring: schedule regular runs to catch unexpected changes that could block search engine crawlers
- QA validation: confirm that robots.txt deployments match intended crawl rules after each release
## Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| `url` | string | | Single website URL to analyze |
| `urls` | array | | List of website URLs for bulk analysis. One URL per line. |
| `userAgent` | string | `*` | Crawler user agent to check rules for (e.g. `Googlebot`, `Bingbot`, `*`) |
| `checkPath` | string | | Specific URL path to check (e.g. `/admin/`) |
| `maxUrls` | integer | `100` | Maximum number of URLs to process per run |
| `timeoutSecs` | integer | `300` | Overall actor timeout in seconds |
| `requestTimeoutSecs` | integer | `30` | Per-request timeout in seconds |
| `proxyConfiguration` | object | Datacenter (Anywhere) | Optional. Proxy type and location for requests. Supports Datacenter, Residential, Special, and custom proxies. |
### Example input

```json
{
  "urls": ["https://apify.com", "https://news.ycombinator.com"],
  "userAgent": "Googlebot",
  "checkPath": "/admin/",
  "maxUrls": 100,
  "proxyConfiguration": { "useApifyProxy": true }
}
```
## What data does this actor extract?
The actor stores one result per URL in the Apify dataset. Each entry contains:
```json
{
  "url": "https://apify.com",
  "robotsTxtUrl": "https://apify.com/robots.txt",
  "httpStatus": 200,
  "isAccessible": true,
  "rawContent": "User-agent: *\nDisallow: /api/\nSitemap: https://apify.com/sitemap.xml",
  "userAgentsFound": ["*", "Googlebot"],
  "sitemapUrls": ["https://apify.com/sitemap.xml"],
  "crawlDelay": null,
  "disallowedPaths": ["/api/"],
  "allowedPaths": [],
  "checkedUserAgent": "Googlebot",
  "checkedPath": "/admin/",
  "isPathBlocked": true,
  "matchingRule": "Disallow: /admin/",
  "error": null,
  "scrapedAt": "2025-03-08T12:00:00+00:00"
}
```
| Field | Type | Description |
|---|---|---|
| `url` | string | Original website URL |
| `robotsTxtUrl` | string | URL of the fetched robots.txt file |
| `httpStatus` | integer | HTTP status code returned for the robots.txt request |
| `isAccessible` | boolean | Whether robots.txt was found and returned HTTP 200 |
| `rawContent` | string | Full raw text of the robots.txt file |
| `userAgentsFound` | array | All user agents declared in the file |
| `sitemapUrls` | array | Sitemap URLs declared in the file |
| `crawlDelay` | number | Crawl delay in seconds for the checked user agent, if declared |
| `disallowedPaths` | array | Paths disallowed for the checked user agent |
| `allowedPaths` | array | Paths explicitly allowed for the checked user agent |
| `checkedUserAgent` | string | User agent checked against the robots.txt rules |
| `checkedPath` | string | Specific path checked for access, if provided |
| `isPathBlocked` | boolean | Whether the checked path is blocked. `null` if no path was provided. |
| `matchingRule` | string | The specific rule that determined the access result |
| `error` | string | Error message if the fetch failed |
| `scrapedAt` | string | ISO 8601 timestamp of the analysis |
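Because each dataset entry is plain JSON, post-processing is straightforward. Here is a minimal sketch (using the field names above, with an inline sample standing in for a real export) that lists every URL where the checked path was blocked:

```python
import json

# Sample of dataset items as exported from a run (JSON format).
items_json = """
[
  {"url": "https://example.com", "checkedPath": "/admin/",
   "isPathBlocked": true, "matchingRule": "Disallow: /admin/", "error": null},
  {"url": "https://example.org", "checkedPath": "/admin/",
   "isPathBlocked": false, "matchingRule": null, "error": null}
]
"""

def blocked_urls(items):
    """Return (url, rule) pairs for entries where the checked path was blocked."""
    return [
        (item["url"], item["matchingRule"])
        for item in items
        if item.get("isPathBlocked") and not item.get("error")
    ]

items = json.loads(items_json)
print(blocked_urls(items))  # [('https://example.com', 'Disallow: /admin/')]
```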
## How it works

- The actor reads the input URL or list of URLs
- For each domain, it builds the robots.txt URL by appending `/robots.txt` to the root
- It fetches the file with an HTTP GET request
- The parser groups directives by user agent, reading each line in order
- It matches the configured user agent against the parsed groups, checking for an exact match first and falling back to the wildcard `*` group
- If a check path is provided, it applies the longest-match rule to determine access
- All results are pushed to the Apify dataset
## Integrations
Connect Robots.txt Analyzer with other apps and services using Apify integrations. You can pipe results to Google Sheets, Airtable, or trigger Slack alerts via Make or Zapier whenever a path becomes blocked. You can also use webhooks to act on results as soon as a run finishes.
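For the monitoring use case, a webhook consumer can diff the latest run against the previous one and alert only on newly blocked paths. A minimal sketch (the function and its inputs are illustrative, using the dataset fields described above):

```python
def newly_blocked(previous, current):
    """Compare two runs' results and return (url, path) pairs that
    were accessible in the previous run but are blocked now."""
    was_blocked = {
        (r["url"], r["checkedPath"]): r["isPathBlocked"] for r in previous
    }
    return [
        (r["url"], r["checkedPath"])
        for r in current
        if r["isPathBlocked"] and not was_blocked.get((r["url"], r["checkedPath"]))
    ]

prev = [{"url": "https://example.com", "checkedPath": "/admin/", "isPathBlocked": False}]
curr = [{"url": "https://example.com", "checkedPath": "/admin/", "isPathBlocked": True}]
print(newly_blocked(prev, curr))  # [('https://example.com', '/admin/')]
```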
## FAQ

### Does this actor handle robots.txt files with multiple user-agent groups?

Yes. The parser reads every user-agent block and applies the correct rules for the configured user agent, with automatic fallback to the wildcard `*` group when no exact match is found.

### What happens if a site has no robots.txt file?

The actor records an HTTP 404 status and sets `isAccessible` to false. A missing robots.txt means no restrictions, so `isPathBlocked` is set to false when a check path is provided.

### How many URLs can I process per run?

Up to 1,000 per run, controlled by the `maxUrls` input. The default is 100 to avoid accidental large runs on first use.

### Can this actor check if Googlebot can access a specific page?

Yes. Set `userAgent` to `Googlebot` and `checkPath` to the path you want to check. The output includes `isPathBlocked` and `matchingRule`, showing exactly which directive made the decision.

### Does it handle robots.txt with wildcard path patterns like * and $?

The actor handles standard robots.txt directives: `Disallow`, `Allow`, `Crawl-delay`, and `Sitemap`. Wildcard characters within path patterns (`*` and `$` mid-path) are not currently supported; only prefix matching is applied.
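Python's standard-library `urllib.robotparser` also applies prefix-only matching without wildcard support, so it can serve as a quick local cross-check of a result:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /api/

User-agent: Googlebot
Disallow: /admin/
"""

# parse() accepts the file's lines directly, so no network fetch is needed.
rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("Googlebot", "/admin/settings"))  # False
print(rp.can_fetch("Googlebot", "/blog/"))           # True
```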
Use Robots.txt Analyzer for single-site spot checks or scheduled bulk audits across hundreds of domains. Export to Google Sheets and plug into your existing SEO workflow through the Apify platform.
