Robots Txt Analyzer

Robots.txt analyzer that fetches and parses crawl rules from any website in bulk, so SEO teams and developers can audit blocked paths, user agents, and sitemap locations across hundreds of domains without manual work.

Pricing: $2.99/month + usage
Rating: 0.0 (0)
Developer: ZeroBreak (Maintained by Community)

Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 4 days ago

Robots.txt Analyzer: Parse and Validate Crawl Rules for Any Website

Robots.txt Analyzer fetches and parses robots.txt files for any website. Give it one domain or a list of hundreds and get back every directive: blocked paths, allowed paths, crawl delays, and sitemap URLs, organized by user agent. Most tools check robots.txt for a single site; this actor handles bulk analysis, so you can audit dozens of domains in a single run.

Use cases

  • SEO auditing: check which pages Googlebot or Bingbot can access before pushing new content live
  • Technical SEO review: audit robots.txt across dozens of client domains without opening each file manually
  • Bot access testing: verify whether a specific URL path is blocked for any crawler before deployment
  • Competitive analysis: compare robots.txt configurations across competitor domains to see what they protect from indexing
  • Site monitoring: schedule regular runs to catch unexpected changes that could block search engine crawlers
  • QA validation: confirm that robots.txt deployments match intended crawl rules after each release

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | – | Single website URL to analyze |
| urls | array | – | List of website URLs for bulk analysis. One URL per line. |
| userAgent | string | * | Crawler user agent to check rules for (e.g. Googlebot, Bingbot, *) |
| checkPath | string | – | Specific URL path to check (e.g. /admin/) |
| maxUrls | integer | 100 | Maximum number of URLs to process per run |
| timeoutSecs | integer | 300 | Overall actor timeout in seconds |
| requestTimeoutSecs | integer | 30 | Per-request timeout in seconds |
| proxyConfiguration | object | Datacenter (Anywhere) | Proxy type and location for requests. Supports Datacenter, Residential, Special, and custom proxies. Optional. |

Example input

{
  "urls": ["https://apify.com", "https://news.ycombinator.com"],
  "userAgent": "Googlebot",
  "checkPath": "/admin/",
  "maxUrls": 100,
  "proxyConfiguration": { "useApifyProxy": true }
}

What data does this actor extract?

The actor stores one result per URL in the Apify dataset. Each entry contains:

{
  "url": "https://apify.com",
  "robotsTxtUrl": "https://apify.com/robots.txt",
  "httpStatus": 200,
  "isAccessible": true,
  "rawContent": "User-agent: *\nDisallow: /api/\nSitemap: https://apify.com/sitemap.xml",
  "userAgentsFound": ["*", "Googlebot"],
  "sitemapUrls": ["https://apify.com/sitemap.xml"],
  "crawlDelay": null,
  "disallowedPaths": ["/api/"],
  "allowedPaths": [],
  "checkedUserAgent": "Googlebot",
  "checkedPath": "/admin/",
  "isPathBlocked": true,
  "matchingRule": "Disallow: /admin/",
  "error": null,
  "scrapedAt": "2025-03-08T12:00:00+00:00"
}

| Field | Type | Description |
|---|---|---|
| url | string | Original website URL |
| robotsTxtUrl | string | URL of the fetched robots.txt file |
| httpStatus | integer | HTTP status code returned for the robots.txt request |
| isAccessible | boolean | Whether robots.txt was found and returned HTTP 200 |
| rawContent | string | Full raw text of the robots.txt file |
| userAgentsFound | array | All user agents declared in the file |
| sitemapUrls | array | Sitemap URLs declared in the file |
| crawlDelay | number | Crawl delay in seconds for the checked user agent, if declared |
| disallowedPaths | array | Paths disallowed for the checked user agent |
| allowedPaths | array | Paths explicitly allowed for the checked user agent |
| checkedUserAgent | string | User agent checked against the robots.txt rules |
| checkedPath | string | Specific path checked for access, if provided |
| isPathBlocked | boolean | Whether the checked path is blocked. Null if no path was provided. |
| matchingRule | string | The specific rule that determined the access result |
| error | string | Error message if the fetch failed |
| scrapedAt | string | ISO 8601 timestamp of the analysis |
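Because every dataset item follows this shape, run results are easy to post-process with a few lines of plain Python. A minimal sketch; the sample records below are illustrative stand-ins in the output format, not real run data:

```python
# Illustrative sample records in the actor's output shape (not real run data).
records = [
    {"url": "https://example.com", "isAccessible": True,
     "isPathBlocked": True, "matchingRule": "Disallow: /admin/", "error": None},
    {"url": "https://example.org", "isAccessible": True,
     "isPathBlocked": False, "matchingRule": None, "error": None},
    {"url": "https://example.net", "isAccessible": False,
     "isPathBlocked": False, "matchingRule": None, "error": "HTTP 404"},
]

def blocked_domains(items):
    """Return URLs whose robots.txt blocks the checked path."""
    return [r["url"] for r in items if r["isAccessible"] and r["isPathBlocked"]]

def failed_fetches(items):
    """Return (url, error) pairs for domains where the fetch failed."""
    return [(r["url"], r["error"]) for r in items if r["error"]]
```

The same pattern works on items exported from the run's dataset, whether pulled via the Apify API or downloaded as JSON.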

How it works

  1. The actor reads the input URL or list of URLs
  2. For each domain, it builds the robots.txt URL by appending /robots.txt to the root
  3. It fetches the file with an HTTP GET request
  4. The parser groups directives by user agent, reading each line in order
  5. It matches the configured user agent against the parsed groups, checking for an exact match first and falling back to the wildcard * group
  6. If a check path is provided, it applies the longest-match rule to determine access
  7. All results are pushed to the Apify dataset
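The grouping and matching in steps 4–6 can be sketched roughly as follows. This is a simplified illustration of the semantics (rules grouped per user agent, exact match with fallback to the * group, longest prefix match wins, Allow beating Disallow on ties), not the actor's actual source:

```python
def parse_robots(text):
    """Group (is_allow, path) rules by user agent. Simplified: consecutive
    User-agent lines share the rule group that follows them; Crawl-delay
    and Sitemap directives are ignored in this sketch."""
    groups = {}            # user agent (lowercased) -> list of (is_allow, path)
    current = []           # rule lists receiving the next directives
    last_was_ua = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not last_was_ua:               # new group starts
                current = []
            current.append(groups.setdefault(value.lower(), []))
            last_was_ua = True
        else:
            last_was_ua = False
            if field in ("allow", "disallow") and value:
                for rules in current:
                    rules.append((field == "allow", value))
    return groups

def is_blocked(groups, user_agent, path):
    """Exact user-agent group first, else the * group; among matching
    prefixes the longest rule wins, Allow beats Disallow on equal length."""
    rules = groups.get(user_agent.lower(), groups.get("*", []))
    best = None
    for is_allow, rule_path in rules:
        if path.startswith(rule_path):
            if (best is None or len(rule_path) > len(best[1])
                    or (len(rule_path) == len(best[1]) and is_allow)):
                best = (is_allow, rule_path)
    return best is not None and not best[0]   # no matching rule => allowed
```

Note that per the fallback rule, a crawler with its own user-agent group does not also inherit the * group's rules.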

Integrations

Connect Robots.txt Analyzer with other apps and services using Apify integrations. You can pipe results to Google Sheets, Airtable, or trigger Slack alerts via Make or Zapier whenever a path becomes blocked. You can also use webhooks to act on results as soon as a run finishes.

FAQ

Does this actor handle robots.txt files with multiple user-agent groups?

Yes. The parser reads every user-agent block and applies the correct rules for the configured user agent, with automatic fallback to the wildcard * group when no exact match is found.

What happens if a site has no robots.txt file?

The actor records an HTTP 404 status and sets isAccessible to false. A missing robots.txt means no restrictions, so isPathBlocked is set to false when a check path is provided.

How many URLs can I process per run?

Up to 1,000 per run, controlled by the maxUrls input. The default is 100 to avoid accidentally large runs on first use.

Can this actor check if Googlebot can access a specific page?

Yes. Set userAgent to Googlebot and checkPath to the path you want to check. The output includes isPathBlocked and matchingRule showing exactly which directive made the decision.
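For a quick local cross-check of the same question, Python's standard library ships urllib.robotparser, which can parse a robots.txt string offline (no network request needed):

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body directly from a string of lines.
content = """\
User-agent: Googlebot
Disallow: /admin/
"""
rp = RobotFileParser()
rp.parse(content.splitlines())

blocked = rp.can_fetch("Googlebot", "https://example.com/admin/")  # /admin/ is disallowed
allowed = rp.can_fetch("Googlebot", "https://example.com/blog/")   # no rule matches
```

Note that robotparser applies plain prefix matching, so it is a reasonable baseline for comparing single-URL results against the actor's output.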

Does it handle robots.txt with wildcard path patterns like * and $?

The actor handles standard robots.txt directives: Disallow, Allow, Crawl-delay, and Sitemap. Wildcard characters within path patterns (* and $ mid-path) are not currently supported; only prefix matching is applied.

Use Robots.txt Analyzer for single-site spot checks or scheduled bulk audits across hundreds of domains. Export to Google Sheets and plug into your existing SEO workflow through the Apify platform.