Pricing

from $1.00 / 1,000 results

robots.txt Parser & URL Tester

Fetch and parse robots.txt for any site: user-agent rules, crawl-delay, and declared sitemaps. Optionally test whether specific URLs are allowed for a given user-agent, using correct longest-match rules.

Developer

Nicolas van Arkens

Last modified

2 months ago

robots.txt Parser & URL Tester 🤖

Fetch and parse robots.txt for any site and get a clean, structured breakdown — per-user-agent allow/disallow rules, crawl-delay, and every declared sitemap. Optionally test whether specific URLs are allowed or blocked for a chosen crawler, using correct longest-match precedence.

Built for SEO audits, crawler and bot development, compliance checks, and anyone who needs to know what a site permits before crawling it.

Why use it

📋 Structured rules — allow/disallow lists per user-agent, not raw text
🤖 User-agent aware — see the rules that actually apply to Googlebot, bingbot, or *
✅ URL allow/deny testing — check exact paths against the rules with proper * wildcard, $ anchor, and longest-match logic
🐌 Crawl-delay — extracted per user-agent
🗺️ Sitemaps — every sitemap the site declares, ready to feed into a sitemap extractor
🌐 Batch — check many sites at once
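
As an illustration of what "structured rules" means here, the following is a minimal sketch of turning raw robots.txt text into per-user-agent groups plus a sitemap list. It is not the actor's own implementation; the `Group` class and `parse_robots` function are hypothetical names chosen for this example, and the grouping rule (consecutive User-agent lines share one group) is an assumption based on common robots.txt conventions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Group:
    """One robots.txt group: the user-agents it names and their rules."""
    user_agents: list = field(default_factory=list)
    allow: list = field(default_factory=list)
    disallow: list = field(default_factory=list)
    crawl_delay: Optional[float] = None


def parse_robots(text: str):
    """Return (groups, sitemaps) parsed from raw robots.txt text."""
    groups, sitemaps = [], []
    current = None
    seen_rule = False  # True once the current group has a non-User-agent line
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "sitemap":
            sitemaps.append(value)  # sitemaps are global, not per group
        elif key == "user-agent":
            # A User-agent line after rules starts a new group;
            # consecutive User-agent lines share one group.
            if current is None or seen_rule:
                current = Group()
                groups.append(current)
                seen_rule = False
            current.user_agents.append(value.lower())
        elif current is not None and key in ("allow", "disallow", "crawl-delay"):
            seen_rule = True
            if key == "crawl-delay":
                try:
                    current.crawl_delay = float(value)
                except ValueError:
                    pass  # ignore malformed delays
            elif value:  # an empty pattern matches nothing, so skip it
                getattr(current, key).append(value)
    return groups, sitemaps
```

Calling `parse_robots(text)` on a downloaded file yields one `Group` per block of rules, which maps loosely onto the allow/disallow lists and crawl-delay shown in the Output section.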

Use cases

SEO audits — verify a site isn't accidentally blocking important pages
Crawler development — respect robots.txt correctly before scraping
Compliance — confirm what a site permits for your user-agent
Sitemap discovery — pull declared sitemaps to drive further crawling
Monitoring — track robots.txt changes over time

Input

Field	Description
Sites	List of sites/URLs; robots.txt is fetched from the root of each site (e.g. https://example.com/robots.txt).
User-agent	Which crawler's rules to apply (e.g. `Googlebot`, or `*`).
Test paths	Optional paths or URLs to test; each is reported as allowed or blocked.

Output

{
  "site": "https://example.com",
  "robotsUrl": "https://example.com/robots.txt",
  "success": true,
  "userAgentChecked": "*",
  "sitemaps": ["https://example.com/sitemap.xml"],
  "userAgentsDeclared": ["*", "googlebot", "badbot"],
  "appliedGroupDisallow": ["/private/", "/tmp/"],
  "appliedGroupAllow": ["/private/public-page"],
  "crawlDelay": 10,
  "testResults": [
    { "path": "/private/secret", "allowed": false },
    { "path": "/private/public-page", "allowed": true }
  ]
}

Export to JSON, CSV, or Excel, or pull via the Apify API.
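
Once a record has been exported as JSON, it can be reduced to the fields an audit typically needs. The sketch below works on the sample record shown above; the `summarize` helper is a hypothetical name for this example, and it assumes every record follows the same shape as that sample.

```python
import json

# Sample record copied from the Output section of this README.
RECORD = json.loads("""
{
  "site": "https://example.com",
  "robotsUrl": "https://example.com/robots.txt",
  "success": true,
  "userAgentChecked": "*",
  "sitemaps": ["https://example.com/sitemap.xml"],
  "userAgentsDeclared": ["*", "googlebot", "badbot"],
  "appliedGroupDisallow": ["/private/", "/tmp/"],
  "appliedGroupAllow": ["/private/public-page"],
  "crawlDelay": 10,
  "testResults": [
    { "path": "/private/secret", "allowed": false },
    { "path": "/private/public-page", "allowed": true }
  ]
}
""")


def summarize(record: dict) -> dict:
    """Split tested paths into blocked and allowed, keeping key metadata."""
    results = record.get("testResults", [])
    return {
        "site": record["site"],
        "blocked": [r["path"] for r in results if not r["allowed"]],
        "allowed": [r["path"] for r in results if r["allowed"]],
        "sitemaps": record.get("sitemaps", []),
        "crawlDelay": record.get("crawlDelay"),
    }


print(summarize(RECORD))
```

The same function can be mapped over a full export to list, per site, which tested paths are blocked for the chosen user-agent.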

Notes

Implements standard robots.txt semantics: longest-match wins between Allow and Disallow, with * wildcards and $ end-anchors (per Google's specification).
A site with no robots.txt (404) is reported as such — by convention, that means all crawling is allowed.
Independent tool. Always honor robots.txt in your own crawling.
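
The longest-match rule described in these notes can be sketched as follows. This is an illustrative approximation, not the actor's code: `is_allowed` and `_matches` are hypothetical names, pattern length is measured in characters of the raw pattern, and ties between an Allow and a Disallow of equal length are resolved in favor of Allow, which is an assumption based on Google's documented behavior.

```python
import re


def _matches(pattern: str, path: str) -> bool:
    """Match a robots.txt pattern: '*' is a wildcard, trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape literal pieces and join them with '.*' for each '*' wildcard.
    regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
    if anchored:
        return regex.fullmatch(path) is not None
    return regex.match(path) is not None  # prefix match from the start of the path


def is_allowed(path: str, allow: list, disallow: list) -> bool:
    """Apply longest-match precedence between Allow and Disallow patterns."""
    best_len, best_allowed = -1, True  # no matching rule means allowed
    # Allow patterns are checked first so that an equal-length tie keeps Allow.
    for allowed, patterns in ((True, allow), (False, disallow)):
        for pattern in patterns:
            if pattern and _matches(pattern, path) and len(pattern) > best_len:
                best_len, best_allowed = len(pattern), allowed
    return best_allowed


allow = ["/private/public-page"]
disallow = ["/private/", "/tmp/"]
print(is_allowed("/private/secret", allow, disallow))
print(is_allowed("/private/public-page", allow, disallow))
```

With empty rule lists the function returns True for every path, which mirrors the convention that a missing robots.txt permits all crawling.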

robots.txt & Sitemap Analyzer

bgfc97/robots-txt-sitemap-analyzer

Fetch and parse robots.txt for many domains: user-agent groups, allow/disallow rules, crawl-delay, declared sitemaps and whether crawlers are blocked. Technical SEO auditing. No key.

Bruno

Robots.txt Checker & Parser - Crawl Rules API

pink_comic/robots-txt-validator

Check, parse, and validate robots.txt files in bulk. Extract crawl rules, sitemaps, crawl-delay, blocked paths, and per-user-agent allow/disallow results for SEO audits and crawler compliance.

Ava Torres

Robots.txt Validator - Check Rules, Sitemaps & Crawl Directives

scrappy_garden/robots-txt-validator

Validate robots.txt for one or more websites: fetches /robots.txt per host, parses directive groups (User-agent/Allow/Disallow/Crawl-delay/Sitemap), reports common errors and warnings, and can test URLs against the chosen User-Agent.

Bikram Adhikari

Robots Sitemap Analyzer - SEO Crawl Rules

benthepythondev/robots-sitemap-analyzer

Analyze robots.txt files and discover sitemap URLs, user-agent groups, allow rules, disallow rules and crawl-delay directives.

Ben

Robots.txt & Sitemap Analyzer 🕷️

perryay/robots-txt-sitemap-analyzer

Fetch, parse, and analyze robots.txt and sitemap.xml for any domain. Extract crawl directives, test URL compliance against robots.txt rules, and discover all URLs from sitemaps including nested sitemap indexes. Supports batch analysis with structured JSON output.

Perry AY

Robots.txt Auditor

junipr/robots-txt-auditor

Fetch and audit robots.txt syntax, user-agent rules, blocked paths, sitemap declarations, and crawl risks.

junipr

Robots.txt Analyzer

sootesting/robots-txt-analyzer

Analyze robots.txt for any list of domains — crawl rules per user-agent, declared sitemaps, crawl-delay, and indexing red flags (e.g. 'Disallow: /' blocking the whole site). One clean report per domain. Pay-per-event: $0.01 per batch of up to 200 domains.

soot

Robots.txt Analyzer

mahogany_songbird/robots-txt-analyzer

Read robots.txt disallow rules and sitemap declarations.

Britton Furness

Robots.txt & Sitemap Extractor by Domain

technicaldost/robots-sitemap-extractor

Fetch and parse robots.txt and XML sitemaps for any domain. Extract allowed and disallowed paths, sitemap URLs and every listed page. Great for SEO audits and crawl planning. JSON output.

Technical Dost Solutions

Robots.txt & Sitemap Analyzer

automation-lab/robots-sitemap-analyzer

This actor fetches and parses robots.txt and sitemap.xml files for any list of websites. It extracts crawl directives (user-agent rules, allowed/disallowed paths, crawl-delay), discovers sitemap URLs, and counts the number of pages listed in each sitemap. Use it for SEO audits, competitive...