Pricing

from $5.00 / 1,000 useful crawl scope results

Public Sitemap & Crawl Scope Planner

Turn public robots.txt and sitemap XML into crawl-scope briefs with URL inventory, path groups, seed URLs, freshness, diagnostics, and useful-result pricing.

Developer

jack su

Last modified

a month ago

What It Returns

Sanitized site origin and robots URL
Same-site sitemap files read
Declared sitemap URL count
Sampled public URLs with lastmod, changefreq, priority, and page type
Page type counts
Path groups and recommended seed URLs
Latest lastmod value
Freshness summary
Robots user agents, allow/disallow samples, crawl delay, and rule count
Site index hash
Evidence URLs
Confidence, completeness, missing fields, diagnostics, and readable errors
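The page does not document the output schema or the parsing code. The minimal Python sketch below is only an illustration of how several of the fields above could be derived from a single sitemap `urlset`: URL entries with lastmod/changefreq/priority, path groups, seed URLs, latest lastmod, and a site index hash. The field names, the grouping rule (first path segment), the seed rule (first URL seen per group), and the SHA-256 fingerprint are all assumptions, not the Actor's actual implementation.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SM_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def _child_text(parent: ET.Element, tag: str) -> Optional[str]:
    """Return the stripped text of a namespaced child element, or None."""
    el = parent.find(SM_NS + tag)
    return el.text.strip() if el is not None and el.text else None


def parse_urlset(xml_text: str) -> list:
    """Extract <url> entries from a sitemap <urlset> document."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.iter(SM_NS + "url"):
        loc = _child_text(url, "loc")
        if not loc:
            continue  # skip entries without a location
        entries.append({
            "loc": loc,
            "lastmod": _child_text(url, "lastmod"),
            "changefreq": _child_text(url, "changefreq"),
            "priority": _child_text(url, "priority"),
        })
    return entries


def summarize(entries: list) -> dict:
    """Build hypothetical crawl-scope summary fields from parsed entries."""
    groups = Counter()
    seeds = {}
    for e in entries:
        segments = [s for s in urlsplit(e["loc"]).path.split("/") if s]
        # Assumed grouping rule: first path segment, "/" for the site root.
        group = "/" + segments[0] + "/" if segments else "/"
        groups[group] += 1
        seeds.setdefault(group, e["loc"])  # assumed seed rule: first URL seen per group
    lastmods = [e["lastmod"] for e in entries if e["lastmod"]]
    # Order-independent fingerprint of the URL inventory (assumed hashing scheme).
    digest = hashlib.sha256("\n".join(sorted(e["loc"] for e in entries)).encode("utf-8"))
    return {
        "url_count": len(entries),
        "path_groups": dict(groups),
        "seed_urls": list(seeds.values()),
        # Lexicographic max is only valid for same-format ISO 8601 strings.
        "latest_lastmod": max(lastmods) if lastmods else None,
        "site_index_hash": digest.hexdigest()[:16],
    }
```

Comparing `lastmod` strings with `max()` is a shortcut that only holds when every value uses the same ISO 8601 format and time zone. A sturdier version would parse them into timestamps before computing the freshness summary.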

Pricing Design

The intended pay-per-event setup is:

apify-actor-start: a tiny run-start fee
useful-crawl-scope-result: charged only for useful public crawl-scope records with at least one same-site URL entry
apify-default-dataset-item: not configured (no separate per-dataset-item charge)

Robots-only, missing-sitemap, duplicate, private-network, invalid-input, and failed records should not trigger the useful-crawl-scope-result event.
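The following minimal Python sketch makes the billing rule above concrete by listing which events a run would emit. The status labels, the `ScopeRecord` type, and the helper names are hypothetical. Only the event names and the $5.00 per 1,000 starting price come from this page. The sketch models the run-start fee as one event per run and does not call any Apify SDK charging API.

```python
from dataclasses import dataclass

# Outcomes the page says must not trigger the useful-result event
# (the status strings themselves are hypothetical labels).
NON_BILLABLE_STATUSES = {
    "robots_only", "missing_sitemap", "duplicate",
    "private_network", "invalid_input", "failed",
}


@dataclass
class ScopeRecord:
    status: str               # hypothetical outcome label, e.g. "ok" or "failed"
    same_site_url_count: int  # same-site URL entries found in the site's sitemaps


def is_useful(record: ScopeRecord) -> bool:
    """Billable only if the outcome is not excluded and has >= 1 same-site URL."""
    return record.status not in NON_BILLABLE_STATUSES and record.same_site_url_count >= 1


def charge_events(records: list) -> list:
    """List the pay-per-event names a run would emit under the described design."""
    events = ["apify-actor-start"]  # simplification: one start event per run
    events += ["useful-crawl-scope-result" for r in records if is_useful(r)]
    return events  # no apify-default-dataset-item event is emitted


def useful_result_cost(useful_count: int, usd_per_1000: float = 5.00) -> float:
    """Cost of useful results at the listed starting price; excludes the start fee."""
    return useful_count * usd_per_1000 / 1000
```

Under these assumptions, 1,000 useful results cost $5.00 before the start fee. A run in which every record is robots-only, duplicate, or failed emits only the start event.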

Good Fits

Planning crawl scope before running a crawler
Feeding RAG pipelines with sitemap URL candidates
Checking whether a website exposes usable sitemap metadata
SEO/content inventory triage
Building agent-friendly site maps without rendering pages

Boundaries

This Actor does not crawl arbitrary pages, log in, use cookies, render JavaScript, take screenshots, scrape search engines, scrape social platforms, or enrich data about private individuals. It reads only public same-site robots.txt and sitemap resources. Credentials, query parameters, fragments, private-network addresses, localhost, .local hosts, external sitemap files, and external page URLs are rejected, skipped, or safely redacted.
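The page does not say how these filters are implemented. The minimal Python sketch below shows one way to apply them. It assumes that "same-site" means an exact host match with the site origin and that query strings and fragments are stripped (redacted) rather than rejected. `sanitize_public_url` and `same_site_sitemaps` are hypothetical helpers. The sketch checks literal IP addresses only; a real fetcher would also have to check what a hostname resolves to before requesting it.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def sanitize_public_url(raw: str, origin_host: str) -> Optional[str]:
    """Return a redacted same-site http(s) URL, or None if it must be skipped."""
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return None
    if parts.username is not None or parts.password is not None:
        return None  # embedded credentials are rejected outright
    host = parts.hostname  # urlsplit returns the hostname lowercased
    if host == "localhost" or host.endswith((".localhost", ".local")):
        return None
    try:
        if not ipaddress.ip_address(host).is_global:
            return None  # loopback, private, link-local, reserved, ...
    except ValueError:
        pass  # not an IP literal; DNS resolution is not checked here
    if host != origin_host.lower():
        return None  # external host (assumed exact-match same-site rule)
    # Drop query string and fragment; keep scheme, host, port, and path.
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))


def same_site_sitemaps(robots_txt: str, origin_host: str) -> list:
    """Collect Sitemap: declarations from robots.txt, keeping safe same-site URLs."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0]  # drop robots.txt comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            url = sanitize_public_url(value, origin_host)
            if url and url not in found:  # de-duplicate, keep first-seen order
                found.append(url)
    return found
```

Splitting each line on its first colon keeps the `https:` inside the sitemap URL intact. Matching the `Sitemap` key case-insensitively follows common robots.txt practice.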

Sitemap URL Extractor

lnlenost/sitemap-url-extractor

Extract page URLs from robots.txt and sitemap.xml files for SEO audits, URL discovery, crawl planning, and data pipelines.

Niccolò Salerno

Sitemap URL Extractor — robots.txt + sitemap.xml Crawl

v0iddo/sitemap-url-extractor

Discover every URL a site exposes via its public sitemap chain. Reads robots.txt, follows Sitemap declarations, recursively descends sitemap-index files, and extracts URLs with lastmod, changefreq, and priority.

vøiddo

Sitemap Generator — XML Sitemap Crawler

junipr/sitemap-generator

Crawl a website, discover indexable URLs, and generate XML sitemaps with crawl diagnostics, canonical checks, status codes, and exportable URL inventory.

junipr

Sitemap URL Intelligence

toronto_777/sitemap-url-intelligence

Discover robots.txt sitemap entries and classify public sitemap URLs by page type.

Steven Feng

Sitemap URL Extractor

automationagents/web-sitemap

Extract every URL from a website via sitemap.xml, robots.txt, or crawl discovery. Feed clean URL lists straight into your scrapers.

Alex Jordan

Robots.txt & Sitemap Analyzer

automation-lab/robots-sitemap-analyzer

This actor fetches and parses robots.txt and sitemap.xml files for any list of websites. It extracts crawl directives (user-agent rules, allowed/disallowed paths, crawl-delay), discovers sitemap URLs, and counts the number of pages listed in each sitemap. Use it for SEO audits, competitive...

Stas Persiianenko

Robots.txt & Sitemap Discovery API

213x/robots-sitemap-discovery

Discover robots.txt rules, sitemap URLs, crawl delays, and basic crawlability signals for domains or website URLs.

Sitemap Robots Delta Monitor

tom_the_builder/sitemap-robots-delta-monitor

Monitor sitemap.xml and robots.txt for URL inventory changes and return new, changed, or removed URLs in normalized JSON.

Danil Iarmolchik

Sitemap Sniffer

maximedupre/sitemap-sniffer

Find sitemap files from website roots, domains, robots.txt, and direct sitemap URLs. Export sitemap metadata, URL counts, nested index depth, and optional URL inventory rows.

Maxime Dupré

Sitemap URL Discovery (sitemap.xml + robots.txt → all URLs)

gochujang/sitemap-url-discovery

Given a domain, finds sitemap.xml / sitemap_index.xml (also via robots.txt), recursively expands sitemap indexes, returns one row per discovered URL with lastmod / changefreq / priority. SEO audits, crawl-target prep, content cataloging. $0.0001/URL + $0.01 site fee.