Public Sitemap & Crawl Scope Planner
Pricing
from $5.00 / 1,000 useful crawl scope results
Turn public robots.txt and sitemap XML into crawl-scope briefs with URL inventory, path groups, seed URLs, freshness, diagnostics, and useful-result pricing.
Developer
jack su
Maintained by Community
Turn public robots.txt and sitemap XML into a bounded, AI-ready site index:
sitemap URLs, sampled declared pages, page-type counts, path groups,
recommended seed URLs, lastmod freshness, robots rules, source evidence,
confidence, and diagnostics.
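As a rough illustration of how a sitemap could be reduced to the inventory described above, the sketch below parses one `<urlset>` document with the Python standard library and derives a same-site URL count, path groups keyed by first path segment, and the latest lastmod. The function name `summarize_sitemap`, the returned keys, and the first-segment grouping rule are assumptions made for this example, not the Actor's actual implementation.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def summarize_sitemap(xml_text: str, site_host: str) -> dict:
    """Hypothetical helper: summarize one <urlset> sitemap document."""
    root = ET.fromstring(xml_text)
    urls = []
    path_groups = Counter()
    lastmods = []
    for entry in root.findall("sm:url", NS):
        loc = (entry.findtext("sm:loc", default="", namespaces=NS) or "").strip()
        parts = urlsplit(loc)
        if parts.hostname != site_host:
            continue  # external or empty page URLs are skipped
        urls.append(loc)
        # Assumption: group URLs by their first path segment.
        segments = [s for s in parts.path.split("/") if s]
        group = ("/" + segments[0]) if segments else "/"
        path_groups[group] += 1
        lastmod = (entry.findtext("sm:lastmod", default="", namespaces=NS) or "").strip()
        if lastmod:
            lastmods.append(lastmod)
    return {
        "url_count": len(urls),
        "path_groups": dict(path_groups),
        # Assumption: lastmod values share one W3C date format, so the
        # lexicographic maximum is also the most recent value.
        "latest_lastmod": max(lastmods) if lastmods else None,
    }
```

Sitemap index files (`<sitemapindex>`) and mixed timezone offsets are left out to keep the sketch short. Code that reads untrusted XML would likely also want a hardened parser.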
This Actor is for teams working on SEO, RAG preparation, AI-agent browsing, monitoring, and content operations that need a predictable URL inventory before deciding whether to crawl or process a website.
What It Returns
- Sanitized site origin and robots URL
- Same-site sitemap files read
- Declared sitemap URL count
- Sampled public URLs with lastmod, changefreq, priority, and page type
- Page type counts
- Path groups and recommended seed URLs
- Latest lastmod value
- Freshness summary
- Robots user agents, allow/disallow samples, crawl delay, and rule count
- Site index hash
- Evidence URLs
- Confidence, completeness, missing fields, diagnostics, and readable errors
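The robots-related fields in the list above (user agents, allow/disallow samples, crawl delay, rule count) could be collected with a small line-based parser along these lines. This is a minimal sketch: `summarize_robots` and its output keys are hypothetical, and it flattens rules across all user-agent groups instead of keeping them per group.

```python
def summarize_robots(text: str) -> dict:
    """Hypothetical helper: flatten robots.txt directives into a summary."""
    user_agents, allow, disallow, sitemaps = [], [], [], []
    crawl_delay = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if value and value not in user_agents:
                user_agents.append(value)
        elif field == "allow" and value:
            allow.append(value)
        elif field == "disallow" and value:
            # An empty Disallow means "allow everything", so it is not a rule here.
            disallow.append(value)
        elif field == "crawl-delay":
            try:
                crawl_delay = float(value)  # last valid value wins in this sketch
            except ValueError:
                pass
        elif field == "sitemap" and value:
            sitemaps.append(value)
    return {
        "user_agents": user_agents,
        "allow": allow,
        "disallow": disallow,
        "crawl_delay": crawl_delay,
        "sitemaps": sitemaps,
        "rule_count": len(allow) + len(disallow),
    }
```

The `sitemaps` list is where the sitemap files to read next would come from, after the same-site checks described under Boundaries.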
Pricing Design
The intended pay-per-event setup is:
- apify-actor-start: a tiny run-start fee
- useful-crawl-scope-result: charged only for useful public crawl-scope records with at least one same-site URL entry
- no apify-default-dataset-item
Robots-only, missing-sitemap, duplicate, private-network, invalid-input, and failed records should not be charged as useful-crawl-scope-result events.
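The charging rule above can be sketched as a single predicate. The record fields used here (`status`, `same_site_url_count`, `site_index_hash`) and the function name are hypothetical, chosen only to make the rule concrete. The actual charge call to the platform is deliberately left out.

```python
def should_charge_useful_result(record: dict, seen_hashes: set) -> bool:
    """Return True only for records that qualify as a useful crawl-scope result.

    Assumed (hypothetical) record fields:
      status               "ok" for success; anything else is a failed,
                           invalid-input, or private-network record
      same_site_url_count  number of same-site sitemap URL entries found
      site_index_hash      hash used to detect duplicate records
    """
    if record.get("status") != "ok":
        return False
    # Robots-only and missing-sitemap records have no same-site URL entries.
    if record.get("same_site_url_count", 0) < 1:
        return False
    digest = record.get("site_index_hash")
    if digest in seen_hashes:
        return False  # duplicate of a record already charged
    seen_hashes.add(digest)
    return True
```

Keeping the decision in one pure function makes it easy to check that a non-qualifying record never triggers the useful-result event.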
Good Fits
- Planning crawl scope before a crawl
- Feeding RAG pipelines with sitemap URL candidates
- Checking whether a website exposes usable sitemap metadata
- SEO/content inventory triage
- Building agent-friendly site maps without rendering pages
Boundaries
This Actor does not crawl arbitrary pages, log in, use cookies, render
JavaScript, take screenshots, scrape search engines, scrape social platforms,
or enrich data about private individuals. It reads only public same-site robots.txt and
sitemap resources. Credentials, query parameters, fragments, private-network
addresses, localhost, .local, external sitemap files, and external page URLs
are rejected, skipped, or safely redacted.
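One way to picture these boundary checks is a URL filter like the one below, built on the Python standard library. It is a sketch under stated assumptions: the text does not say which inputs are rejected and which are redacted, so this version rejects credentials, localhost, .local hosts, non-global IP addresses, and external hosts, and it strips query parameters and fragments. The name `sanitize_public_url` is hypothetical.

```python
import ipaddress
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def sanitize_public_url(url: str, site_host: str) -> Optional[str]:
    """Return a cleaned same-site URL, or None if the URL is out of scope."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return None  # malformed URL, e.g. a broken IPv6 literal
    if parts.scheme not in ("http", "https"):
        return None
    if "@" in parts.netloc:
        return None  # embedded credentials
    host = (parts.hostname or "").lower()
    if not host or host == "localhost" or host.endswith(".local"):
        return None
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name, not an IP literal
    if ip is not None and not ip.is_global:
        return None  # private, loopback, link-local, and similar ranges
    if host != site_host.lower():
        return None  # external sitemap file or page URL
    # Assumption: query strings and fragments are dropped, not rejected.
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))
```

This check looks only at the URL text. A hostname that resolves to a private address through DNS would need a separate check at request time.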