Pricing

Pay per usage

Sitemap URL Extractor — robots.txt + sitemap.xml Crawl

Discover every URL a site exposes through its public sitemap chain. The actor reads robots.txt, follows its Sitemap declarations, recursively descends sitemap-index files, and extracts each URL with its lastmod, changefreq, and priority.

Developer

vøiddo

Last modified: a month ago

Example output row

{
  "domain": "vercel.com",
  "url": "https://vercel.com/blog/nextjs-14",
  "lastmod": "2024-03-15",
  "changefreq": "weekly",
  "priority": 0.8,
  "source": "https://vercel.com/sitemap-blog.xml"
}

How to use

Input

Field	Type	Default	Description
`domains`	`string[]`	`["stripe.com","shopify.com","vercel.com"]`	Domains to crawl — no scheme, no trailing slash
`maxUrlsPerDomain`	`integer`	`2000`	Hard cap on URLs returned per domain
`followSitemapIndex`	`boolean`	`true`	Recursively follow `<sitemapindex>` child links (up to depth 5)

Minimal run

{
  "domains": ["example.com"],
  "maxUrlsPerDomain": 500,
  "followSitemapIndex": true
}
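The `domains` field expects bare hostnames, so pasted URLs need to be reduced first. The following is a minimal sketch of one way to do that and assemble the input object. `normalize_domain` and `build_input` are hypothetical helper names, not part of the actor.

```python
import json
from urllib.parse import urlsplit


def normalize_domain(value: str) -> str:
    """Reduce a URL or hostname to a bare, lowercase host (no scheme, path, or trailing slash)."""
    value = value.strip()
    # urlsplit only fills in the host when a scheme or "//" is present, so add "//" if missing.
    parts = urlsplit(value if "://" in value else f"//{value}")
    host = parts.hostname  # already lowercased, port removed
    if not host:
        raise ValueError(f"not a usable domain: {value!r}")
    return host


def build_input(domains, max_urls_per_domain=2000, follow_sitemap_index=True):
    """Assemble an input object using the field names from the Input table."""
    return {
        "domains": [normalize_domain(d) for d in domains],
        "maxUrlsPerDomain": int(max_urls_per_domain),
        "followSitemapIndex": bool(follow_sitemap_index),
    }


print(json.dumps(build_input(["https://Example.com/", "vercel.com"], 500), indent=2))
```

The defaults mirror the Input table; only the normalization step is an addition.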

Output fields

Field	Type	Notes
`domain`	string	Input domain
`url`	string	Discovered URL from `<loc>`
`lastmod`	string	ISO date, `null` if absent
`changefreq`	string	e.g. `weekly`, `null` if absent
`priority`	float	0.0–1.0, `null` if absent
`source`	string	Sitemap file the URL was found in
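Because `lastmod`, `changefreq`, and `priority` can each be `null`, downstream code has to handle missing values. As an illustration, the sketch below flags rows whose `lastmod` is older than a threshold. It assumes `lastmod` starts with a full `YYYY-MM-DD` date; the `stale_rows` helper and the second sample row are hypothetical.

```python
from datetime import date


def stale_rows(rows, today, max_age_days=365):
    """Return rows whose lastmod is older than max_age_days; rows without lastmod are skipped."""
    stale = []
    for row in rows:
        lastmod = row.get("lastmod")
        if not lastmod:
            continue  # null lastmod: the age is unknown, so do not guess
        # The first 10 characters cover both "YYYY-MM-DD" and full timestamps.
        modified = date.fromisoformat(lastmod[:10])
        if (today - modified).days > max_age_days:
            stale.append(row)
    return stale


rows = [
    {"domain": "vercel.com", "url": "https://vercel.com/blog/nextjs-14",
     "lastmod": "2024-03-15", "changefreq": "weekly", "priority": 0.8,
     "source": "https://vercel.com/sitemap-blog.xml"},
    # Hypothetical row showing the null case.
    {"domain": "vercel.com", "url": "https://vercel.com/pricing",
     "lastmod": None, "changefreq": None, "priority": None,
     "source": "https://vercel.com/sitemap.xml"},
]

print([r["url"] for r in stale_rows(rows, today=date(2025, 6, 1))])
```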

Pricing

Event	Cost	When charged
`url_extracted`	$0.0001 per URL	Once per run; total = number of URLs pushed × $0.0001

A 2,000-URL run costs $0.20 (2,000 × $0.0001). You pay only for URLs actually returned: if a domain has only 300 URLs, you pay for 300 ($0.03), not for the full `maxUrlsPerDomain` cap.
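The arithmetic can be checked before a run with a small estimator. This is a sketch assuming the only charge is the `url_extracted` rate in the table; `estimate_cost` is a hypothetical helper, and the per-domain URL counts are values you supply.

```python
from decimal import Decimal

PRICE_PER_URL = Decimal("0.0001")  # url_extracted rate from the pricing table


def estimate_cost(url_counts, max_urls_per_domain=2000):
    """Estimate the charge in USD given the number of sitemap URLs each domain exposes."""
    # Each domain is billed for the URLs returned, which never exceeds the cap.
    billable = sum(min(count, max_urls_per_domain) for count in url_counts.values())
    return billable * PRICE_PER_URL


print(estimate_cost({"example.com": 300}))   # small site, below the cap
print(estimate_cost({"example.com": 5000}))  # large site, capped at 2000 URLs
```

`Decimal` is used so that small per-URL amounts add up without binary floating-point drift.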

Who it's for

SEO teams auditing crawl coverage — verify every page is in the sitemap.
Content operations checking lastmod staleness across thousands of URLs.
Competitive intelligence — map a competitor's full URL structure.
QA pipelines validating sitemap health after deploys.
Link-building researchers finding indexable pages at scale.

Source

Crawl order per domain:

1. GET https://{domain}/robots.txt and parse all Sitemap: lines.
2. If none are found, fall back to GET https://{domain}/sitemap.xml.
3. For each sitemap URL, fetch and parse the XML.
4. If the root element is <sitemapindex>, enqueue each <sitemap><loc> (up to depth 5).
5. If the root element is <urlset>, emit one row per <url> until maxUrlsPerDomain is reached.
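The crawl order above can be sketched in a few functions. This is an illustrative reading of the steps, not the actor's actual code: all function names are hypothetical, `fetch` is an injected callable so the example runs offline, and counting the first sitemap as depth 0 is an assumption.

```python
import xml.etree.ElementTree as ET

MAX_DEPTH = 5  # depth limit stated in the crawl order


def sitemaps_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines (matched case-insensitively)."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def local(tag):
    """Strip the XML namespace so '{ns}urlset' compares as 'urlset'."""
    return tag.rsplit("}", 1)[-1]


def child_text(node, name):
    """Text of the first direct child called `name`, or None if absent or empty."""
    for child in node:
        if local(child.tag) == name and child.text and child.text.strip():
            return child.text.strip()
    return None


def to_float(text):
    """Parse a priority value; malformed or missing values become None."""
    try:
        return float(text) if text is not None else None
    except ValueError:
        return None


def crawl(domain, fetch, max_urls=2000, follow_index=True):
    """Walk the sitemap chain; `fetch(url)` returns the body as str, or None on 404/empty."""
    robots = fetch(f"https://{domain}/robots.txt") or ""
    queue = [(u, 0) for u in sitemaps_from_robots(robots)]
    if not queue:  # step 2: fall back to the conventional location
        queue = [(f"https://{domain}/sitemap.xml", 0)]
    rows, seen = [], set()
    while queue and len(rows) < max_urls:
        sitemap_url, depth = queue.pop(0)
        if sitemap_url in seen:
            continue  # guard against sitemaps that reference each other
        seen.add(sitemap_url)
        body = fetch(sitemap_url)
        if not body:
            continue
        try:
            root = ET.fromstring(body)
        except ET.ParseError:
            continue
        kind = local(root.tag)
        if kind == "sitemapindex" and follow_index and depth < MAX_DEPTH:
            for node in root:
                loc = child_text(node, "loc")
                if loc:
                    queue.append((loc, depth + 1))
        elif kind == "urlset":
            for node in root:
                loc = child_text(node, "loc")
                if not loc:
                    continue
                rows.append({
                    "domain": domain,
                    "url": loc,
                    "lastmod": child_text(node, "lastmod"),
                    "changefreq": child_text(node, "changefreq"),
                    "priority": to_float(child_text(node, "priority")),
                    "source": sitemap_url,
                })
                if len(rows) >= max_urls:
                    break
    return rows


NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
pages = {  # in-memory stand-in for the network
    "https://example.com/robots.txt": "User-agent: *\nSitemap: https://example.com/index.xml\n",
    "https://example.com/index.xml":
        f"<sitemapindex xmlns='{NS}'><sitemap><loc>https://example.com/a.xml</loc></sitemap></sitemapindex>",
    "https://example.com/a.xml":
        f"<urlset xmlns='{NS}'><url><loc>https://example.com/</loc><lastmod>2024-03-15</lastmod>"
        "<priority>0.8</priority></url><url><loc>https://example.com/about</loc></url></urlset>",
}
rows = crawl("example.com", pages.get)
print(len(rows), rows[0]["source"])
```

Passing `fetch` in keeps the traversal logic separate from HTTP concerns such as pacing and error handling, which makes it easy to exercise against fixed fixtures like the `pages` dictionary.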

All requests send a polite User-Agent and are spaced 250–600 ms apart. 404s and empty responses are skipped without failing the run.
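A paced fetch along these lines could look like the sketch below, built on the standard library only. The User-Agent string is a hypothetical placeholder, and treating every HTTP or network error as "skip" is an assumption that goes beyond the 404 and empty-response cases described above.

```python
import random
import time
import urllib.request

# Hypothetical identifier; a real crawler should name itself and give a contact URL.
USER_AGENT = "sitemap-extractor-example/0.1 (+https://example.com/bot)"


def pause_seconds(low_ms=250, high_ms=600):
    """Pick a random delay inside the 250-600 ms window, expressed in seconds."""
    return random.uniform(low_ms, high_ms) / 1000.0


def polite_fetch(url, timeout=15.0):
    """Wait, then GET `url`; return the body as text, or None on errors and empty bodies."""
    time.sleep(pause_seconds())
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read()
    except OSError:
        # HTTPError (including 404), URLError, and socket timeouts all derive from OSError.
        return None
    text = body.decode("utf-8", errors="replace").strip()
    return text or None
```

Randomizing the delay rather than using a fixed interval avoids sending requests in a rigid rhythm.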
