Sitemap & URL Extractor — Get Every URL of a Website
Pricing
Pay per usage
Get every URL of a website: parses sitemap.xml and sitemap-indexes (discovered via robots.txt or the default location), with a same-site crawl fallback when there's no sitemap. Returns each URL + lastmod. No API key.
Developer
Daniel Brenner
Maintained by Community
2 days ago
Last modified
Free. Give it a website (or a sitemap URL) and get back every URL on the site — parsed from sitemap.xml and sitemap-indexes (auto-discovered via robots.txt and the default location), with a same-site crawl fallback when a site has no sitemap. No API key.
Perfect for feeding an LLM/RAG pipeline (find every page to ingest), site audits, migrations, link checking, and SEO.
What you get (per URL)
- url: the page URL (absolute, deduplicated)
- lastmod: last-modified date from the sitemap when present, null otherwise
- source: "sitemap" or "crawl" (how the URL was found)
- discoveredAt
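As a rough illustration of how these fields might be consumed downstream, the sketch below types one record and groups a result list by how each URL was found. The field names come from the list above; the sample values, the string type assumed for discoveredAt, and the helper names `split_by_source` and `without_lastmod` are assumptions made for this example, not part of the actor.

```python
from typing import Literal, Optional, TypedDict


class UrlRecord(TypedDict):
    url: str
    lastmod: Optional[str]               # null in the output when the sitemap has no lastmod
    source: Literal["sitemap", "crawl"]
    discoveredAt: str                    # assumption: a timestamp string


def split_by_source(records: list[UrlRecord]) -> dict[str, list[str]]:
    """Group URLs by the way they were discovered."""
    grouped: dict[str, list[str]] = {"sitemap": [], "crawl": []}
    for record in records:
        grouped[record["source"]].append(record["url"])
    return grouped


def without_lastmod(records: list[UrlRecord]) -> list[str]:
    """Return URLs whose lastmod is missing (null), e.g. to re-check them later."""
    return [record["url"] for record in records if record["lastmod"] is None]


# Made-up sample records, for illustration only.
sample: list[UrlRecord] = [
    {"url": "https://example.com/", "lastmod": "2024-01-15",
     "source": "sitemap", "discoveredAt": "2024-02-01T10:00:00Z"},
    {"url": "https://example.com/about", "lastmod": None,
     "source": "crawl", "discoveredAt": "2024-02-01T10:00:05Z"},
]
```

Checking for `None` explicitly matters here because a missing lastmod is reported as null rather than as an empty string or a guessed date.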
How to use it
{ "startUrls": ["https://example.com"], "maxResults": 5000 }
Pass a site URL (the sitemap is found automatically) or a direct sitemap URL. It handles sitemap-indexes (sites that split their sitemap into many files) by following each child sitemap, and if there's no sitemap at all it falls back to a polite, same-site crawl. It respects robots.txt, identifies itself, and fetches one request at a time.
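The discovery steps described above can be sketched with the Python standard library. This is not the actor's source code, only a minimal illustration of the two ideas: reading `Sitemap:` lines from robots.txt, and telling a sitemap-index apart from a regular sitemap by its root element. The function names are hypothetical, and fetching is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def _local(tag: str) -> str:
    """Drop the XML namespace from a tag such as '{ns}urlset'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text: str) -> tuple[str, list[tuple[str, Optional[str]]]]:
    """Return (kind, entries) where kind is the root tag ('urlset' or
    'sitemapindex') and entries is a list of (loc, lastmod or None)."""
    root = ET.fromstring(xml_text)
    entries = []
    for child in root:
        loc = None
        lastmod = None
        for field in child:
            text = (field.text or "").strip()
            if _local(field.tag) == "loc":
                loc = text
            elif _local(field.tag) == "lastmod":
                lastmod = text or None
        if loc:
            entries.append((loc, lastmod))
    return _local(root.tag), entries
```

When `parse_sitemap` reports `"sitemapindex"`, each `loc` is itself a child sitemap to fetch and parse in turn; when it reports `"urlset"`, each `loc` is a page URL.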
Pair it: discover → extract → audit
This is the discover step of a clean "feed-your-AI" toolkit by dataquarry:
- Discover (this actor): every URL of a site.
- Extract (dataquarry/website-to-markdown): turn those URLs into clean, LLM-ready Markdown.
- Audit (dataquarry/website-seo-metadata-checker): SEO & metadata for each page.
Also see the dataquarry OSM place-data scrapers and free guides at openplacedata.com.
Clean & honest
Reads only public sitemap.xml/robots.txt and (in fallback) public pages; respects robots.txt; sends a descriptive User-Agent; no logins, no PII. Missing values are null, never guessed.
FAQ
Do I need an API key? No — give it a URL and run it. It's free.
What if the site has no sitemap? It crawls the site's own links (same-domain, bounded) so you still get a URL list.
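To make the same-domain idea concrete, here is a minimal sketch of the link-filtering step in such a fallback crawl, using only the standard library. It assumes "same-domain" means an exact host match and that fragments are dropped before deduplication; the actor's actual rules may differ, and the names `LinkCollector` and `same_site_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str) -> list[str]:
    """Return absolute, deduplicated http(s) links on the same host as page_url."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlsplit(page_url).netloc.lower()
    seen: set[str] = set()
    links: list[str] = []
    for href in collector.hrefs:
        # Resolve relative links and strip '#fragment' parts.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.netloc.lower() != host:
            continue  # assumption: exact host match counts as same-domain
        if absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

A bounded crawl would call a function like this on each fetched page and stop queueing new links once its page or result limit is reached.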
Does it handle huge sitemap-indexes? Yes — it follows child sitemaps up to the maxSitemaps and maxResults caps you set.
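One way to picture how the two caps interact is the sketch below: a queue of sitemaps is processed until either the sitemap cap or the result cap is hit. The parameter names mirror maxSitemaps and maxResults from the answer above, but the traversal order and the `fetch_entries` callback (a hypothetical function returning the sitemap kind and its `loc` values) are assumptions for illustration, not the actor's implementation.

```python
from collections import deque
from typing import Callable


def collect_urls(
    start_sitemap: str,
    fetch_entries: Callable[[str], tuple[str, list[str]]],
    max_sitemaps: int,
    max_results: int,
) -> list[str]:
    """Walk a sitemap (or sitemap-index) breadth-first, honoring both caps."""
    queue = deque([start_sitemap])
    visited: set[str] = set()
    seen_urls: set[str] = set()
    results: list[str] = []
    while queue and len(visited) < max_sitemaps and len(results) < max_results:
        sitemap_url = queue.popleft()
        if sitemap_url in visited:
            continue
        visited.add(sitemap_url)
        kind, locs = fetch_entries(sitemap_url)
        if kind == "sitemapindex":
            queue.extend(locs)  # child sitemaps to follow later
            continue
        for loc in locs:
            if len(results) >= max_results:
                break
            if loc not in seen_urls:
                seen_urls.add(loc)
                results.append(loc)
    return results


# Made-up in-memory "site" so the example runs without network access.
fake_site = {
    "index": ("sitemapindex", ["a", "b", "c"]),
    "a": ("urlset", ["u1", "u2"]),
    "b": ("urlset", ["u2", "u3"]),
    "c": ("urlset", ["u4"]),
}
```

In this sketch the index file itself counts toward the sitemap cap, so `max_sitemaps=3` reads the index plus two children; whether the actor counts it the same way is not stated in the listing.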