Sitemap & URL Extractor — Get Every URL of a Website

Get every URL of a website: parses sitemap.xml and sitemap-indexes (discovered via robots.txt or the default location), with a same-site crawl fallback when there's no sitemap. Returns each URL + lastmod. No API key.

Pricing

Pay per usage

Developer

Daniel Brenner

Maintained by Community

Free. Give it a website (or a sitemap URL) and get back every URL on the site — parsed from sitemap.xml and sitemap-indexes (auto-discovered via robots.txt and the default location), with a same-site crawl fallback when a site has no sitemap. No API key.

Perfect for feeding an LLM/RAG pipeline (find every page to ingest), site audits, migrations, link checking, and SEO.

What you get (per URL)

  • url: the page URL (absolute, deduplicated)
  • lastmod: last-modified date from the sitemap when present, otherwise null
  • source: "sitemap" or "crawl" (how the URL was found)
  • discoveredAt: when the URL was discovered
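
As an illustration of working with these fields, here is a minimal Python sketch that separates records with a lastmod from those where it is null. The sample records, the ISO 8601 format shown for discoveredAt, and the helper name split_by_lastmod are assumptions made for this example; only the four field names come from the list above.

```python
from typing import Optional, TypedDict


class UrlRecord(TypedDict):
    url: str
    lastmod: Optional[str]  # None when the sitemap gave no date
    source: str             # "sitemap" or "crawl"
    discoveredAt: str       # timestamp format is an assumption (ISO 8601 here)


def split_by_lastmod(records: list[UrlRecord]) -> tuple[list[UrlRecord], list[UrlRecord]]:
    """Return (dated, undated) so null lastmod values are handled explicitly."""
    dated = [r for r in records if r["lastmod"] is not None]
    undated = [r for r in records if r["lastmod"] is None]
    return dated, undated


# Hypothetical records shaped like the fields listed above.
sample: list[UrlRecord] = [
    {"url": "https://example.com/", "lastmod": "2024-05-01",
     "source": "sitemap", "discoveredAt": "2024-06-01T12:00:00Z"},
    {"url": "https://example.com/about", "lastmod": None,
     "source": "crawl", "discoveredAt": "2024-06-01T12:00:05Z"},
]

dated, undated = split_by_lastmod(sample)
```

Keeping the undated records in their own list avoids treating a missing date as "never changed" or "changed just now" in a downstream pipeline.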

How to use it

{ "startUrls": ["https://example.com"], "maxResults": 5000 }

Pass a site URL (the sitemap is found automatically) or a direct sitemap URL. It handles sitemap-indexes (sites that split their sitemap into many files) by following each child sitemap, and if there's no sitemap at all it falls back to a polite, same-site crawl. It respects robots.txt, identifies itself, and fetches one request at a time.
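
The discovery steps described above can be sketched roughly as follows. This is not the actor's own code, only a minimal standard-library illustration of the general technique: read Sitemap lines from robots.txt, then tell a sitemap-index apart from a plain URL set. The function names and sample documents are hypothetical, and fetching is left out so the sketch stays self-contained.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap":
            found.append(value.strip())
    return found


def parse_sitemap(xml_text: str):
    """Return ("index", child_sitemap_urls) or ("urlset", entries)."""
    root = ET.fromstring(xml_text)
    if root.tag == SITEMAP_NS + "sitemapindex":
        children = [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
        return "index", children
    entries = []
    for url_el in root.iter(SITEMAP_NS + "url"):
        loc = url_el.findtext(SITEMAP_NS + "loc")
        lastmod = url_el.findtext(SITEMAP_NS + "lastmod")
        if loc:
            entries.append({
                "url": loc.strip(),
                "lastmod": lastmod.strip() if lastmod else None,  # null when absent
            })
    return "urlset", entries


# Hypothetical inputs for illustration.
robots = "User-agent: *\nDisallow: /private\nSitemap: https://example.com/sitemap.xml\n"
index_xml = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>"
    "</sitemapindex>"
)
urlset_xml = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/a</loc><lastmod>2024-05-01</lastmod></url>"
    "<url><loc>https://example.com/b</loc></url>"
    "</urlset>"
)

robots_sitemaps = sitemaps_from_robots(robots)
index_kind, index_children = parse_sitemap(index_xml)
urlset_kind, urlset_entries = parse_sitemap(urlset_xml)
```

A real run would fetch each child sitemap returned for an index and parse it the same way.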

Pair it: discover → extract → audit

This is the discover step of a clean "feed-your-AI" toolkit by dataquarry:

  1. Discover: this actor returns every URL of a site.
  2. Extract: dataquarry/website-to-markdown turns those URLs into clean, LLM-ready Markdown.
  3. Audit: dataquarry/website-seo-metadata-checker reports SEO & metadata for each page.

Also see the dataquarry OSM place-data scrapers and free guides at openplacedata.com.

Clean & honest

Reads only public sitemap.xml/robots.txt and (in fallback) public pages; respects robots.txt; sends a descriptive User-Agent; no logins, no PII. Missing values are null, never guessed.

FAQ

Do I need an API key? No — give it a URL and run it. It's free.

What if the site has no sitemap? It crawls the site's own links (same-domain, bounded) so you still get a URL list.
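
For a sense of what a same-site link filter can look like, here is a minimal Python sketch using only the standard library. It is an illustration of the idea, not the actor's implementation: the class and function names are hypothetical, and treating "same-domain" as an exact host match is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect raw href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url: str, html: str) -> list[str]:
    """Return absolute, deduplicated http(s) links on the same host as page_url."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlsplit(page_url).netloc.lower()
    seen, out = set(), []
    for href in parser.hrefs:
        # Resolve relative links and drop the #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlsplit(absolute)
        if parts.scheme in ("http", "https") and parts.netloc.lower() == host:
            if absolute not in seen:
                seen.add(absolute)
                out.append(absolute)
    return out


# Hypothetical page for illustration.
page = (
    '<a href="/a">A</a><a href="/a#top">A again</a>'
    '<a href="https://other.example/x">external</a>'
    '<a href="mailto:hi@example.com">mail</a>'
    '<a href="b.html">B</a>'
)
links = same_site_links("https://example.com/docs/", page)
```

A bounded crawl would repeat this on each discovered page until a page or result limit is reached.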

Does it handle huge sitemap-indexes? Yes — it follows child sitemaps up to the maxSitemaps and maxResults caps you set.
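
One way to picture how such caps bound the work is the breadth-first sketch below. It is an assumption-laden illustration rather than the actor's code: the load callable, the default cap values, and the toy data are hypothetical. Here load(url) is expected to return either ("index", list_of_child_sitemap_urls) or ("urlset", list_of_entries), where each entry is a dict with a "url" key.

```python
from collections import deque


def collect_urls(start_sitemap, load, max_sitemaps=50, max_results=5000):
    """Walk sitemaps breadth-first, stopping once either cap is reached."""
    queue = deque([start_sitemap])
    visited = set()      # sitemap files already loaded
    seen_urls = set()    # page URLs already emitted (dedup)
    results = []
    while queue and len(visited) < max_sitemaps and len(results) < max_results:
        sitemap_url = queue.popleft()
        if sitemap_url in visited:
            continue
        visited.add(sitemap_url)
        kind, items = load(sitemap_url)
        if kind == "index":
            queue.extend(items)  # follow child sitemaps later
            continue
        for entry in items:
            if len(results) >= max_results:
                break
            if entry["url"] not in seen_urls:
                seen_urls.add(entry["url"])
                results.append(entry)
    return results


# Toy in-memory "site": one index pointing at three child sitemaps.
fake = {
    "root": ("index", ["a", "b", "c"]),
    "a": ("urlset", [{"url": "1", "lastmod": None}, {"url": "2", "lastmod": None}]),
    "b": ("urlset", [{"url": "2", "lastmod": None}, {"url": "3", "lastmod": None}]),
    "c": ("urlset", [{"url": "4", "lastmod": None}]),
}

all_urls = [e["url"] for e in collect_urls("root", fake.__getitem__, 10, 10)]
capped_results = [e["url"] for e in collect_urls("root", fake.__getitem__, 10, 3)]
capped_sitemaps = [e["url"] for e in collect_urls("root", fake.__getitem__, 2, 10)]
```

In this sketch the index file itself counts toward the sitemap cap, which is a design choice of the example and may differ from how the actor counts.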