Pricing

from $0.30 / 1,000 extracted URLs

Sitemap URL Extractor

Extract every URL from a website's sitemap.xml. Recursively walks nested sitemap indexes and returns loc, lastmod, changefreq, and priority for each page.

Developer

Andrew

Last modified

12 days ago

What you get

URL (loc) for every page listed in the site's sitemap
Last modified date (lastmod) - when each page was last updated
Change frequency (changefreq) - always, hourly, daily, weekly, monthly, yearly, never
Priority (priority) - relative importance of each URL (0.0 - 1.0)
Source sitemap - which sitemap file the URL came from (useful when a site splits its sitemap by section)
Auto-discovery - point at a homepage and the actor finds the sitemap via robots.txt or /sitemap.xml
Gzipped sitemap support - handles .xml.gz files transparently
Recursive sitemap index walking - follows nested <sitemapindex> files up to 5 levels deep
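
Transparent gzip handling can be approximated by checking the first two bytes of the response body instead of trusting the file extension. The snippet below is a minimal sketch under that assumption, not the actor's actual code; the helper name maybe_decompress is hypothetical.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_decompress(body: bytes) -> bytes:
    """Return raw XML bytes, decompressing only when the payload is gzipped."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


if __name__ == "__main__":
    xml = b"<urlset><url><loc>https://www.example.com/</loc></url></urlset>"
    # Both the compressed and the plain payload come back as the same XML.
    print(maybe_decompress(gzip.compress(xml)) == maybe_decompress(xml))
```

Sniffing the bytes also covers servers that serve a .xml.gz URL already decompressed, or a .xml URL that is in fact gzipped.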

Use cases

SEO audits - pull a full URL inventory before running site-wide checks (broken links, missing meta tags, schema validation)
Content migration - build a complete URL list when moving a site between platforms
Crawl budget planning - see how many URLs a site exposes and how recently each was updated
Competitor research - map out every page a competitor publishes
Sitemap validation - verify that your published sitemap actually contains the pages you expect
Bulk URL scraping pipelines - feed the output into another actor for screenshots, content extraction, or AI summarization
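
As an illustration of the sitemap validation use case, the sketch below compares exported dataset records against a list of pages you expect to be published. It assumes the records have the loc field described in this README; the function name check_sitemap_coverage and the sample URLs are hypothetical.

```python
def check_sitemap_coverage(records, expected_urls):
    """Compare the URLs found in the sitemap with the URLs you expect.

    records: iterable of dataset records (dicts with a "loc" key).
    expected_urls: iterable of URLs that should appear in the sitemap.
    """
    published = {r["loc"] for r in records if r.get("loc")}
    expected = set(expected_urls)
    return {
        "missing": sorted(expected - published),      # expected but not in the sitemap
        "unexpected": sorted(published - expected),   # in the sitemap but not expected
    }


if __name__ == "__main__":
    records = [
        {"loc": "https://www.example.com/"},
        {"loc": "https://www.example.com/old-page"},
    ]
    expected = ["https://www.example.com/", "https://www.example.com/pricing"]
    print(check_sitemap_coverage(records, expected))
```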

How to use

Enter a Website or Sitemap URL - either a homepage like https://www.example.com (the actor auto-discovers the sitemap) or a direct sitemap URL like https://www.example.com/sitemap.xml
Set Max Items - 0 returns every URL in the entire sitemap tree
Choose whether to Follow Sitemap Index - on by default, so a single run pulls every URL from every child sitemap
Run the actor - results land in the Dataset tab
Export to JSON, CSV, Excel, or Google Sheets directly from the Apify console
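
The auto-discovery mentioned in the first step can be pictured roughly as follows: read the Sitemap: lines from robots.txt and, if there are none, fall back to /sitemap.xml on the same host. This is a sketch of that idea rather than the actor's implementation; discover_sitemaps is a hypothetical name, and the robots.txt text is passed in as a string so the example runs without network access.

```python
from urllib.parse import urljoin


def discover_sitemaps(homepage: str, robots_txt: str = "") -> list:
    """Return candidate sitemap URLs for a site.

    Uses "Sitemap:" directives from robots.txt when present,
    otherwise falls back to <site root>/sitemap.xml.
    """
    found = []
    for line in robots_txt.splitlines():
        # Split at the first colon only, so the URL's own colons stay intact.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    if found:
        return found
    return [urljoin(homepage, "/sitemap.xml")]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /admin\nSitemap: https://www.example.com/sitemap_index.xml\n"
    print(discover_sitemaps("https://www.example.com", robots))
    print(discover_sitemaps("https://www.example.com"))
```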

Extract every URL on a site

{
  "websiteUrl": "https://www.apify.com",
  "maxItems": 0,
  "followSitemapIndex": true
}

Extract only the top-level sitemap

{
  "websiteUrl": "https://www.apify.com/sitemap.xml",
  "maxItems": 0,
  "followSitemapIndex": false
}

Output format

One dataset record per URL:

{
  "loc": "https://www.apify.com/store",
  "lastmod": "2024-08-12",
  "changefreq": "daily",
  "priority": "0.8",
  "sourceSitemap": "https://www.apify.com/sitemap.xml"
}

Fields not present in the sitemap entry come back as null.
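
To show how a sitemap entry could map onto a record of this shape, here is a minimal sketch using Python's standard xml.etree.ElementTree. It assumes the standard sitemaps.org namespace; parse_urlset is a hypothetical helper, not the actor's code. Missing fields become None, which serializes to null in JSON.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol 0.9).
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
FIELDS = ("loc", "lastmod", "changefreq", "priority")


def parse_urlset(xml_bytes: bytes, source_sitemap: str) -> list:
    """Turn a <urlset> document into one dict per <url> entry."""
    root = ET.fromstring(xml_bytes)
    records = []
    for url in root.iter(f"{NS}url"):
        record = {}
        for field in FIELDS:
            text = url.findtext(f"{NS}{field}")
            # Absent or empty elements are reported as None (null in JSON).
            record[field] = text.strip() if text and text.strip() else None
        record["sourceSitemap"] = source_sitemap
        records.append(record)
    return records


if __name__ == "__main__":
    sample = (
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://www.example.com/</loc><lastmod>2024-08-12</lastmod></url>"
        b"<url><loc>https://www.example.com/about</loc></url>"
        b"</urlset>"
    )
    for rec in parse_urlset(sample, "https://www.example.com/sitemap.xml"):
        print(rec)
```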

Parameters

Field	Default	Description
Website or Sitemap URL	`https://www.apify.com`	Homepage URL (the sitemap is auto-discovered) or direct `.xml` / `.xml.gz` sitemap URL
Max Items	`0`	Maximum URLs to return per run. `0` = unlimited
Follow Sitemap Index	`true`	Recurse into child sitemaps when the top-level file is a sitemap index
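
The Max Items convention, where 0 means no limit, can be expressed in a few lines. This is only an illustration of the parameter's semantics as described in the table; apply_max_items is a hypothetical name.

```python
from itertools import islice


def apply_max_items(urls, max_items: int = 0):
    """Yield at most max_items URLs; 0 means unlimited."""
    if max_items < 0:
        raise ValueError("max_items must be 0 (unlimited) or a positive integer")
    if max_items == 0:
        return iter(urls)
    # islice stops early, so a lazy upstream source is not consumed past the limit.
    return islice(urls, max_items)


if __name__ == "__main__":
    urls = [f"https://www.example.com/page-{i}" for i in range(5)]
    print(list(apply_max_items(urls, 2)))
    print(len(list(apply_max_items(urls, 0))))
```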

Notes

Sitemap discovery first looks for Sitemap: directives in /robots.txt, then falls back to /sitemap.xml
Nested sitemap indexes are walked breadth-first; the actor de-duplicates sitemap URLs so circular references are safe
Recursion is capped at 5 levels deep and 1,000 total sitemaps as a safety net against runaway loops
Each fetched sitemap has a 30-second timeout - slow or unreachable child sitemaps are logged and skipped, and the run continues
Gzip-compressed sitemaps (*.xml.gz) are decompressed automatically
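
The breadth-first walk with de-duplication and safety caps described in these notes could look roughly like the sketch below. It is not the actor's source. The names walk_sitemaps and fetch_children are hypothetical, fetching is injected as a callable so the example runs offline, and the depth rule (root at depth 0, nothing below depth 5 is fetched) is one possible reading of "5 levels deep".

```python
from collections import deque

MAX_DEPTH = 5        # assumed interpretation: root sitemap is depth 0
MAX_SITEMAPS = 1000  # total sitemaps fetched per run


def walk_sitemaps(root_url, fetch_children, max_depth=MAX_DEPTH, max_sitemaps=MAX_SITEMAPS):
    """Visit sitemaps breadth-first and return the URLs fetched successfully.

    fetch_children(url) must return the child sitemap URLs of a sitemap index
    (an empty list for a plain urlset) and may raise on timeout or HTTP error.
    """
    seen = {root_url}              # de-duplication makes circular references safe
    queue = deque([(root_url, 0)])
    visited = []
    while queue and len(visited) < max_sitemaps:
        url, depth = queue.popleft()
        try:
            children = fetch_children(url)
        except Exception as exc:
            # Log and skip the failing sitemap; the walk continues.
            print(f"skipping {url}: {exc}")
            continue
        visited.append(url)
        if depth >= max_depth:
            continue               # do not descend past the depth cap
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return visited


if __name__ == "__main__":
    graph = {"index": ["posts", "pages"], "posts": ["index"], "pages": []}
    print(walk_sitemaps("index", lambda url: graph[url]))
```

A queue gives breadth-first order, so shallow sitemaps are processed before deeper ones when the 1,000-sitemap cap cuts a run short.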

Part of a complete website & SEO toolkit - explore the rest of the suite:

Website Contact Scraper - Emails, phones, and socials from any website
Website Email Scraper - Crawl a site deep and extract all emails
Website Tech Stack Detector - Detect CMS, frameworks, analytics, and DNS/MX
SEO Meta Tag Auditor - Audit title, OG, Twitter cards, and schema
Domain WHOIS & SSL Inspector - WHOIS, domain age, and live SSL details
