Sitemap to URL List Extractor
Pricing
from $1.00 / 1,000 results
Extract every URL from any website's sitemap as clean JSON. Handles sitemap indexes (recursive) and gzipped sitemaps automatically. Includes lastmod, priority, and changefreq.
Developer: Nicolas van Arkens
Maintained by Community
Sitemap to URL List Extractor 🗺️
Extract every URL from any website's sitemap as clean structured JSON, with last-modified dates, priority, and change frequency. Handles sitemap indexes (recursing into nested sitemaps) and gzipped sitemaps (`.xml.gz`) automatically: the parts most lazy sitemap scrapers get wrong.
Perfect for SEO audits, content inventories, building crawl lists, monitoring site changes, and feeding URL lists into other tools.
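The listing doesn't document how the tool detects gzip or sitemap indexes internally. As a minimal sketch, assuming only Python's standard library, a client could sniff the gzip magic bytes (rather than trusting the `.gz` suffix) and then check the root element name defined by the sitemaps.org protocol. `decode_sitemap_bytes` and `classify_sitemap` are hypothetical helper names.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_sitemap_bytes(raw: bytes) -> bytes:
    """Return XML bytes, transparently decompressing gzipped payloads."""
    # Check the content itself: URL suffixes and Content-Type headers
    # are not always reliable.
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


def classify_sitemap(raw: bytes) -> str:
    """Return 'index' for <sitemapindex> and 'urlset' for <urlset>."""
    root = ET.fromstring(decode_sitemap_bytes(raw))
    if root.tag == f"{{{SITEMAP_NS}}}sitemapindex":
        return "index"
    if root.tag == f"{{{SITEMAP_NS}}}urlset":
        return "urlset"
    raise ValueError(f"not a sitemap root element: {root.tag}")
```

Because the check looks at the bytes, the same call works for `sitemap.xml` and for `sitemap.xml.gz` served without a gzip `Content-Encoding`.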
Why use it
- Recursive sitemap indexes: point it at `sitemap_index.xml` and it follows every nested sitemap automatically.
- Gzipped sitemaps handled: many large sites ship `.xml.gz`, which is decompressed transparently.
- Full metadata: `lastmod`, `priority`, `changefreq`, plus image URLs from the Google image sitemap extension.
- Guardrails: configurable caps on total URLs and nested sitemaps, so a giant site can't make a run go out of control.
- Works on any website that publishes a standard XML sitemap: news sites, e-commerce stores, blogs, documentation, public web apps.
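To illustrate what "full metadata" involves, here is a minimal sketch, assuming Python's `xml.etree.ElementTree`, that turns one `<urlset>` into flat records using the field names from the tool's sample output. The helper names are hypothetical, and returning `None` for missing optional fields is an assumption, not documented tool behavior.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Namespaces from the sitemaps.org protocol and Google's image extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def _text(el: ET.Element, path: str) -> Optional[str]:
    """Stripped text of the first matching child, or None if absent/empty."""
    value = el.findtext(path, namespaces=NS)
    if value is None or not value.strip():
        return None
    return value.strip()


def parse_urlset(xml_bytes: bytes, source_sitemap: str) -> List[dict]:
    """Turn one <urlset> document into flat records (hypothetical shape)."""
    root = ET.fromstring(xml_bytes)
    records = []
    for url_el in root.findall("sm:url", NS):
        loc = _text(url_el, "sm:loc")
        if loc is None:
            continue  # <loc> is required by the protocol; skip broken entries
        records.append({
            "url": loc,
            "lastModified": _text(url_el, "sm:lastmod"),
            "changeFrequency": _text(url_el, "sm:changefreq"),
            # Kept as a string, matching the tool's sample output.
            "priority": _text(url_el, "sm:priority"),
            "images": [
                img.text.strip()
                for img in url_el.findall("image:image/image:loc", NS)
                if img.text and img.text.strip()
            ],
            "sourceSitemap": source_sitemap,
        })
    return records
```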
Use cases
- SEO audits: list every URL a site submits in its sitemap, sort by `lastmod`, find stale content
- Crawl seed lists: generate URL lists for downstream scrapers or archival tools
- Content inventories: see what a competitor or partner site is actually publishing
- Change monitoring: schedule it and detect newly added pages
- Site migrations: capture the full URL set before mapping redirects
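For the SEO-audit and change-monitoring cases, post-processing the exported records is straightforward. This is a sketch under stated assumptions: records are dicts with the `url` and `lastModified` keys shown in the sample output, the previous run's export was loaded separately, and `stale_urls` and `new_urls` are hypothetical names.

```python
from datetime import date
from typing import List, Set


def stale_urls(records: List[dict], cutoff: date) -> List[str]:
    """URLs whose lastModified date is before cutoff, oldest first."""
    dated = []
    for rec in records:
        lastmod = rec.get("lastModified")
        if not lastmod:
            continue  # no lastmod, so staleness can't be judged
        try:
            # Full W3C datetimes start with YYYY-MM-DD; keep just the date.
            day = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # e.g. a year-only value such as "2025"; skipped here
        if day < cutoff:
            dated.append((day, rec["url"]))
    return [url for _, url in sorted(dated)]


def new_urls(previous: List[dict], current: List[dict]) -> Set[str]:
    """URLs present in the current run but missing from the previous one."""
    return {r["url"] for r in current} - {r["url"] for r in previous}
```

A set difference keeps change detection independent of ordering, which matters because sitemap order can shift between runs.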
Input
| Field | Description |
|---|---|
| Sitemap URLs | One or more sitemap URLs (sitemap.xml, sitemap_index.xml, or .xml.gz). |
| Follow indexes | If a URL points to an index, recurse and process each nested sitemap. |
| Maximum URLs | Total URL cap across all sitemaps. |
| Maximum nested sitemaps | Cap on the number of sitemap files fetched when following indexes. |
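The interplay of the three options can be sketched as a breadth-first walk. This is an illustration only: `follow_indexes`, `max_urls`, and `max_nested_sitemaps` loosely mirror the table's fields, but these names and defaults are hypothetical rather than the tool's real input keys, and `fetch` is an assumed callable returning raw sitemap bytes.

```python
import gzip
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, List

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
NS = {"sm": SM}


def collect_urls(
    roots: List[str],
    fetch: Callable[[str], bytes],  # hypothetical: returns raw sitemap bytes
    follow_indexes: bool = True,
    max_urls: int = 10_000,
    max_nested_sitemaps: int = 100,
) -> List[dict]:
    """Breadth-first walk over sitemaps and indexes, respecting both caps."""
    queue = deque((url, url) for url in roots)  # (sitemap URL, root it came from)
    seen = set()
    nested_fetched = 0
    results: List[dict] = []
    while queue and len(results) < max_urls:
        sitemap_url, root_url = queue.popleft()
        if sitemap_url in seen:
            continue  # the same file can be listed more than once
        if sitemap_url != root_url:  # reached through an index
            if nested_fetched >= max_nested_sitemaps:
                continue  # nested-sitemap cap reached; skip the rest
            nested_fetched += 1
        seen.add(sitemap_url)
        raw = fetch(sitemap_url)
        if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
            raw = gzip.decompress(raw)
        root = ET.fromstring(raw)
        if root.tag == f"{{{SM}}}sitemapindex":
            if follow_indexes:
                # Queueing children also handles indexes nested in indexes.
                for loc in root.findall("sm:sitemap/sm:loc", NS):
                    if loc.text and loc.text.strip():
                        queue.append((loc.text.strip(), root_url))
            continue
        for loc in root.findall("sm:url/sm:loc", NS):
            if len(results) >= max_urls:
                break  # total-URL cap reached
            if loc.text and loc.text.strip():
                results.append({"url": loc.text.strip(),
                                "sourceSitemap": sitemap_url,
                                "sourceRoot": root_url})
    return results
```

A queue (rather than recursion) keeps the nesting depth from growing the call stack, and the `seen` set stops an index that lists itself from looping forever.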
Output
{"url": "https://example.com/page1","lastModified": "2025-05-10","changeFrequency": "weekly","priority": "0.8","images": ["https://example.com/img1.jpg"],"sourceSitemap": "https://example.com/sitemap-products.xml","sourceRoot": "https://example.com/sitemap_index.xml"}
Export to JSON, CSV, or Excel, or pull via the Apify API.
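For pulling results programmatically, a minimal sketch using Python's standard library against the Apify API v2 "get dataset items" endpoint might look like the following. The dataset ID and token are placeholders, the helper names are hypothetical, and the endpoint's full parameter list should be checked in the Apify API reference.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

APIFY_API = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      token: Optional[str] = None) -> str:
    """URL of the dataset-items endpoint in the requested export format."""
    params = {"format": fmt}  # e.g. "json", "csv", "xlsx"
    if token:
        params["token"] = token  # keep URLs containing a token out of logs
    return f"{APIFY_API}/datasets/{quote(dataset_id, safe='')}/items?{urlencode(params)}"


def fetch_items(dataset_id: str, token: str) -> list:
    """Download a run's dataset records as a parsed JSON array."""
    with urlopen(dataset_items_url(dataset_id, "json", token)) as resp:
        return json.load(resp)
```

Usage: `fetch_items("YOUR_DATASET_ID", "YOUR_API_TOKEN")`, where both arguments are placeholders for values from your own run.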
Notes
- Supports the standard sitemaps.org protocol, sitemap index files, and the Google image-sitemap extension.
- Always respects each site's `robots.txt` access policy; please use responsibly.
- Independent tool; sitemaps remain the property of their publishers.
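The listing doesn't say how the `robots.txt` check is implemented. If you build your own downstream crawler from the exported URLs, Python's `urllib.robotparser` can apply the same kind of check. This is a minimal offline sketch, assuming the `robots.txt` text was already downloaded; `allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines and marks the rules as loaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Parsing pre-fetched text keeps the check testable and lets one download of `robots.txt` serve many URL checks.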