Sitemap to URL List Extractor

Pricing

from $1.00 / 1,000 results

Extract every URL from any website's sitemap as clean JSON. Handles sitemap indexes (recursive) and gzipped sitemaps automatically. Includes lastmod, priority, and changefreq.

Developer

Nicolas van Arkens

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 3 days ago

Sitemap to URL List Extractor 🗺️

Extract every URL from any website's sitemap as clean structured JSON, with last-modified dates, priority, and change frequency. Handles sitemap indexes (recursing into nested sitemaps) and gzipped sitemaps (.xml.gz) automatically: the parts most lazy sitemap scrapers get wrong.

Perfect for SEO audits, content inventories, building crawl lists, monitoring site changes, and feeding URL lists into other tools.

Why use it

  • 🔁 Recursive sitemap indexes: point it at sitemap_index.xml and it'll follow every nested sitemap automatically
  • 📦 Gzipped sitemaps handled: many large sites ship .xml.gz, decompressed transparently
  • 🗓️ Full metadata: lastmod, priority, changefreq, plus image URLs from the Google image sitemap extension
  • 🛡️ Guardrails: configurable caps on total URLs and nested sitemaps so a giant site can't run away
  • 🌐 Works on any website: news sites, e-commerce stores, blogs, documentation, public web apps

Use cases

  • SEO audits: list every URL a site publishes in its sitemap, sort by lastmod, find stale content
  • Crawl seed lists: generate URL lists for downstream scrapers or archival tools
  • Content inventories: see what a competitor or partner site is actually publishing
  • Change monitoring: schedule it and detect newly added pages
  • Site migrations: get the full URL set before redirect mapping
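
For the change-monitoring case, one possible approach is to compare the records from two scheduled runs. The sketch below assumes each run has already been loaded as a list of dicts carrying the `url` and `lastModified` fields shown in the Output section; `diff_runs` is a hypothetical helper name and not part of the Actor.

```python
def diff_runs(previous, current):
    """Compare two lists of result records, keyed by their "url" field."""
    prev = {r["url"]: r for r in previous}
    curr = {r["url"]: r for r in current}
    added = sorted(curr.keys() - prev.keys())
    removed = sorted(prev.keys() - curr.keys())
    # A page counts as updated when its lastModified value differs between runs.
    updated = sorted(
        url for url in curr.keys() & prev.keys()
        if curr[url].get("lastModified") != prev[url].get("lastModified")
    )
    return {"added": added, "removed": removed, "updated": updated}


if __name__ == "__main__":
    last_week = [
        {"url": "https://example.com/page1", "lastModified": "2025-05-01"},
        {"url": "https://example.com/old", "lastModified": "2024-01-01"},
    ]
    today = [
        {"url": "https://example.com/page1", "lastModified": "2025-05-10"},
        {"url": "https://example.com/new", "lastModified": "2025-05-10"},
    ]
    print(diff_runs(last_week, today))
```

The same `removed` list can serve as a starting checklist for redirect mapping during a site migration.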

Input

| Field | Description |
|---|---|
| Sitemap URLs | One or more sitemap URLs (sitemap.xml, sitemap_index.xml, or .xml.gz). |
| Follow indexes | If a URL points to an index, recurse and process each nested sitemap. |
| Maximum URLs | Total URL cap across all sitemaps. |
| Maximum nested sitemaps | Cap on number of sitemap files fetched when following indexes. |

Output

{
  "url": "https://example.com/page1",
  "lastModified": "2025-05-10",
  "changeFrequency": "weekly",
  "priority": "0.8",
  "images": ["https://example.com/img1.jpg"],
  "sourceSitemap": "https://example.com/sitemap-products.xml",
  "sourceRoot": "https://example.com/sitemap_index.xml"
}

Export to JSON, CSV, or Excel, or pull via the Apify API.
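
Once exported, records of this shape are easy to post-process. As a hedged example, the sketch below flags pages whose `lastModified` date is older than a threshold, which is one way to find stale content. The helper names `parse_lastmod` and `stale_pages` and the 365-day default are assumptions made for illustration; the code also assumes `lastModified` is either missing or begins with a full `YYYY-MM-DD` date, and it skips anything else.

```python
from datetime import date


def parse_lastmod(value):
    """Parse the leading YYYY-MM-DD part of a lastModified string; None if unusable."""
    if not value:
        return None
    try:
        return date.fromisoformat(value[:10])
    except ValueError:
        return None


def stale_pages(records, today, max_age_days=365):
    """Return (date, url) pairs older than max_age_days, oldest first."""
    dated = [(parse_lastmod(r.get("lastModified")), r["url"]) for r in records]
    stale = [(d, u) for d, u in dated if d is not None and (today - d).days > max_age_days]
    return sorted(stale)


if __name__ == "__main__":
    sample = [
        {"url": "https://example.com/page1", "lastModified": "2025-05-10"},
        {"url": "https://example.com/archive", "lastModified": "2023-01-01T08:00:00+00:00"},
        {"url": "https://example.com/undated", "lastModified": None},
    ]
    for modified, url in stale_pages(sample, today=date(2025, 6, 1)):
        print(modified.isoformat(), url)
```

Passing `today` explicitly keeps the result reproducible; a scheduled job could pass `date.today()` instead.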

Notes

  • Supports the standard sitemaps.org protocol, sitemap index files, and the Google image-sitemap extension.
  • Always respects each site's robots.txt policy on access. Please use responsibly.
  • Independent tool; sitemaps remain the property of their publishers.