XML Sitemap URL Extractor

Recursively crawl and extract every single URL from a website’s sitemap.xml. Automate your SEO audits and scraping queues.

Developer: Andok (maintained by Community)

Extract every URL from XML sitemaps and sitemap indexes in bulk for site audits, migrations, and SEO analysis. Missing or outdated sitemap entries can silently cost you organic traffic; this actor gives you a complete URL inventory in seconds. It follows sitemap indexes recursively, processes thousands of URLs per run, and outputs structured data ready for spreadsheets or downstream automation.

Features

  • Recursive index traversal — automatically follows sitemap index files up to a configurable depth
  • Bulk processing — extract URLs from multiple sitemaps in a single run
  • Rich metadata — captures lastmod, changefreq, and priority for every URL
  • Source tracking — each URL records which sitemap file it came from
  • Configurable limits — set max URLs and max depth to control run scope and cost
  • Error handling — reports fetch failures per sitemap without stopping the entire run
  • Pay-per-result — only charged for URLs actually extracted

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| urls | array | Yes | | List of sitemap or sitemap index URLs to extract (e.g. https://example.com/sitemap.xml) |
| maxUrls | integer | No | 1000 | Maximum number of individual URLs to extract per top-level sitemap |
| maxDepth | integer | No | 2 | How many levels deep to follow sitemap index links (1 = no recursion) |
| timeoutSeconds | integer | No | 15 | HTTP timeout in seconds for each XML file download |
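
To make the maxDepth and maxUrls semantics concrete, here is a minimal Python sketch of depth-limited sitemap index traversal. It is not the actor's actual implementation: the function name `extract_urls`, the injected `fetch` callable, and the reading of "1 = no recursion" as "read only the top-level file" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],  # hypothetical: returns the XML text for a URL
    max_depth: int = 2,
    max_urls: int = 1000,
    _depth: int = 1,
    _out: Optional[list] = None,
) -> list:
    """Collect URL entries, following sitemap indexes up to max_depth levels."""
    out = [] if _out is None else _out
    root = ET.fromstring(fetch(sitemap_url))

    if root.tag.endswith("sitemapindex"):
        # Assumed meaning of maxDepth: depth 1 is the top-level file only,
        # so child sitemaps are followed only while _depth < max_depth.
        if _depth >= max_depth:
            return out
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            if len(out) >= max_urls:
                break
            extract_urls(loc.text.strip(), fetch, max_depth, max_urls,
                         _depth + 1, out)
    else:
        for url in root.findall("sm:url", NS):
            if len(out) >= max_urls:
                break
            out.append({
                "sourceSitemapUrl": sitemap_url,
                "loc": url.findtext("sm:loc", default="", namespaces=NS).strip(),
                "lastmod": url.findtext("sm:lastmod", namespaces=NS),
                "changefreq": url.findtext("sm:changefreq", namespaces=NS),
                "priority": url.findtext("sm:priority", namespaces=NS),
            })
    return out
```

Passing `fetch` in as a parameter keeps the sketch independent of any HTTP library; in practice it would wrap a download call that honors timeoutSeconds.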

Input Example

{
    "urls": ["https://apify.com/sitemap.xml"],
    "maxUrls": 5000,
    "maxDepth": 3,
    "timeoutSeconds": 20
}
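
As a rough illustration of how this input could be submitted programmatically, the sketch below uses the `apify-client` Python package. The actor ID placeholder, the `APIFY_TOKEN` environment variable name, and the helper names `run_sitemap_extractor` and `main` are assumptions; this page does not state the actor's ID, so substitute the one shown in the Apify Console.

```python
from typing import Any

# Same values as the input example above.
RUN_INPUT = {
    "urls": ["https://apify.com/sitemap.xml"],
    "maxUrls": 5000,
    "maxDepth": 3,
    "timeoutSeconds": 20,
}


def run_sitemap_extractor(client: Any, actor_id: str, run_input: dict) -> list:
    """Start the actor, wait for the run to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Imported here so the helper above stays usable without the package.
    import os
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed variable name
    # Placeholder ID: replace with the real "<username>/<actor-name>".
    items = run_sitemap_extractor(client, "<username>/<actor-name>", RUN_INPUT)
    print(f"Extracted {len(items)} URLs")
```

Call `main()` once the token is set; the helper takes the client as an argument so it can be exercised with a stub during testing.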

Output

Each extracted URL produces one dataset item containing the page URL, its source sitemap, and any available SEO metadata.

  • sourceSitemapUrl (string) — the sitemap file this URL was found in
  • loc (string) — the extracted page URL
  • lastmod (string | null) — last modification date from the sitemap
  • changefreq (string | null) — change frequency hint (daily, weekly, etc.)
  • priority (string | null) — priority value (0.0 to 1.0)

Output Example

{
    "sourceSitemapUrl": "https://apify.com/sitemap.xml",
    "loc": "https://apify.com/store",
    "lastmod": "2025-11-20T08:30:00Z",
    "changefreq": "weekly",
    "priority": "0.8"
}
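
Because lastmod, changefreq, and priority may be null and priority arrives as a string, downstream code usually needs a small normalization step. The following sketch shows one way to parse lastmod and flatten items to CSV for a spreadsheet; the helper names `parse_lastmod` and `to_csv` are illustrative and not part of the actor.

```python
import csv
import io
from datetime import datetime
from typing import Iterable, Optional

FIELDS = ["sourceSitemapUrl", "loc", "lastmod", "changefreq", "priority"]


def parse_lastmod(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 lastmod value; return None if missing or malformed."""
    if not value:
        return None
    try:
        # Older Python versions reject a trailing "Z", so normalize it first.
        return datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None


def to_csv(items: Iterable[dict]) -> str:
    """Render dataset items as CSV text, writing null fields as empty cells."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow({key: item.get(key) or "" for key in FIELDS})
    return buf.getvalue()
```

Sitemaps in the wild do not always use well-formed dates, which is why `parse_lastmod` returns None instead of raising.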

Pricing

| Event | Cost |
| --- | --- |
| URL Extracted | $0.001 per URL |

You are charged per URL extracted to the dataset. Platform usage fees apply separately.
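
As a quick worked example of the arithmetic, the sketch below estimates an upper bound on extraction charges for a run. It assumes the flat $0.001 per-URL rate listed above and that maxUrls caps each top-level sitemap independently; platform usage fees are not included, and the function name is illustrative.

```python
from decimal import Decimal

PRICE_PER_URL_USD = Decimal("0.001")  # rate from the pricing table above


def max_extraction_cost(num_top_level_sitemaps: int, max_urls: int = 1000) -> Decimal:
    """Upper bound in USD: every top-level sitemap hits its maxUrls cap."""
    return num_top_level_sitemaps * max_urls * PRICE_PER_URL_USD


# One sitemap with maxUrls = 5000 costs at most 5000 * $0.001 = $5.00.
print(max_extraction_cost(1, 5000))
```

The actual charge is lower whenever a sitemap contains fewer URLs than the cap.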

Use Cases

  • Site migration audits — dump every URL from your current sitemap before migrating to verify nothing gets lost
  • SEO coverage checks — compare sitemap URLs against your CMS to find pages missing from the sitemap
  • Competitor analysis — extract a competitor's full URL structure from their public sitemaps
  • Content inventory — build a master spreadsheet of all pages with last-modified dates for content planning
  • Automated monitoring — schedule runs to track sitemap growth and detect removed pages over time
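
The coverage-check and monitoring use cases above both reduce to set comparisons on the loc field. A minimal sketch, assuming you already have the dataset items from two runs (or a list of CMS URLs) in memory; the function names are hypothetical:

```python
from typing import Iterable


def diff_runs(previous: Iterable[dict], current: Iterable[dict]) -> dict:
    """Compare two runs' dataset items and report added and removed page URLs."""
    prev_urls = {item["loc"] for item in previous}
    curr_urls = {item["loc"] for item in current}
    return {
        "added": sorted(curr_urls - prev_urls),
        "removed": sorted(prev_urls - curr_urls),
    }


def missing_from_sitemap(cms_urls: Iterable[str], items: Iterable[dict]) -> list:
    """Return CMS pages that do not appear in any extracted sitemap entry."""
    sitemap_urls = {item["loc"] for item in items}
    return sorted(set(cms_urls) - sitemap_urls)
```

Exact string comparison treats `https://example.com/page` and `https://example.com/page/` as different URLs, so normalize trailing slashes and case first if your CMS and sitemap disagree on them.
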

| Actor | What it adds |
| --- | --- |
| Robots.txt Auditor | Audit robots.txt crawl rules and discover sitemap URLs declared there |
| Broken Links Checker | Verify that the URLs in your sitemap actually return 200 OK |
| Hreflang Checker | Validate hreflang tags on the multilingual URLs found in your sitemap |