XML Sitemap URL Extractor
Pricing
from $1.00 / 1,000 URLs extracted
Recursively crawl and extract every single URL from a website’s sitemap.xml. Automate your SEO audits and scraping queues.
Developer: Andok
Extract every URL from XML sitemaps and sitemap indexes in bulk for site audits, migrations, and SEO analysis. Missing or outdated sitemap entries silently kill organic traffic — this actor gives you a complete URL inventory in seconds. Handles recursive sitemap indexes, processes thousands of URLs per run, and outputs structured data ready for spreadsheets or downstream automation.
Features
- Recursive index traversal — automatically follows sitemap index files up to a configurable depth
- Bulk processing — extract URLs from multiple sitemaps in a single run
- Rich metadata — captures `lastmod`, `changefreq`, and `priority` for every URL
- Source tracking — each URL records which sitemap file it came from
- Configurable limits — set max URLs and max depth to control run scope and cost
- Error handling — reports fetch failures per sitemap without stopping the entire run
- Pay-per-result — only charged for URLs actually extracted
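The recursive traversal, limits, and per-sitemap error reporting listed above can be pictured with a minimal Python sketch. This is not the actor's source code: `extract_urls`, `fetch_http`, and the error record shape are hypothetical names chosen for illustration, and the reading of `maxDepth` (the top-level file counts as depth 1, so 1 means no recursion) is an assumption based on the input table. Compressed (`.gz`) sitemaps are not handled here.

```python
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional, Tuple, Union

# Namespace of the sitemaps.org protocol, in ElementTree's {uri}tag form.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def fetch_http(url: str, timeout: float = 15) -> bytes:
    """Download one XML file (hypothetical helper, no gzip support)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def extract_urls(
    sitemap_url: str,
    fetch: Callable[[str], Union[str, bytes]] = fetch_http,
    max_depth: int = 2,
    max_urls: int = 1000,
) -> Tuple[List[Dict[str, Optional[str]]], List[Dict[str, str]]]:
    """Return (items, errors) for one top-level sitemap or sitemap index."""
    items: List[Dict[str, Optional[str]]] = []
    errors: List[Dict[str, str]] = []
    seen = set()

    def visit(url: str, depth: int) -> None:
        if url in seen or len(items) >= max_urls:
            return
        seen.add(url)
        try:
            root = ET.fromstring(fetch(url))
        except Exception as exc:  # record the failure and keep going
            errors.append({"sitemapUrl": url, "error": repr(exc)})
            return
        if root.tag == NS + "sitemapindex":
            if depth >= max_depth:  # assumed meaning: 1 = no recursion
                return
            for loc in root.iterfind(f"{NS}sitemap/{NS}loc"):
                if loc.text and loc.text.strip():
                    visit(loc.text.strip(), depth + 1)
        elif root.tag == NS + "urlset":
            for entry in root.iterfind(NS + "url"):
                if len(items) >= max_urls:
                    break
                loc = (entry.findtext(NS + "loc") or "").strip()
                if not loc:
                    continue
                items.append({
                    "sourceSitemapUrl": url,
                    "loc": loc,
                    "lastmod": entry.findtext(NS + "lastmod"),
                    "changefreq": entry.findtext(NS + "changefreq"),
                    "priority": entry.findtext(NS + "priority"),
                })

    visit(sitemap_url, 1)
    return items, errors
```

Passing the fetch function as a parameter keeps the traversal logic testable without network access; calling `extract_urls("https://example.com/sitemap.xml")` with the default `fetch_http` would download the files for real.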
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `urls` | array | Yes | — | List of sitemap or sitemap index URLs to extract (e.g. `https://example.com/sitemap.xml`) |
| `maxUrls` | integer | No | 1000 | Maximum number of individual URLs to extract per top-level sitemap |
| `maxDepth` | integer | No | 2 | How many levels deep to follow sitemap index links (1 = no recursion) |
| `timeoutSeconds` | integer | No | 15 | HTTP timeout in seconds for each XML file download |
Input Example
    {
        "urls": ["https://apify.com/sitemap.xml"],
        "maxUrls": 5000,
        "maxDepth": 3,
        "timeoutSeconds": 20
    }
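If you generate the input programmatically, a small helper can fill in the documented defaults and catch obvious mistakes before a run starts. The sketch below is illustrative only: `build_input` is a hypothetical name, and the validation rules (absolute http(s) URLs, integers of at least 1) are assumptions, not constraints published by the actor.

```python
import json
from urllib.parse import urlparse


def build_input(urls, max_urls=1000, max_depth=2, timeout_seconds=15):
    """Assemble an input object using the defaults from the input table."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    for url in urls:
        parsed = urlparse(url)
        # Assumed rule: only absolute http(s) URLs make sense here.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
    limits = {
        "maxUrls": max_urls,
        "maxDepth": max_depth,
        "timeoutSeconds": timeout_seconds,
    }
    for name, value in limits.items():
        # Assumed rule: limits are integers of at least 1 (bool is excluded).
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be an integer >= 1, got {value!r}")
    return {"urls": list(urls), **limits}


if __name__ == "__main__":
    run_input = build_input(
        ["https://apify.com/sitemap.xml"],
        max_urls=5000,
        max_depth=3,
        timeout_seconds=20,
    )
    print(json.dumps(run_input, indent=2))
```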
Output
Each extracted URL produces one dataset item containing the page URL, its source sitemap, and any available SEO metadata.
- `sourceSitemapUrl` (string) — the sitemap file this URL was found in
- `loc` (string) — the extracted page URL
- `lastmod` (string | null) — last modification date from the sitemap
- `changefreq` (string | null) — change frequency hint (daily, weekly, etc.)
- `priority` (string | null) — priority value (0.0 to 1.0)
Output Example
    {
        "sourceSitemapUrl": "https://apify.com/sitemap.xml",
        "loc": "https://apify.com/store",
        "lastmod": "2025-11-20T08:30:00Z",
        "changefreq": "weekly",
        "priority": "0.8"
    }
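Because every item has the same five fields, turning a run's results into a spreadsheet-friendly file is straightforward. The following sketch assumes you already have the dataset items as a list of Python dictionaries shaped like the example above; `items_to_csv` is a hypothetical helper, and `null` values are written as empty cells.

```python
import csv
import io

# Column order follows the output field list above.
FIELDS = ["sourceSitemapUrl", "loc", "lastmod", "changefreq", "priority"]


def items_to_csv(items):
    """Serialize dataset items to CSV text; missing or null fields become empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        # Pick only the known fields so unexpected keys cannot break the export.
        writer.writerow({field: item.get(field) for field in FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [{
        "sourceSitemapUrl": "https://apify.com/sitemap.xml",
        "loc": "https://apify.com/store",
        "lastmod": "2025-11-20T08:30:00Z",
        "changefreq": "weekly",
        "priority": "0.8",
    }]
    print(items_to_csv(sample))
```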
Pricing
| Event | Cost |
|---|---|
| URL Extracted | $0.001 per URL |
You are charged per URL extracted to the dataset. Platform usage fees apply separately.
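As a worked example of the per-URL price, the sketch below estimates the extraction charge for a run and an upper bound derived from `maxUrls`, which applies per top-level sitemap. The function names are hypothetical, and platform usage fees are not included in either figure.

```python
from decimal import Decimal

PRICE_PER_URL = Decimal("0.001")  # USD per extracted URL, from the pricing table


def extraction_cost(url_count):
    """Charge in USD for a given number of extracted URLs."""
    return PRICE_PER_URL * url_count


def max_extraction_cost(sitemap_count, max_urls=1000):
    """Upper bound in USD: every top-level sitemap reaches its maxUrls limit."""
    return extraction_cost(sitemap_count * max_urls)


if __name__ == "__main__":
    print(extraction_cost(5000))         # 5.000
    print(max_extraction_cost(3, 5000))  # 15.000
```

`Decimal` is used instead of floating point so that small per-URL amounts add up exactly.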
Use Cases
- Site migration audits — dump every URL from your current sitemap before migrating to verify nothing gets lost
- SEO coverage checks — compare sitemap URLs against your CMS to find pages missing from the sitemap
- Competitor analysis — extract a competitor's full URL structure from their public sitemaps
- Content inventory — build a master spreadsheet of all pages with last-modified dates for content planning
- Automated monitoring — schedule runs to track sitemap growth and detect removed pages over time
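For the monitoring and coverage-check use cases, comparing two URL inventories comes down to set differences on the `loc` field. This is a minimal sketch assuming both inputs are lists of dataset items as described in the Output section; `diff_runs` is a hypothetical helper name.

```python
def diff_runs(previous_items, current_items):
    """Compare two runs by page URL and report what appeared or disappeared."""
    previous = {item["loc"] for item in previous_items}
    current = {item["loc"] for item in current_items}
    return {
        "added": sorted(current - previous),    # in the new run only
        "removed": sorted(previous - current),  # in the old run only
    }


if __name__ == "__main__":
    last_week = [{"loc": "https://example.com/"}, {"loc": "https://example.com/old"}]
    today = [{"loc": "https://example.com/"}, {"loc": "https://example.com/new"}]
    print(diff_runs(last_week, today))
```

The same function works for an SEO coverage check if one of the two lists is built from a CMS export instead of a previous run.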
Related Actors
| Actor | What it adds |
|---|---|
| Robots.txt Auditor | Audit robots.txt crawl rules and discover sitemap URLs declared there |
| Broken Links Checker | Verify that the URLs in your sitemap actually return 200 OK |
| Hreflang Checker | Validate hreflang tags on the multilingual URLs found in your sitemap |