Sitemap URL Extractor
Pricing
from $1.50 / 1,000 results
Developer
mikolabs
Sitemap URL Extractor — Bulk XML Sitemap Parser for SEO & Content Audits
Extract every URL and its metadata from any sitemap.xml in seconds. Paste one or more sitemap URLs, run the Actor, and get a clean, structured dataset with url, lastmod, changefreq, priority, and more — ready to export as CSV, JSON, or Excel.
💡 Among the most competitively priced sitemap extractors on Apify. Free plan users get 20 results free. Paying users pay $1.50 per 1,000 results.
What This Actor Does
This Actor accepts one or more sitemap.xml URLs and:
- Crawls and parses all URLs from standard sitemaps (urlset)
- Automatically follows nested sitemap index files (sitemapindex) up to a configurable depth
- Extracts all standard sitemap fields: url, lastmod, changefreq, priority
- Supports image sitemap and Google News sitemap extensions
- Optionally filters results using a custom regex pattern
- Returns structured, export-ready data
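To make the extraction step concrete, here is a minimal sketch of how the standard fields can be read from a urlset document using only the Python standard library. It is an illustration, not the Actor's actual implementation: the function name `parse_urlset` and the sample XML are assumptions, and only the sitemaps.org namespace and element names come from the sitemap protocol itself.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample sitemap used only for this illustration.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-04-17</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/post</loc>
  </url>
</urlset>
"""


def _child_text(url_el, tag):
    """Return the stripped text of a child element, or None if it is absent."""
    el = url_el.find(f"sm:{tag}", SITEMAP_NS)
    return el.text.strip() if el is not None and el.text else None


def parse_urlset(xml_text):
    """Return one record per <url> entry; optional fields default to None."""
    # Encode to bytes so the XML declaration's encoding is honored by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    records = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        priority = _child_text(url_el, "priority")
        records.append({
            "url": _child_text(url_el, "loc"),
            "lastmod": _child_text(url_el, "lastmod"),
            "changefreq": _child_text(url_el, "changefreq"),
            "priority": float(priority) if priority is not None else None,
        })
    return records


records = parse_urlset(SAMPLE_XML)
for record in records:
    print(record)
```

Only `loc` is required by the sitemap protocol, so the sketch treats every other field as optional.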
Use Cases
🔍 SEO Analysis
Extract every page URL from a website's sitemap to audit indexation, spot orphaned pages, or validate that all key content is discoverable by search engines.
📋 Content Inventory
Build a complete list of all pages on a website before a migration, redesign, or CMS switch. Know exactly what exists before you move it.
🔗 Broken Link Checking
Pull all sitemap URLs and feed them into a link checker to find 404s, redirects, or server errors across your entire site.
🏆 Competitive Analysis
Discover how a competitor structures their website by parsing their public sitemap. Understand which pages they prioritize and how frequently they publish.
📰 Content Monitoring
Track lastmod dates across your sitemap over time to monitor publishing frequency and detect stale content.
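As a rough illustration of the stale-content check, the sketch below flags records whose lastmod is older than a chosen threshold. The helper names, the 365-day default, and the sample records are assumptions made for this example; the record shape follows the Actor's documented output fields.

```python
from datetime import date


def parse_lastmod(value):
    """Parse the YYYY-MM-DD prefix of a lastmod value; return None if unusable."""
    # Sitemaps may also carry coarser forms such as "2026" or "2026-04";
    # this sketch simply skips those rather than guessing a day.
    if not value or len(value) < 10:
        return None
    try:
        return date.fromisoformat(value[:10])
    except ValueError:
        return None


def find_stale(records, today, max_age_days=365):
    """Return URLs whose lastmod is more than max_age_days before today."""
    stale = []
    for rec in records:
        modified = parse_lastmod(rec.get("lastmod"))
        if modified is not None and (today - modified).days > max_age_days:
            stale.append(rec["url"])
    return stale


# Hypothetical dataset rows for illustration.
sample_records = [
    {"url": "https://example.com/", "lastmod": "2026-04-17"},
    {"url": "https://example.com/old", "lastmod": "2024-01-05T08:00:00+00:00"},
    {"url": "https://example.com/unknown", "lastmod": None},
]

stale_urls = find_stale(sample_records, today=date(2026, 4, 18))
print(stale_urls)  # ['https://example.com/old']
```

Records without a usable lastmod are ignored here; a real audit might want to report them separately.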
How to Use
Step 1 — Open the Actor
Go to the Input tab in the Apify Console.
Step 2 — Configure Your Inputs
| Field | Description | Default |
|---|---|---|
| Sitemap URLs | One or more sitemap.xml URLs to extract from | Required |
| Max Depth | How deep to follow nested sitemapindex files | 3 |
| Request Timeout | Seconds to wait per request | 30 |
| Filter URL Pattern | Optional regex to keep only matching URLs | (none) |
| Proxy Configuration | Optional proxy for rate-limited sites | (none) |
Example sitemap URL:
https://mikolabs.xyz/sitemap.xml
You can add multiple URLs — the Actor processes them all in one run.
Step 3 — Run the Actor
Click Start and the Actor will crawl your sitemap(s), follow any nested indexes, and collect all URL records into the dataset.
Step 4 — Export Your Results
Once the run finishes, go to the Storage → Dataset tab and export your data in:
- CSV — open directly in Excel or Google Sheets
- JSON — use in APIs or pipelines
- XLSX — ready-made spreadsheet
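Once exported, the JSON can feed a small pipeline. The following sketch counts URLs by their first path segment to give a quick content inventory. The `EXPORTED_JSON` string stands in for a downloaded export file, and `count_by_section` is a hypothetical helper, not part of the Actor.

```python
import json
from collections import Counter
from urllib.parse import urlsplit

# Stand-in for the contents of a JSON export (hypothetical data).
EXPORTED_JSON = """[
  {"url": "https://example.com/", "lastmod": "2026-04-17"},
  {"url": "https://example.com/blog/a", "lastmod": "2026-04-10"},
  {"url": "https://example.com/blog/b", "lastmod": "2026-03-02"},
  {"url": "https://example.com/pricing", "lastmod": "2026-04-17"}
]"""


def count_by_section(records):
    """Count records by the first path segment of each URL."""
    counts = Counter()
    for rec in records:
        segments = [s for s in urlsplit(rec["url"]).path.split("/") if s]
        counts[segments[0] if segments else "(root)"] += 1
    return counts


section_counts = count_by_section(json.loads(EXPORTED_JSON))
for section, total in section_counts.most_common():
    print(f"{section}: {total}")
```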
Output Example
Each row in the output dataset represents one URL found in the sitemap:
    [
      {
        "url": "https://mikolabs.xyz/",
        "lastmod": "2026-04-17",
        "changefreq": "monthly",
        "priority": 1.0,
        "sitemapUrl": "https://mikolabs.xyz/sitemap.xml",
        "sitemapType": "urlset",
        "scrapedAt": "2026-04-18T10:29:37.665452+00:00"
      },
      {
        "url": "https://mikolabs.xyz/apis",
        "lastmod": "2026-04-17",
        "changefreq": "weekly",
        "priority": 0.8,
        "sitemapUrl": "https://mikolabs.xyz/sitemap.xml",
        "sitemapType": "urlset",
        "scrapedAt": "2026-04-18T10:29:37.665452+00:00"
      },
      {
        "url": "https://mikolabs.xyz/pricing",
        "lastmod": "2026-04-17",
        "changefreq": "monthly",
        "priority": 0.7,
        "sitemapUrl": "https://mikolabs.xyz/sitemap.xml",
        "sitemapType": "urlset",
        "scrapedAt": "2026-04-18T10:29:37.665452+00:00"
      }
    ]
Output Fields
| Field | Type | Description |
|---|---|---|
| url | string | The page URL from the sitemap |
| lastmod | string | Last modified date (ISO 8601) |
| changefreq | string | How often the page changes (daily, weekly, monthly…) |
| priority | number | Page priority relative to the rest of the site (0.0–1.0) |
| sitemapUrl | string | The source sitemap this URL was found in |
| sitemapType | string | urlset or sitemapindex |
| images | array | Image entries from image sitemap extensions (if present) |
| news | object | Google News metadata (if present) |
| scrapedAt | string | Timestamp of when the record was collected |
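For readers unfamiliar with the image sitemap extension, the sketch below shows how image entries attached to a URL can be read with the standard library. The namespace is the one Google publishes for image sitemaps. The sample XML, the function name, and the `{"loc": ...}` shape of each image entry are assumptions; the exact structure the Actor emits in `images` is not specified here.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}

# Hypothetical sitemap with two images attached to one page.
SAMPLE_IMAGE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/gallery</loc>
    <image:image><image:loc>https://example.com/a.jpg</image:loc></image:image>
    <image:image><image:loc>https://example.com/b.jpg</image:loc></image:image>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""


def parse_url_images(xml_text):
    """Return each page URL with the list of image locations declared for it."""
    root = ET.fromstring(xml_text)
    records = []
    for url_el in root.findall("sm:url", NS):
        page_url = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        images = [
            {"loc": img_loc.text.strip()}  # assumed entry shape, for illustration
            for img_loc in url_el.findall("image:image/image:loc", NS)
            if img_loc.text
        ]
        records.append({"url": page_url, "images": images})
    return records


image_records = parse_url_images(SAMPLE_IMAGE_SITEMAP)
for rec in image_records:
    print(rec["url"], len(rec["images"]))
```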
Input Reference
    {
      "sitemapUrls": ["https://mikolabs.xyz/sitemap.xml"],
      "maxDepth": 3,
      "requestTimeoutSecs": 30,
      "filterUrlPattern": ""
    }
sitemapUrls (required)
An array of one or more sitemap.xml URLs. Accepts both standard urlset sitemaps and sitemapindex files that point to other sitemaps.
maxDepth (optional, default: 3)
Controls how many levels of nested sitemap index files the Actor will follow. Set to 1 to only parse the provided sitemaps without following any child links.
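One way to picture the depth limit is a recursive walk that counts the provided sitemap as level 1 and each nested index as one level deeper. The sketch below is an interpretation of the description above, not the Actor's source: `collect_urls`, the `fetch` callback, and the in-memory `FAKE_SITEMAPS` (used instead of real HTTP requests) are all assumptions.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# In-memory stand-ins for fetched documents (hypothetical URLs and content).
FAKE_SITEMAPS = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/post-1</loc></url>"
        "<url><loc>https://example.com/post-2</loc></url>"
        "</urlset>"
    ),
}


def collect_urls(sitemap_url, fetch, max_depth, depth=1):
    """Collect page URLs, following sitemap indexes until max_depth is reached."""
    if depth > max_depth:
        return []  # deeper than allowed: do not fetch or parse this sitemap
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag.endswith("sitemapindex"):
        urls = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(collect_urls(loc.text.strip(), fetch, max_depth, depth + 1))
        return urls
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


fetch = FAKE_SITEMAPS.__getitem__
shallow = collect_urls("https://example.com/sitemap.xml", fetch, max_depth=1)
deep = collect_urls("https://example.com/sitemap.xml", fetch, max_depth=2)
print(shallow)  # []
print(deep)     # ['https://example.com/post-1', 'https://example.com/post-2']
```

Under this reading, a depth of 1 applied to an index file yields no page URLs, because the child sitemaps it lists are never opened.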
requestTimeoutSecs (optional, default: 30)
Maximum time in seconds to wait for each sitemap response. Increase this for slow servers.
filterUrlPattern (optional)
A regular expression to filter which URLs are saved to the dataset. For example, https://example\.com/blog/.* will only save blog URLs. Leave empty to collect all URLs.
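To preview what a pattern would keep before starting a run, you can test it locally. This sketch assumes the pattern may match anywhere in the URL (`re.search` semantics), which is consistent with the /blog/.* example in the FAQ but is not confirmed behavior. `filter_urls` is a hypothetical helper.

```python
import re


def filter_urls(urls, pattern):
    """Keep URLs matching the pattern anywhere; an empty pattern keeps all."""
    if not pattern:
        return list(urls)
    regex = re.compile(pattern)
    return [url for url in urls if regex.search(url)]


# Hypothetical URLs for illustration.
sample_urls = [
    "https://example.com/",
    "https://example.com/blog/first-post",
    "https://example.com/blog/second-post",
    "https://example.com/pricing",
]

blog_only = filter_urls(sample_urls, r"https://example\.com/blog/.*")
print(blog_only)
```

Escaping the dot in `example\.com` keeps the pattern from matching any character in that position.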
proxyConfiguration (optional)
Enables Apify proxy rotation to avoid IP blocks on rate-limited websites. Not required for most public sitemaps.
Pricing
| Plan | Price |
|---|---|
| Free | 20 results free |
| Pay-as-you-go / Subscription | $1.50 per 1,000 results |
This Actor is among the most competitively priced sitemap extractors on the Apify platform — ideal for one-off audits, scheduled monitoring, and large-scale extractions alike.
Frequently Asked Questions
Does it support sitemap index files?
Yes. If your sitemap URL points to a sitemapindex (a sitemap of sitemaps), the Actor will automatically follow all child sitemap links up to the configured maxDepth.
Can I extract from multiple sitemaps in one run?
Yes. Add as many sitemap URLs as you need in the sitemapUrls input field — all will be processed in a single run.
What if the sitemap URL redirects?
The Actor handles HTTP redirects automatically.
Can I filter results to only specific URL patterns?
Yes — use the filterUrlPattern field with a regular expression (e.g. /blog/.* to keep only blog pages).
Is the data exportable to Excel or Google Sheets?
Yes. After the run, export as CSV from the Dataset tab and open it directly in Excel or Google Sheets.
What happens if a sitemap is behind a bot check?
Enable the Proxy Configuration option to route requests through Apify's residential or datacenter proxies.