Sitemap URL Extractor

Pricing

from $1.50 / 1,000 results

Developer

mikolabs

Maintained by Community

Sitemap URL Extractor — Bulk XML Sitemap Parser for SEO & Content Audits

Extract every URL and its metadata from any sitemap.xml in seconds. Paste one or more sitemap URLs, run the Actor, and get a clean, structured dataset with url, lastmod, changefreq, priority, and more — ready to export as CSV, JSON, or Excel.

💡 Among the most affordable sitemap extractors on Apify. Free plan users get 20 results free; paid plans cost $1.50 per 1,000 results.


What This Actor Does

This Actor accepts one or more sitemap.xml URLs and:

  • Crawls and parses all URLs from standard sitemaps (urlset)
  • Automatically follows nested sitemap index files (sitemapindex) up to a configurable depth
  • Extracts all standard sitemap fields: url, lastmod, changefreq, priority
  • Supports image sitemap and Google News sitemap extensions
  • Optionally filters results using a custom regex pattern
  • Returns structured, export-ready data
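To make the urlset / sitemapindex distinction concrete, here is a minimal sketch of how such a document could be classified and parsed with Python's standard library. It is not the Actor's actual code; the function name `parse_sitemap` and the shape of its return value are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol, in ElementTree's "{uri}" form.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Hypothetical helper: classify a sitemap document and collect its entries."""
    root = ET.fromstring(xml_text)
    # The root tag is namespace-qualified, e.g. "{...}urlset" or "{...}sitemapindex".
    kind = root.tag.replace(SITEMAP_NS, "")
    entries = []
    if kind == "urlset":
        for url_el in root.findall(f"{SITEMAP_NS}url"):
            entry = {"url": url_el.findtext(f"{SITEMAP_NS}loc", default="").strip()}
            # Optional fields are copied only when the sitemap provides them.
            for field in ("lastmod", "changefreq", "priority"):
                value = url_el.findtext(f"{SITEMAP_NS}{field}")
                if value is not None:
                    entry[field] = value.strip()
            entries.append(entry)
    elif kind == "sitemapindex":
        # An index lists child sitemaps rather than pages.
        for sm_el in root.findall(f"{SITEMAP_NS}sitemap"):
            entries.append({"url": sm_el.findtext(f"{SITEMAP_NS}loc", default="").strip()})
    return {"type": kind, "entries": entries}


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://mikolabs.xyz/</loc>
    <lastmod>2026-04-17</lastmod>
    <changefreq>monthly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>"""

result = parse_sitemap(SAMPLE)
print(result["type"], result["entries"])
```

Values are kept as strings here; converting priority to a number, as the Actor's output does, would be a separate step.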

Use Cases

🔍 SEO Analysis

Extract every page URL from a website's sitemap to audit indexation, spot orphaned pages, or validate that all key content is discoverable by search engines.

📋 Content Inventory

Build a complete list of all pages on a website before a migration, redesign, or CMS switch. Know exactly what exists before you move it.

🔗 Broken Link Checking

Pull all sitemap URLs and feed them into a link checker to find 404s, redirects, or server errors across your entire site.

🏆 Competitive Analysis

Discover how a competitor structures their website by parsing their public sitemap. Understand which pages they prioritize and how frequently they publish.

📰 Content Monitoring

Track lastmod dates across your sitemap over time to monitor publishing frequency and detect stale content.
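As an illustration of the stale-content check, the sketch below flags URLs whose lastmod is older than a chosen age. The function name `stale_urls`, the 180-day threshold, and the sample records are assumptions for this example, not part of the Actor.

```python
from datetime import date


def stale_urls(records, today, max_age_days=180):
    """Hypothetical helper: return URLs whose lastmod is older than max_age_days."""
    stale = []
    for rec in records:
        lastmod = rec.get("lastmod")
        if not lastmod:
            continue  # No lastmod in the sitemap, so nothing to compare.
        try:
            # lastmod may be a date or a full timestamp; the first 10
            # characters are the YYYY-MM-DD part in both cases.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # Skip coarser values such as "2026" or "2026-04".
        if (today - modified).days > max_age_days:
            stale.append(rec["url"])
    return stale


records = [
    {"url": "https://example.com/fresh", "lastmod": "2026-04-17"},
    {"url": "https://example.com/old", "lastmod": "2025-01-01T08:00:00+00:00"},
    {"url": "https://example.com/unknown"},
]
stale = stale_urls(records, today=date(2026, 4, 18))
print(stale)
```

Running the same check on datasets from successive scheduled runs would show how the set of stale pages changes over time.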


How to Use

Step 1 — Open the Actor

Go to the Input tab in the Apify Console.

Step 2 — Configure Your Inputs

| Field | Description | Default |
| --- | --- | --- |
| Sitemap URLs | One or more sitemap.xml URLs to extract from | Required |
| Max Depth | How deep to follow nested sitemapindex files | 3 |
| Request Timeout | Seconds to wait per request | 30 |
| Filter URL Pattern | Optional regex to keep only matching URLs | (none) |
| Proxy Configuration | Optional proxy for rate-limited sites | (none) |

Example sitemap URL:

https://mikolabs.xyz/sitemap.xml

You can add multiple URLs — the Actor processes them all in one run.

Step 3 — Run the Actor

Click Start and the Actor will crawl your sitemap(s), follow any nested indexes, and collect all URL records into the dataset.

Step 4 — Export Your Results

Once the run finishes, go to the Storage → Dataset tab and export your data in:

  • CSV — open directly in Excel or Google Sheets
  • JSON — use in APIs or pipelines
  • XLSX — ready-made spreadsheet

Output Example

Each row in the output dataset represents one URL found in the sitemap:

[
  {
    "url": "https://mikolabs.xyz/",
    "lastmod": "2026-04-17",
    "changefreq": "monthly",
    "priority": 1.0,
    "sitemapUrl": "https://mikolabs.xyz/sitemap.xml",
    "sitemapType": "urlset",
    "scrapedAt": "2026-04-18T10:29:37.665452+00:00"
  },
  {
    "url": "https://mikolabs.xyz/apis",
    "lastmod": "2026-04-17",
    "changefreq": "weekly",
    "priority": 0.8,
    "sitemapUrl": "https://mikolabs.xyz/sitemap.xml",
    "sitemapType": "urlset",
    "scrapedAt": "2026-04-18T10:29:37.665452+00:00"
  },
  {
    "url": "https://mikolabs.xyz/pricing",
    "lastmod": "2026-04-17",
    "changefreq": "monthly",
    "priority": 0.7,
    "sitemapUrl": "https://mikolabs.xyz/sitemap.xml",
    "sitemapType": "urlset",
    "scrapedAt": "2026-04-18T10:29:37.665452+00:00"
  }
]

Output Fields

| Field | Type | Description |
| --- | --- | --- |
| url | string | The page URL from the sitemap |
| lastmod | string | Last modified date (ISO 8601) |
| changefreq | string | How often the page changes (daily, weekly, monthly…) |
| priority | number | Page priority relative to the rest of the site (0.0–1.0) |
| sitemapUrl | string | The source sitemap this URL was found in |
| sitemapType | string | urlset or sitemapindex |
| images | array | Image entries from image sitemap extensions (if present) |
| news | object | Google News metadata (if present) |
| scrapedAt | string | Timestamp of when the record was collected |
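Once exported as JSON, these fields are easy to work with in a script. The sketch below summarizes a trimmed copy of the output example by changefreq and picks out high-priority pages. The `summarize` helper and the 0.8 threshold are illustrative assumptions.

```python
import json
from collections import Counter

# Trimmed copy of the output example, as it might appear in a JSON export.
EXPORT_JSON = """
[
  {"url": "https://mikolabs.xyz/", "lastmod": "2026-04-17", "changefreq": "monthly", "priority": 1.0},
  {"url": "https://mikolabs.xyz/apis", "lastmod": "2026-04-17", "changefreq": "weekly", "priority": 0.8},
  {"url": "https://mikolabs.xyz/pricing", "lastmod": "2026-04-17", "changefreq": "monthly", "priority": 0.7}
]
"""


def summarize(records, min_priority=0.8):
    """Hypothetical helper: count records per changefreq and list high-priority URLs."""
    by_changefreq = Counter(rec.get("changefreq", "unknown") for rec in records)
    # Records without a priority are treated as 0 so they never pass the threshold.
    important = [rec["url"] for rec in records if (rec.get("priority") or 0) >= min_priority]
    return {"by_changefreq": dict(by_changefreq), "important": important}


summary = summarize(json.loads(EXPORT_JSON))
print(summary)
```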

Input Reference

{
  "sitemapUrls": ["https://mikolabs.xyz/sitemap.xml"],
  "maxDepth": 3,
  "requestTimeoutSecs": 30,
  "filterUrlPattern": ""
}

sitemapUrls (required): An array of one or more sitemap.xml URLs. Accepts both standard urlset sitemaps and sitemapindex files that point to other sitemaps.

maxDepth (optional, default: 3): Controls how many levels of nested sitemap index files the Actor follows. Set to 1 to parse only the provided sitemaps without following any child links.
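One way to read the depth rule is sketched below: provided sitemaps count as depth 1, and children of an index are followed only while the current depth is below maxDepth. This is an interpretation of the description, not the Actor's confirmed implementation. The `collect_urls` function, the injected `fetch` callable, and the in-memory `FAKE_SITE` data are all hypothetical.

```python
from collections import deque


def collect_urls(start_urls, fetch, max_depth=3):
    """Hypothetical sketch of depth-limited sitemap traversal.

    `fetch` is an assumed callable returning an already-parsed sitemap:
    {"type": "urlset", "urls": [...]} or {"type": "sitemapindex", "children": [...]}.
    """
    found = []
    seen = set()
    # Each queue item pairs a sitemap URL with its depth; provided sitemaps are depth 1.
    queue = deque((url, 1) for url in start_urls)
    while queue:
        sitemap_url, depth = queue.popleft()
        if sitemap_url in seen:
            continue  # Guard against indexes that reference each other.
        seen.add(sitemap_url)
        doc = fetch(sitemap_url)
        if doc["type"] == "urlset":
            found.extend(doc["urls"])
        elif depth < max_depth:
            queue.extend((child, depth + 1) for child in doc["children"])
    return found


# In-memory stand-in for a site, so the sketch runs without network access.
FAKE_SITE = {
    "https://example.com/sitemap.xml": {
        "type": "sitemapindex",
        "children": ["https://example.com/posts.xml"],
    },
    "https://example.com/posts.xml": {
        "type": "urlset",
        "urls": ["https://example.com/blog/a", "https://example.com/blog/b"],
    },
}

root = "https://example.com/sitemap.xml"
deep = collect_urls([root], FAKE_SITE.__getitem__, max_depth=3)
shallow = collect_urls([root], FAKE_SITE.__getitem__, max_depth=1)
print(deep, shallow)
```

Under this reading, a maxDepth of 1 applied to an index file yields no page URLs, because the child sitemaps are never fetched.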

requestTimeoutSecs (optional, default: 30): Maximum time in seconds to wait for each sitemap response. Increase this for slow servers.

filterUrlPattern (optional): A regular expression to filter which URLs are saved to the dataset. For example, https://example\.com/blog/.* will only save blog URLs. Leave empty to collect all URLs.
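To preview what a pattern would keep before starting a run, a filter like the following can be tried locally. It assumes the pattern may match anywhere in the URL (Python's `re.search`), which the README does not state explicitly. The `filter_urls` helper is hypothetical.

```python
import re


def filter_urls(urls, pattern):
    """Hypothetical helper: keep URLs matching the regex; an empty pattern keeps all."""
    if not pattern:
        return list(urls)
    regex = re.compile(pattern)
    # Assumption: a match anywhere in the URL counts (search, not fullmatch).
    return [url for url in urls if regex.search(url)]


urls = [
    "https://example.com/blog/post-1",
    "https://example.com/pricing",
]
blog_only = filter_urls(urls, r"https://example\.com/blog/.*")
everything = filter_urls(urls, "")
print(blog_only, everything)
```

If the Actor instead requires the whole URL to match, a short pattern such as /blog/.* would need a leading .* to behave the same way.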

proxyConfiguration (optional): Enables Apify proxy rotation to avoid IP blocks on rate-limited websites. Not required for most public sitemaps.
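When runs are started from a script, it can help to assemble and sanity-check the input first. The sketch below builds the JSON shown in the Input Reference; the `build_run_input` helper and its validation rules (at least one URL, maxDepth of 1 or more, a compilable regex) are assumptions, not checks the Actor is documented to perform.

```python
import json
import re


def build_run_input(sitemap_urls, max_depth=3, request_timeout_secs=30, filter_url_pattern=""):
    """Hypothetical helper: assemble an input object using the documented field names."""
    if not sitemap_urls:
        raise ValueError("sitemapUrls needs at least one URL")
    if max_depth < 1:
        # Assumption: 1 is the smallest meaningful depth.
        raise ValueError("maxDepth must be 1 or greater")
    if filter_url_pattern:
        re.compile(filter_url_pattern)  # Raises re.error if the regex is invalid.
    return {
        "sitemapUrls": list(sitemap_urls),
        "maxDepth": max_depth,
        "requestTimeoutSecs": request_timeout_secs,
        "filterUrlPattern": filter_url_pattern,
    }


run_input = build_run_input(["https://mikolabs.xyz/sitemap.xml"])
print(json.dumps(run_input, indent=2))
```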


Pricing

| Plan | Price |
| --- | --- |
| Free | 20 results free |
| Pay-as-you-go / Subscription | $1.50 per 1,000 results |

This Actor is among the most competitively priced sitemap extractors on the Apify platform — ideal for one-off audits, scheduled monitoring, and large-scale extractions alike.
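For budgeting, the paid rate translates into a simple linear estimate. The sketch assumes billing is strictly proportional to the number of results, with no rounding or minimum charge, and it ignores the free-plan allowance. The function name is hypothetical.

```python
def estimate_cost_usd(results, price_per_thousand=1.50):
    """Hypothetical helper: linear cost estimate at the listed paid rate."""
    return round(results * price_per_thousand / 1000, 2)


print(estimate_cost_usd(1_000))   # 1.5
print(estimate_cost_usd(50_000))  # 75.0
```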


Frequently Asked Questions

Does it support sitemap index files?
Yes. If your sitemap URL points to a sitemapindex (a sitemap of sitemaps), the Actor will automatically follow all child sitemap links up to the configured maxDepth.

Can I extract from multiple sitemaps in one run?
Yes. Add as many sitemap URLs as you need in the sitemapUrls input field, and all of them will be processed in a single run.

What if the sitemap URL redirects?
The Actor handles HTTP redirects automatically.

Can I filter results to only specific URL patterns?
Yes. Use the filterUrlPattern field with a regular expression (e.g. /blog/.* to keep only blog pages).

Is the data exportable to Excel or Google Sheets?
Yes. After the run, export as CSV from the Dataset tab and open it directly in Excel or Google Sheets.

What happens if a sitemap is behind a bot check?
Enable the Proxy Configuration option to route requests through Apify's residential or datacenter proxies.