Sitemap URL Extractor
Developer

Stas Persiianenko
Extract all URLs from XML sitemaps with last modified dates, priorities, and change frequencies.
What does Sitemap URL Extractor do?
This actor parses XML sitemaps and extracts all URLs with their metadata. It handles both regular sitemaps and sitemap indexes (recursively follows child sitemaps up to 3 levels deep). For each URL, it captures the last modified date, change frequency, priority, and whether the entry contains image or video extensions. Use it to build complete URL inventories for SEO audits, migration planning, or feeding URL lists into other scrapers.
Provide one or more sitemap URLs and the actor will return every page URL listed in those sitemaps along with all available metadata -- last modified dates, priorities, change frequencies -- in a clean, structured JSON format.
Use cases
- SEO specialists -- discover all indexed URLs from a website's sitemap to audit coverage and find orphan pages
- Migration planners -- extract full URL lists for redirect mapping during domain or CMS migrations
- Content strategists -- build a complete inventory of published pages with their last-modified dates and priorities
- DevOps engineers -- monitor sitemap changes over time by scheduling regular extraction runs
- Web scraping engineers -- use extracted sitemap URLs as input for other Apify scrapers instead of building crawlers
Why use Sitemap URL Extractor?
- Handles sitemap indexes -- recursively follows sitemap index files up to 3 levels deep to capture every URL
- Rich metadata -- extracts last modified date, change frequency, priority, and image/video extension flags for each URL
- Configurable limits -- set a maximum URL count to control run time and cost for very large sitemaps
- Batch processing -- provide multiple sitemap URLs in one run to process several sites at once
- Structured JSON output -- every extracted URL comes with its source sitemap and full metadata for easy filtering
- Pay-per-event pricing -- costs fractions of a cent per URL extracted, with no monthly subscription
- Fast HTTP processing -- no browser needed, so even sitemaps with tens of thousands of URLs are processed quickly
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| sitemapUrls | string[] | Yes | -- | List of XML sitemap URLs to extract. Supports both regular sitemaps and sitemap indexes |
| maxUrls | integer | No | 10000 | Maximum number of URLs to extract across all sitemaps (1-100,000) |
Example input
```json
{
    "sitemapUrls": ["https://www.example.com/sitemap.xml"],
    "maxUrls": 10000
}
```
Output example
```json
{
    "url": "https://www.example.com/page",
    "sitemapSource": "https://www.example.com/sitemap.xml",
    "lastModified": "2026-02-15",
    "changeFrequency": "weekly",
    "priority": 0.8,
    "isImage": false,
    "isVideo": false,
    "imageCount": 0
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| url | string | The extracted page URL |
| sitemapSource | string | The sitemap URL this entry was found in |
| lastModified | string | Last modification date from the sitemap (ISO format) |
| changeFrequency | string | Change frequency hint (always, hourly, daily, weekly, monthly, yearly, never) |
| priority | number | Priority value from the sitemap (0.0 to 1.0) |
| isImage | boolean | Whether the entry contains image extensions |
| isVideo | boolean | Whether the entry contains video extensions |
| imageCount | number | Number of image entries associated with this URL |
How much does it cost?
Sitemap URL Extractor uses Apify's pay-per-event pricing model. You only pay for what you use.
| Event | Price | Description |
|---|---|---|
| Start | $0.035 | One-time per run |
| URL extracted | $0.0005 | Per URL found in sitemap |
Example costs:
- 100 URLs: $0.035 + 100 x $0.0005 = $0.085
- 1,000 URLs: $0.035 + 1,000 x $0.0005 = $0.535
- 10,000 URLs: $0.035 + 10,000 x $0.0005 = $5.035
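The arithmetic above can be reproduced for any URL count with a small helper. This is a sketch only: the two prices are copied from the pricing table, while the function name `estimate_cost` is made up for illustration. `Decimal` is used so the sub-cent amounts are not affected by floating-point rounding.

```python
from decimal import Decimal

# Prices copied from the pricing table above.
START_FEE = Decimal("0.035")     # charged once per run
PER_URL_FEE = Decimal("0.0005")  # charged for each URL extracted


def estimate_cost(url_count: int) -> Decimal:
    """Return the estimated cost of one run in USD (hypothetical helper)."""
    if url_count < 0:
        raise ValueError("url_count must be non-negative")
    return START_FEE + PER_URL_FEE * url_count


for n in (100, 1_000, 10_000):
    print(f"{n:>6} URLs: ${estimate_cost(n)}")
```

Because `maxUrls` caps the number of extracted URLs, `estimate_cost(maxUrls)` gives an upper bound for a single run under these prices.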
Using the Apify API
You can start Sitemap URL Extractor programmatically from your own applications using the Apify API. The following examples show how to run the actor and retrieve results in both Node.js and Python.
Node.js
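```js
import { ApifyClient } from 'apify-client';

// Replace YOUR_TOKEN with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/sitemap-url-extractor').call({
    sitemapUrls: ['https://www.example.com/sitemap.xml'],
    maxUrls: 10000,
});

// Read the extracted URLs from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```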
Python
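```python
from apify_client import ApifyClient

# Replace YOUR_TOKEN with your Apify API token.
client = ApifyClient('YOUR_TOKEN')

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/sitemap-url-extractor').call(run_input={
    'sitemapUrls': ['https://www.example.com/sitemap.xml'],
    'maxUrls': 10000,
})

# Read the extracted URLs from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)
```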
Integrations
Sitemap URL Extractor works with all major automation platforms available on Apify. Export results to Google Sheets to build a complete URL inventory spreadsheet for SEO analysis. Use Zapier or Make to trigger extraction runs on a schedule and track sitemap changes weekly. Send notifications to Slack when new URLs appear or old ones are removed from sitemaps. Pipe results into n8n workflows to feed extracted URLs directly into other scrapers or data pipelines. Set up webhooks to get notified when extraction finishes, then use the output as input for downstream actors. You can also use the Apify scheduling feature to automate weekly or daily sitemap extractions for ongoing monitoring.
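To track sitemap changes between scheduled runs, the items from two runs can be compared on their `url` field. The sketch below assumes you have already loaded both datasets into lists of dictionaries shaped like the output example; the function name `diff_url_sets` is illustrative and not part of the actor.

```python
def diff_url_sets(previous_items, current_items):
    """Report URLs that appeared or disappeared between two runs."""
    previous = {item["url"] for item in previous_items}
    current = {item["url"] for item in current_items}
    return {
        "added": sorted(current - previous),    # present now, absent before
        "removed": sorted(previous - current),  # present before, absent now
    }


last_week = [{"url": "https://www.example.com/a"}, {"url": "https://www.example.com/b"}]
this_week = [{"url": "https://www.example.com/b"}, {"url": "https://www.example.com/c"}]
print(diff_url_sets(last_week, this_week))
```

The resulting `added` and `removed` lists are what you would forward to a notification step, for example a Slack message.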
Supported sitemap formats
The actor supports the following XML sitemap types as defined by the sitemaps.org protocol:
- URL sets (`<urlset>`) -- standard sitemaps containing individual page URLs with optional lastmod, changefreq, and priority
- Sitemap indexes (`<sitemapindex>`) -- index files that reference other sitemaps, followed recursively up to 3 levels deep
- Image sitemaps -- sitemap entries with `<image:image>` extensions are detected and flagged
- Video sitemaps -- sitemap entries with `<video:video>` extensions are detected and flagged
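For readers who want to see what these formats look like to a parser, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not the actor's source code. The function name `parse_sitemap` and the way `isImage`, `isVideo`, and `imageCount` are derived are assumptions made to mirror the output fields; the namespace URIs are the ones used by the sitemaps.org protocol and the Google image and video extensions.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
}


def parse_sitemap(xml_text: str) -> dict:
    """Classify one sitemap document and pull out its entries (illustrative)."""
    root = ET.fromstring(xml_text)
    tag = root.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix

    if tag == "sitemapindex":
        locs = root.findall("sm:sitemap/sm:loc", NS)
        return {"type": "index", "sitemaps": [l.text.strip() for l in locs if l.text]}
    if tag != "urlset":
        raise ValueError(f"unsupported root element: {tag}")

    urls = []
    for entry in root.findall("sm:url", NS):
        images = entry.findall("image:image", NS)
        priority = entry.findtext("sm:priority", default=None, namespaces=NS)
        urls.append({
            "url": (entry.findtext("sm:loc", default="", namespaces=NS) or "").strip(),
            "lastModified": entry.findtext("sm:lastmod", default=None, namespaces=NS),
            "changeFrequency": entry.findtext("sm:changefreq", default=None, namespaces=NS),
            "priority": float(priority) if priority else None,
            # Assumed mapping: flags reflect presence of extension elements.
            "isImage": bool(images),
            "isVideo": entry.find("video:video", NS) is not None,
            "imageCount": len(images),
        })
    return {"type": "urlset", "urls": urls}


SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.example.com/page</loc>
    <lastmod>2026-02-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
    <image:image><image:loc>https://www.example.com/a.png</image:loc></image:image>
  </url>
</urlset>"""

print(parse_sitemap(SAMPLE_URLSET))
```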
Tips and best practices
- Start with the sitemap index if the site has one -- the actor will automatically follow all child sitemaps so you do not need to list them individually
- Use `maxUrls` to control costs for very large sites -- start with a small limit to estimate the total, then increase for a full extraction
- Filter results by `lastModified` to find recently updated pages, which is useful for monitoring content changes
- Chain with other actors -- use extracted URLs as input for Content Readability Checker, Word Counter, or Structured Data Extractor
- Schedule weekly runs to maintain an up-to-date URL inventory and detect when pages are added or removed from sitemaps
- Use the `changeFrequency` field to identify which pages the site owner considers most dynamic -- pages marked as "daily" or "hourly" are likely the most actively maintained content
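The two filtering tips above can be applied to the dataset items after a run. The following is a minimal sketch that assumes `items` is a list of dictionaries shaped like the output example; the helper names `recently_modified` and `frequently_changing` are hypothetical.

```python
from datetime import date


def recently_modified(items, since: date):
    """Keep items whose lastModified date is on or after `since`."""
    kept = []
    for item in items:
        raw = item.get("lastModified")
        if not raw:
            continue  # the field is optional, so it may be missing
        try:
            # Compare only the date part, so both "2026-02-15" and
            # "2026-02-15T10:00:00+00:00" style values are handled.
            modified = date.fromisoformat(raw[:10])
        except ValueError:
            continue  # skip values without a full YYYY-MM-DD date
        if modified >= since:
            kept.append(item)
    return kept


def frequently_changing(items):
    """Keep items the site owner marked as changing hourly or daily."""
    return [i for i in items if i.get("changeFrequency") in {"hourly", "daily"}]


items = [
    {"url": "https://www.example.com/a", "lastModified": "2026-02-15", "changeFrequency": "weekly"},
    {"url": "https://www.example.com/b", "lastModified": "2025-01-01T10:00:00+00:00", "changeFrequency": "daily"},
    {"url": "https://www.example.com/c"},
]
print(recently_modified(items, date(2026, 1, 1)))
print(frequently_changing(items))
```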
FAQ
Does the actor handle sitemap index files?
Yes. When a sitemap URL points to a sitemap index, the actor recursively follows all child sitemaps up to 3 levels deep and extracts URLs from each one.
What if a sitemap URL returns an error?
The actor logs the error and continues processing remaining sitemaps. Each extracted URL includes a sitemapSource field so you can trace which sitemap it came from.
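The behaviour described in the two answers above (depth-limited recursion and log-and-continue error handling) can be sketched as follows. This is an illustration, not the actor's implementation. `collect_urls` and the caller-supplied `fetch` function are hypothetical, and counting the starting sitemap as depth 0 is one possible reading of "3 levels deep".

```python
import logging
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_DEPTH = 3  # assumed reading: starting sitemap is depth 0


def collect_urls(sitemap_url, fetch, max_urls=10000, depth=0, out=None):
    """Walk a sitemap or sitemap index, skipping documents that fail.

    `fetch` is a caller-supplied function mapping a URL to its XML text.
    """
    out = [] if out is None else out
    if depth > MAX_DEPTH or len(out) >= max_urls:
        return out
    try:
        root = ET.fromstring(fetch(sitemap_url))
    except Exception as exc:  # download or parse failure
        logging.warning("Skipping %s: %s", sitemap_url, exc)
        return out  # keep what we have and let the caller move on

    if root.tag.endswith("sitemapindex"):
        for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS):
            if loc.text:
                collect_urls(loc.text.strip(), fetch, max_urls, depth + 1, out)
    else:
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
            if len(out) >= max_urls:
                break
            if loc.text:
                out.append({"url": loc.text.strip(), "sitemapSource": sitemap_url})
    return out
```

Passing `fetch` in as a parameter keeps the sketch free of network code; in practice it could wrap any HTTP client, and in tests it can be a dictionary lookup.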
Can I extract URLs from non-XML sitemaps (like TXT or HTML)?
No. The actor only parses standard XML sitemaps and sitemap indexes that follow the sitemaps.org protocol. For HTML sitemaps, you would need a web scraper.
How do I find the sitemap URL for a website?
Most websites list their sitemaps in their robots.txt file (usually at https://example.com/robots.txt). Common sitemap locations include /sitemap.xml, /sitemap_index.xml, and /sitemap/index.xml. You can also use the Robots.txt & Sitemap Analyzer actor to automatically discover sitemap URLs.
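If you already have the contents of a robots.txt file, the sitemap URLs can be read from its `Sitemap:` lines. The sketch below works on the file text only and does no network access; the function name `sitemaps_from_robots` is illustrative.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on `Sitemap:` lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


ROBOTS_SAMPLE = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/sitemap_index.xml  # second sitemap
"""
print(sitemaps_from_robots(ROBOTS_SAMPLE))
```

The returned list can be passed directly as the `sitemapUrls` input.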
What does the priority field mean?
The priority value (0.0 to 1.0) is a hint from the website about the relative importance of a URL compared to other URLs on the same site. A value of 1.0 is the highest priority. Note that search engines may or may not use this value in their ranking decisions. Many sites set all URLs to the same priority or omit the field entirely.
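To check whether a site's priority values carry any signal, you can count how often each value occurs in the extracted items. This is a small sketch; `priority_summary` is a hypothetical helper, and it assumes a missing priority appears as an absent or null field.

```python
from collections import Counter


def priority_summary(items):
    """Count priority values; None stands for entries without a priority."""
    counts = Counter(item.get("priority") for item in items)
    distinct = {p for p in counts if p is not None}
    # With one distinct value (or none), priority cannot rank pages.
    return {"counts": dict(counts), "informative": len(distinct) > 1}


uniform = [{"url": "a", "priority": 0.5}, {"url": "b", "priority": 0.5}, {"url": "c"}]
varied = [{"url": "a", "priority": 1.0}, {"url": "b", "priority": 0.3}]
print(priority_summary(uniform))
print(priority_summary(varied))
```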