Pricing

Pay per event

Sitemap URL Extractor

Developer

Stas Persiianenko

Last modified

20 days ago

What does Sitemap URL Extractor do?

This actor parses XML sitemaps and extracts all URLs with their metadata. It handles both regular sitemaps and sitemap indexes (recursively follows child sitemaps up to 3 levels deep). For each URL, it captures the last modified date, change frequency, priority, and whether the entry contains image or video extensions. Use it to build complete URL inventories for SEO audits, migration planning, or feeding URL lists into other scrapers.

Provide one or more sitemap URLs and the actor will return every page URL listed in those sitemaps along with all available metadata -- last modified dates, priorities, change frequencies -- in a clean, structured JSON format.

Who is it for?

SEO specialists -- collect every URL listed in a website's sitemap to audit coverage and find orphan pages
Migration planners -- extract full URL lists for redirect mapping during domain or CMS migrations
Content strategists -- build a complete inventory of published pages with their last-modified dates and priorities
DevOps engineers -- monitor sitemap changes over time by scheduling regular extraction runs
Web scraping engineers -- use extracted sitemap URLs as input for other Apify scrapers instead of building crawlers

Why use Sitemap URL Extractor?

Handles sitemap indexes -- recursively follows sitemap index files up to 3 levels deep to capture every URL
Rich metadata -- extracts last modified date, change frequency, priority, and image/video extension flags for each URL
Configurable limits -- set a maximum URL count to control run time and cost for very large sitemaps
Batch processing -- provide multiple sitemap URLs in one run to process several sites at once
Structured JSON output -- every extracted URL comes with its source sitemap and full metadata for easy filtering
Pay-per-event pricing -- $0.0005 per URL extracted plus a $0.035 start fee per run, with no monthly subscription
Fast HTTP processing -- no browser needed, so even sitemaps with tens of thousands of URLs are processed quickly

Input parameters

Parameter	Type	Required	Default	Description
`sitemapUrls`	string[]	Yes	--	List of XML sitemap URLs to extract. Supports both regular sitemaps and sitemap indexes
`maxUrls`	integer	No	`10000`	Maximum number of URLs to extract across all sitemaps (1-100,000)

Example input

{
    "sitemapUrls": [
        "https://www.example.com/sitemap.xml"
    ],
    "maxUrls": 10000
}
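The input rules from the parameter table can be checked before a run is started. The following is a minimal sketch, not part of the actor: `build_input` is a hypothetical helper, and the http(s) prefix check is an assumption added for illustration. The limits (1 to 100,000, default 10,000) come from the table above.

```python
from typing import Iterable

MAX_URLS_LIMIT = 100_000   # upper bound of maxUrls per the parameter table
DEFAULT_MAX_URLS = 10_000  # documented default


def build_input(sitemap_urls: Iterable[str], max_urls: int = DEFAULT_MAX_URLS) -> dict:
    """Return an input object for the actor, or raise ValueError if it is invalid."""
    urls = [u.strip() for u in sitemap_urls if u and u.strip()]
    if not urls:
        raise ValueError("sitemapUrls is required and needs at least one URL")
    for u in urls:
        # Assumption: sitemap URLs must be absolute http(s) URLs.
        if not u.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute http(s) URL: {u}")
    if not 1 <= max_urls <= MAX_URLS_LIMIT:
        raise ValueError(f"maxUrls must be between 1 and {MAX_URLS_LIMIT}")
    return {"sitemapUrls": urls, "maxUrls": max_urls}


actor_input = build_input(["https://www.example.com/sitemap.xml"])
print(actor_input)
```

The returned dictionary has the same shape as the example input and can be passed as the run input in the API examples.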

Recurring sitemap monitoring

Turn a one-off sitemap export into a scheduled URL inventory. Save your sitemap input and run it weekly for active content sites or monthly for smaller sites. Compare each dataset with the previous run to spot newly published pages, removed URLs, stale lastModified dates, or unexpected sitemap growth.

Example weekly monitoring input:

{
    "sitemapUrls": [
        "https://www.example.com/sitemap.xml"
    ],
    "maxUrls": 50000
}
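Comparing one run with the previous one can be sketched as a set difference on the `url` field. This is an illustrative helper, not something the actor provides: `diff_runs` is a hypothetical name, and the sample items are made up. It assumes both datasets were already downloaded as lists of output items.

```python
from typing import Iterable


def diff_runs(previous: Iterable[dict], current: Iterable[dict]) -> dict:
    """Compare two lists of output items keyed by their `url` field."""
    prev_by_url = {item["url"]: item for item in previous}
    curr_by_url = {item["url"]: item for item in current}

    added = sorted(curr_by_url.keys() - prev_by_url.keys())
    removed = sorted(prev_by_url.keys() - curr_by_url.keys())
    # URLs present in both runs whose lastModified value differs.
    changed = sorted(
        url
        for url in curr_by_url.keys() & prev_by_url.keys()
        if curr_by_url[url].get("lastModified") != prev_by_url[url].get("lastModified")
    )
    return {"added": added, "removed": removed, "changed": changed}


# Made-up sample data standing in for two downloaded datasets.
previous_run = [
    {"url": "https://www.example.com/a", "lastModified": "2026-01-10"},
    {"url": "https://www.example.com/b", "lastModified": "2026-01-12"},
]
current_run = [
    {"url": "https://www.example.com/b", "lastModified": "2026-02-15"},
    {"url": "https://www.example.com/c", "lastModified": "2026-02-15"},
]
report = diff_runs(previous_run, current_run)
print(report)
```

Unexpected sitemap growth shows up as an unusually long `added` list relative to earlier runs.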

For a broader site-audit bundle, feed the extracted URLs into page-level checks and combine this actor with RSS Feed Reader for content updates, SSL Certificate Checker for HTTPS health, Domain Age Checker, and Tech Stack Detector.

Output example

{
    "url": "https://www.example.com/page",
    "sitemapSource": "https://www.example.com/sitemap.xml",
    "lastModified": "2026-02-15",
    "changeFrequency": "weekly",
    "priority": 0.8,
    "isImage": false,
    "isVideo": false,
    "imageCount": 0
}

Output fields

Field	Type	Description
`url`	string	The extracted page URL
`sitemapSource`	string	The sitemap URL this entry was found in
`lastModified`	string	Last modification date from the sitemap (ISO format)
`changeFrequency`	string	Change frequency hint (always, hourly, daily, weekly, monthly, yearly, never)
`priority`	number	Priority value from the sitemap (0.0 to 1.0)
`isImage`	boolean	Whether the entry contains image extensions
`isVideo`	boolean	Whether the entry contains video extensions
`imageCount`	number	Number of image entries associated with this URL

How much does it cost to extract sitemap URLs?

Sitemap URL Extractor uses Apify's pay-per-event pricing model. You only pay for what you use.

Event	Price	Description
Start	$0.035	One-time per run
URL extracted	$0.0005	Per URL found in sitemap

Example costs:

100 URLs: $0.035 + 100 x $0.0005 = $0.085
1,000 URLs: $0.035 + 1,000 x $0.0005 = $0.535
10,000 URLs: $0.035 + 10,000 x $0.0005 = $5.035
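The same arithmetic can be sketched in code to estimate a run before starting it. The two prices are taken from the table above; `estimate_cost` and `max_urls_for_budget` are hypothetical helper names. The budget helper assumes that every extracted URL is charged once and that `maxUrls` caps the number of charged URLs.

```python
from decimal import Decimal

START_FEE = Decimal("0.035")       # "Start" event, charged once per run
PRICE_PER_URL = Decimal("0.0005")  # "URL extracted" event


def estimate_cost(url_count: int) -> Decimal:
    """Estimated cost in USD of one run that extracts `url_count` URLs."""
    if url_count < 0:
        raise ValueError("url_count must not be negative")
    return START_FEE + PRICE_PER_URL * url_count


def max_urls_for_budget(budget: Decimal) -> int:
    """Largest URL count whose estimated cost stays within `budget` (USD)."""
    if budget <= START_FEE:
        return 0
    return int((budget - START_FEE) // PRICE_PER_URL)


for count in (100, 1_000, 10_000):
    print(f"{count:>6} URLs: ${estimate_cost(count):.3f}")
print("maxUrls for a $1 budget:", max_urls_for_budget(Decimal("1")))
```

`Decimal` is used instead of floats so that small per-URL prices add up without rounding noise.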

How to extract URLs from XML sitemaps

Go to the Sitemap URL Extractor on Apify Store.
Enter one or more XML sitemap URLs in the Sitemap URLs field.
Optionally set a Max URLs limit to control costs for large sitemaps.
Click Start and wait for the run to finish.
Download your extracted URL list in JSON, CSV, or Excel format.

API usage

You can start Sitemap URL Extractor programmatically from your own applications using the Apify API. The Node.js and Python examples below run the actor and retrieve the results; the cURL example starts a run over plain HTTP.

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });
const run = await client.actor('automation-lab/sitemap-url-extractor').call({
    sitemapUrls: ['https://www.example.com/sitemap.xml'],
    maxUrls: 10000,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Python

from apify_client import ApifyClient

client = ApifyClient('YOUR_TOKEN')
run = client.actor('automation-lab/sitemap-url-extractor').call(run_input={
    'sitemapUrls': ['https://www.example.com/sitemap.xml'],
    'maxUrls': 10000,
})
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items)

cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~sitemap-url-extractor/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sitemapUrls": ["https://www.example.com/sitemap.xml"],
    "maxUrls": 10000
  }'

Use with Claude AI (MCP)

This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/sitemap-url-extractor"

Setup for Claude Desktop, Cursor, or VS Code

Add this to your MCP config file:

{
    "mcpServers": {
        "apify": {
            "url": "https://mcp.apify.com?tools=automation-lab/sitemap-url-extractor"
        }
    }
}

Example prompts

"Extract all URLs from this website's sitemap: https://www.example.com/sitemap.xml"
"How many pages are in example.com's sitemap?"
"Get all URLs from these sitemaps and tell me which ones were updated in the last month"

Learn more in the Apify MCP documentation.

Integrations

Sitemap URL Extractor works with all major automation platforms available on Apify. Export results to Google Sheets to build a complete URL inventory spreadsheet for SEO analysis. Use Zapier or Make to trigger extraction runs on a schedule and track sitemap changes weekly. Send notifications to Slack when new URLs appear or old ones are removed from sitemaps. Pipe results into n8n workflows to feed extracted URLs directly into other scrapers or data pipelines. Set up webhooks to get notified when extraction finishes, then use the output as input for downstream actors. You can also use the Apify scheduling feature to automate weekly or daily sitemap extractions for ongoing monitoring.

Supported sitemap formats

The actor supports the following XML sitemap types as defined by the sitemaps.org protocol:

URL sets (<urlset>) -- standard sitemaps containing individual page URLs with optional lastmod, changefreq, and priority
Sitemap indexes (<sitemapindex>) -- index files that reference other sitemaps, followed recursively up to 3 levels deep
Image sitemaps -- sitemap entries with <image:image> extensions are detected and flagged
Video sitemaps -- sitemap entries with <video:video> extensions are detected and flagged
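To make the four types concrete, here is a minimal sketch of how a sitemap document can be told apart by its root element and extension tags. It is not the actor's implementation: `classify_sitemap` is a hypothetical function, and the image and video namespace URIs are the ones commonly used for those Google extensions. The sample XML is made up.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
}


def classify_sitemap(xml_text: str) -> dict:
    """Return child sitemap URLs for an index, or flagged entries for a URL set."""
    root = ET.fromstring(xml_text)
    sm = NS["sm"]
    if root.tag == f"{{{sm}}}sitemapindex":
        children = [(loc.text or "").strip() for loc in root.findall("sm:sitemap/sm:loc", NS)]
        return {"kind": "sitemapindex", "children": children}
    if root.tag == f"{{{sm}}}urlset":
        entries = []
        for url in root.findall("sm:url", NS):
            entries.append({
                "url": (url.findtext("sm:loc", default="", namespaces=NS) or "").strip(),
                # Extension elements mark image and video entries.
                "isImage": url.find("image:image", NS) is not None,
                "isVideo": url.find("video:video", NS) is not None,
            })
        return {"kind": "urlset", "entries": entries}
    raise ValueError(f"not a sitemaps.org document: root is {root.tag!r}")


INDEX_XML = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://www.example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

URLSET_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://www.example.com/page</loc>
    <image:image><image:loc>https://www.example.com/a.png</image:loc></image:image>
  </url>
</urlset>"""

print(classify_sitemap(INDEX_XML))
print(classify_sitemap(URLSET_XML))
```

A recursive extractor would call such a function on each child URL of an index and stop at a fixed depth, which is 3 levels for this actor.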

Tips and best practices

Start with the sitemap index if the site has one -- the actor will automatically follow all child sitemaps so you do not need to list them individually
Use maxUrls to control costs for very large sites -- start with a small limit to estimate the total, then increase for a full extraction
Filter results by lastModified to find recently updated pages, which is useful for monitoring content changes
Chain with other actors -- use extracted URLs as input for Content Readability Checker, Word Counter, or Structured Data Extractor
Schedule weekly runs to maintain an up-to-date URL inventory and detect when pages are added or removed from sitemaps
Use the changeFrequency field to identify which pages the site owner considers most dynamic -- pages marked as "daily" or "hourly" are likely the most actively maintained content
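Filtering by `lastModified` can be sketched as follows. This is a hedged example that runs on already downloaded output items: `modified_since` is a hypothetical helper, the sample items are made up, and it assumes that `lastModified` starts with a `YYYY-MM-DD` date when present (the output table only says "ISO format").

```python
from datetime import date
from typing import Iterable, List


def modified_since(items: Iterable[dict], cutoff: date) -> List[dict]:
    """Keep items whose lastModified date is on or after `cutoff`."""
    recent = []
    for item in items:
        raw = item.get("lastModified")
        if not raw:
            continue  # many sitemaps omit lastmod entirely
        try:
            # Only the date part is compared; any time or offset suffix is ignored.
            modified = date.fromisoformat(raw[:10])
        except ValueError:
            continue  # skip values that do not start with YYYY-MM-DD
        if modified >= cutoff:
            recent.append(item)
    return recent


# Made-up sample items in the actor's output shape.
items = [
    {"url": "https://www.example.com/a", "lastModified": "2026-02-15"},
    {"url": "https://www.example.com/b", "lastModified": "2025-11-01T08:30:00Z"},
    {"url": "https://www.example.com/c"},
    {"url": "https://www.example.com/d", "lastModified": "2026-02-01T00:00:00+00:00"},
]
recent = modified_since(items, date(2026, 1, 16))
print([item["url"] for item in recent])
```

Items without a usable date are dropped rather than treated as recent, which keeps the result conservative for monitoring.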

Legality

This tool analyzes publicly accessible web content. Automated analysis of public web resources is standard practice in SEO and web development. Always respect robots.txt directives and rate limits when analyzing third-party websites. For personal data processing, ensure compliance with applicable privacy regulations.

FAQ

Does the actor handle sitemap index files? Yes. When a sitemap URL points to a sitemap index, the actor recursively follows all child sitemaps up to 3 levels deep and extracts URLs from each one.

What if a sitemap URL returns an error? The actor logs the error and continues processing remaining sitemaps. Each extracted URL includes a sitemapSource field so you can trace which sitemap it came from.
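Tracing results back to their sitemap can be sketched by counting items per `sitemapSource`. `urls_per_sitemap` is a hypothetical helper and the sample items are made up; it assumes the dataset was already downloaded as a list of output items.

```python
from collections import Counter
from typing import Iterable


def urls_per_sitemap(items: Iterable[dict]) -> Counter:
    """Count extracted URLs per source sitemap."""
    return Counter(item.get("sitemapSource", "<unknown>") for item in items)


# Made-up sample items in the actor's output shape.
items = [
    {"url": "https://www.example.com/a", "sitemapSource": "https://www.example.com/sitemap-pages.xml"},
    {"url": "https://www.example.com/b", "sitemapSource": "https://www.example.com/sitemap-pages.xml"},
    {"url": "https://www.example.com/post", "sitemapSource": "https://www.example.com/sitemap-posts.xml"},
]
counts = urls_per_sitemap(items)
for source, count in counts.most_common():
    print(count, source)
```

A sitemap that is expected but missing from the counts is a candidate for the error case described above, so the run log is the next place to look.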

Can I extract URLs from non-XML sitemaps (like TXT or HTML)? No. The actor only parses standard XML sitemaps and sitemap indexes that follow the sitemaps.org protocol. For HTML sitemaps, you would need a web scraper.

How do I find the sitemap URL for a website? Most websites list their sitemaps in their robots.txt file (usually at https://example.com/robots.txt). Common sitemap locations include /sitemap.xml, /sitemap_index.xml, and /sitemap/index.xml. You can also use the Robots.txt & Sitemap Analyzer actor to automatically discover sitemap URLs.
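Reading sitemap locations out of a robots.txt file can be sketched like this. `sitemaps_from_robots` is a hypothetical helper that works on robots.txt text you have already fetched, and the sample file content is made up. It relies on the common `Sitemap:` directive, matched case-insensitively.

```python
from typing import List


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """Return the URLs declared in `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "key: value" at the first colon.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Made-up robots.txt content.
ROBOTS_TXT = """User-agent: *
Disallow: /admin/
Sitemap: https://www.example.com/sitemap_index.xml
sitemap: https://www.example.com/news-sitemap.xml  # news section
"""
found = sitemaps_from_robots(ROBOTS_TXT)
print(found)
```

The resulting list can be used directly as the `sitemapUrls` input.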

What does the priority field mean? The priority value (0.0 to 1.0) is a hint from the website about the relative importance of a URL compared to other URLs on the same site. A value of 1.0 is the highest priority. Note that search engines may or may not use this value in their ranking decisions. Many sites set all URLs to the same priority or omit the field entirely.

The actor returns zero URLs for a sitemap I know exists. What is wrong? Check that the URL points to a valid XML sitemap, not an HTML sitemap page. Also verify the sitemap is not blocked by robots.txt or served behind authentication. Some CDNs return a 200 status with an error page instead of the actual XML, which can cause parsing failures.

The actor seems to miss some URLs from a large site. How do I get them all? Large websites often split their sitemaps across many child files referenced by a sitemap index. Make sure you are providing the sitemap index URL (often /sitemap_index.xml or /sitemap.xml) rather than a specific child sitemap. Also check the maxUrls parameter -- the default is 10,000. Increase it if the site has more URLs than that.

Related actors

SEO Title & Description Checker -- Validate page titles and meta descriptions
RSS Feed Reader -- Parse RSS and Atom feeds into structured data
Security.txt Checker -- Check websites for security.txt compliance
Broken Link Checker -- Find broken links on any website
Robots & Sitemap Analyzer -- Analyze robots.txt and discover sitemaps
Website Health Report -- Comprehensive website health and SEO audit
