Deprecated

Pricing

Pay per event

Sitemap URL Extractor

Extract every URL from any website's sitemap.xml with lastmod, changefreq, and priority. Recursively expands sitemap index files, reads robots.txt, and handles gzipped sitemaps. Useful for SEO audits, content migration, site inventory, and competitor research.

Rating: 0.0 (0)

Developer: Mohieldin Mohamed

Last modified: 3 months ago

What does Sitemap URL Extractor do?

Give it one website URL. It returns every URL that site publishes in its sitemap.xml — including URLs buried inside multi-level sitemap index files, gzipped sitemaps, and sitemaps referenced from robots.txt. Perfect for SEO audits, content migrations, site inventory, and competitor research. No API keys. No browser. No proxy required for most sites.

Try it: paste https://apify.com into the Start URLs field, press Start, and watch the dataset fill up with the URLs listed in the site's sitemaps (up to the Max URLs per site cap). A typical mid-sized company site (5,000–50,000 URLs) finishes in under a minute.

Apify platform advantages include scheduled runs (daily sitemap snapshots), API access, webhook integrations, proxy rotation when needed, and run history.

Why use Sitemap URL Extractor?

SEO audits — see every URL Google is supposed to index and compare against your canonical list
Content migration — pull your entire old site's URL list before moving to a new CMS
Competitor intelligence — see every public page a competitor publishes, including product catalogs and blog archives
Link checking — feed the output into a link checker to find every broken link on a site
Snapshots over time — schedule daily runs and diff URL lists to detect content changes
Dataset for LLM training — get a clean list of URLs to feed into a content extractor

How to use Sitemap URL Extractor

Click Try for free (or Start if you're already logged in)
In the Start URLs field, paste one or more website root URLs (e.g. https://example.com)
Optionally set Max URLs per site to cap output size
Click Start
Watch the dataset populate in real time in the Output tab
Download as JSON, CSV, or Excel, or hit the API endpoint directly

Input

Start URLs — one or more website root URLs to crawl (e.g. https://apify.com)
Max URLs per site — safety cap (default 10,000, use 0 for unlimited)
Include metadata — attach lastmod, changefreq, priority to each URL (default: yes)
Follow sitemap index — recursively expand nested <sitemapindex> files (default: yes)
Proxy configuration — optional Apify Proxy for sites that block raw server IPs
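
For API or scheduled runs, the same options are passed as a JSON input object. The sketch below builds one in Python. The keys maxUrlsPerSite, includeMetadata, and followSitemapIndex appear elsewhere on this page; the keys startUrls and proxyConfiguration, their shapes, and the helper build_run_input are assumptions made for illustration, so check the Actor's input schema before relying on them.

```python
import json


def build_run_input(start_urls, max_urls_per_site=10_000,
                    include_metadata=True, follow_sitemap_index=True,
                    use_apify_proxy=False):
    """Build an input object mirroring the options listed above (hypothetical helper)."""
    if max_urls_per_site < 0:
        raise ValueError("maxUrlsPerSite must be 0 (unlimited) or a positive integer")
    return {
        # Assumed key and shape: a list of {"url": ...} objects.
        "startUrls": [{"url": u} for u in start_urls],
        "maxUrlsPerSite": max_urls_per_site,         # 0 means unlimited
        "includeMetadata": include_metadata,         # lastmod, changefreq, priority
        "followSitemapIndex": follow_sitemap_index,  # expand nested <sitemapindex> files
        # Assumed key and shape for the optional proxy setting.
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


run_input = build_run_input(["https://apify.com"], max_urls_per_site=100)
print(json.dumps(run_input, indent=2))
```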

Output

The actor pushes one dataset item per extracted URL. You can download the dataset as JSON, CSV, HTML, or Excel.

{
    "url": "https://apify.com/apify/instagram-scraper",
    "lastmod": "2025-03-14",
    "changefreq": "daily",
    "priority": 0.8,
    "sourceWebsite": "https://apify.com",
    "sitemapUrl": "https://apify.com/sitemap.xml",
    "sitemapDepth": 0,
    "discoveredAt": "2026-04-15T18:30:00.000Z"
}

Data table

Field	Type	Description
`url`	string	The extracted URL from the sitemap
`lastmod`	string	Last modification date from the sitemap (if present)
`changefreq`	string	How often the page is expected to change (`daily`, `weekly`, `monthly`, etc.)
`priority`	number	SEO priority hint from 0.0 to 1.0
`sourceWebsite`	string	The root URL you started from
`sitemapUrl`	string	The specific sitemap file where this URL was found
`sitemapDepth`	number	Nesting depth in sitemap index (0 = root sitemap)
`discoveredAt`	string	ISO timestamp of when the URL was extracted
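
As an illustration of working with these fields, the sketch below filters downloaded items by lastmod. It assumes the dataset has been exported as JSON and loaded into a list of dicts. The first item is the sample record above; the other two are made-up records added only to exercise the filter, and modified_since is a hypothetical helper name.

```python
from datetime import date

items = [
    {"url": "https://apify.com/apify/instagram-scraper", "lastmod": "2025-03-14",
     "changefreq": "daily", "priority": 0.8, "sitemapDepth": 0},
    # Made-up records for illustration only.
    {"url": "https://example.com/old-page", "lastmod": "2023-06-01T08:00:00+00:00"},
    {"url": "https://example.com/no-metadata"},
]


def modified_since(items, cutoff):
    """Return URLs whose lastmod date is on or after the cutoff date."""
    recent = []
    for item in items:
        lastmod = item.get("lastmod")
        if not lastmod:
            continue  # lastmod is optional in sitemaps
        try:
            # lastmod may be a plain date or a full timestamp; compare the date part.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # skip partial dates such as "2025" or "2025-03"
        if modified >= cutoff:
            recent.append(item["url"])
    return recent


print(modified_since(items, date(2025, 1, 1)))
```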

Pricing

This actor uses Apify's pay-per-event pricing model so you only pay for what you get:

Actor start: $0.01 per run (covers robots.txt + sitemap fetches)
Per URL extracted: $0.0005 per URL added to your dataset

Example costs:

A small blog with 500 URLs → ~$0.26
A mid-sized site with 5,000 URLs → ~$2.51
A large catalog with 50,000 URLs → ~$25.01

Apify's free tier includes $5/month in platform credits, which covers roughly 10,000 extracted URLs per month (slightly fewer once the per-run start fee is counted).
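
The example costs above follow directly from the two event prices. A minimal sketch of the arithmetic, using the prices quoted in this section (the function names are illustrative only):

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.01")  # charged once per run
PER_URL_USD = Decimal("0.0005")    # charged per URL added to the dataset


def estimate_cost(url_count, runs=1):
    """Estimated charge in USD for extracting url_count URLs across the given number of runs."""
    return ACTOR_START_USD * runs + PER_URL_USD * url_count


def urls_within_budget(budget_usd, runs=1):
    """How many URLs a budget covers after paying the start fee for each run."""
    remaining = Decimal(str(budget_usd)) - ACTOR_START_USD * runs
    return max(0, int(remaining / PER_URL_USD))


for n in (500, 5_000, 50_000):
    print(n, estimate_cost(n))
print(urls_within_budget(5))
```

Decimal is used instead of float so that small per-URL prices add up without rounding drift.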

Tips and advanced options

Set maxUrlsPerSite to a safe cap during testing (e.g. 100) to verify the actor works before running unlimited
Disable includeMetadata if you only need URLs — this produces a much smaller dataset and faster downloads
Disable followSitemapIndex to get only the top-level sitemap contents (useful for homepage/landing-page inventories)
Enable Apify Proxy for sites that return 403 or rate-limit direct requests (government sites, some news publishers)
Schedule daily runs via Apify's scheduler to track how a competitor's URL list changes over time — diff the datasets to see new product launches or archived content
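
The daily diff mentioned in the last tip can be done in a few lines once two datasets are downloaded. The sketch below assumes each snapshot is a list of items with at least a url field and, optionally, lastmod; diff_snapshots is a hypothetical helper, not part of the actor.

```python
def diff_snapshots(previous, current):
    """Compare two dataset snapshots by URL and report what changed between them."""
    prev = {item["url"]: item.get("lastmod") for item in previous}
    curr = {item["url"]: item.get("lastmod") for item in current}
    return {
        "added": sorted(curr.keys() - prev.keys()),
        "removed": sorted(prev.keys() - curr.keys()),
        # Present in both snapshots, but the sitemap reports a different lastmod.
        "changed": sorted(u for u in curr.keys() & prev.keys() if curr[u] != prev[u]),
    }


# Made-up snapshots for illustration.
yesterday = [
    {"url": "https://example.com/a", "lastmod": "2025-01-01"},
    {"url": "https://example.com/b", "lastmod": "2025-01-01"},
]
today = [
    {"url": "https://example.com/a", "lastmod": "2025-02-01"},
    {"url": "https://example.com/c", "lastmod": "2025-02-01"},
]
print(diff_snapshots(yesterday, today))
```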

FAQ and support

Is this legal? The actor only reads publicly declared sitemap files. Sitemaps exist to be read by crawlers — by convention (and by the intent of the site owner who published them) they are meant for public consumption. Always respect the target site's Terms of Service and robots.txt disallow rules.

What about gzipped sitemaps? Fully supported. The actor auto-detects .gz URLs and Content-Encoding: gzip responses and decompresses transparently.
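
For readers handling sitemap files themselves, here is a minimal sketch of transparent decompression. It is not the actor's code: instead of checking the .gz suffix or the Content-Encoding header, it checks the gzip magic bytes at the start of the body, which is a different technique chosen because many HTTP clients already decode Content-Encoding: gzip on their own. The function name and the UTF-8 assumption are illustrative.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decode_sitemap_body(body: bytes) -> str:
    """Return sitemap XML as text, decompressing first if the body is gzipped."""
    if body[:2] == GZIP_MAGIC:
        body = gzip.decompress(body)
    # Assumes UTF-8, which the sitemap protocol requires.
    return body.decode("utf-8")


xml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>'
print(decode_sitemap_body(gzip.compress(xml.encode("utf-8"))) == xml)
```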

What about nested sitemap indexes? Supported up to 5 levels deep. Most sites have at most 2 levels (index → sitemap → urls).
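
To make the nesting concrete, the sketch below expands a sitemap index recursively with a depth limit. It is an illustration under stated assumptions, not the actor's implementation: fetch is a caller-supplied function returning XML bytes, the limit is read as "root plus five nested levels", and the output keys simply reuse the field names from the Output section.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_DEPTH = 5  # assumed interpretation of "up to 5 levels deep"


def expand(sitemap_url, fetch, depth=0, seen=None):
    """Yield one record per <url> entry, following <sitemapindex> files recursively."""
    seen = set() if seen is None else seen
    if depth > MAX_DEPTH or sitemap_url in seen:
        return  # stop on excessive nesting or on a sitemap already visited
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag.endswith("sitemapindex"):
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            yield from expand((loc.text or "").strip(), fetch, depth + 1, seen)
    else:
        for loc in root.findall("sm:url/sm:loc", NS):
            yield {"url": (loc.text or "").strip(),
                   "sitemapUrl": sitemap_url,
                   "sitemapDepth": depth}


# In-memory stand-in for HTTP fetching, so the sketch runs without network access.
PAGES = {
    "https://example.com/sitemap.xml":
        b'<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<sitemap><loc>https://example.com/posts.xml</loc></sitemap></sitemapindex>",
    "https://example.com/posts.xml":
        b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        b"<url><loc>https://example.com/hello</loc></url></urlset>",
}
records = list(expand("https://example.com/sitemap.xml", PAGES.__getitem__))
print(records)
```

The seen set guards against index files that reference each other, which would otherwise recurse until the depth limit.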

The actor returned 0 URLs, help! The site probably doesn't publish a public sitemap. Try adding a custom sitemap URL explicitly in the Start URLs field (e.g. https://example.com/sitemap_index.xml).
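
One way to find a candidate sitemap URL is to look at the Sitemap: lines in the site's robots.txt. The sketch below extracts them from robots.txt text that has already been downloaded; the function name and the sample content are made up for illustration.

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Return the URLs declared on 'Sitemap:' lines, in file order."""
    urls = []
    for line in robots_txt.splitlines():
        # Split on the first colon only, so the URL's own "https:" stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


sample = """\
User-agent: *
Disallow: /admin/
# Sitemap: https://example.com/ignored.xml
Sitemap: https://example.com/sitemap_index.xml
sitemap: https://example.com/news-sitemap.xml.gz
"""
print(sitemap_urls_from_robots(sample))
```

Any URL it returns can be pasted into the Start URLs field as described above.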

Found a bug or missing feature? Open an issue on the Issues tab of this actor. Custom solutions are available for enterprise use cases.

Sitemap URL Extractor

onescales/sitemap-url-extractor

Provide a link to a website's sitemap.xml (e.g. https://onescales.com/sitemap.xml) and the app will extract and list all URLs in the sitemap, along with the additional data it contains.

One Scales

593

5.0

(3)

Sitemap URL Extractor

crawlerbros/sitemap-url-extractor

Extract every URL from any site's sitemap.xml. Handles sitemap index files (nested sitemaps), gzipped sitemaps, and robots.txt discovery. Returns URL, lastmod, changefreq, priority, and optional image/video/alternate-language fields. No proxy, no cookies, no login.

Crawler Bros

Sitemap to URL Crawler — Extract Sitemap.xml URLs

logiover/sitemap-to-url-crawler

Extract all URLs from any sitemap.xml recursively. Export sitemap URLs to CSV/JSON for RAG pipelines, SEO audits, and LLM training datasets.

Logiover

Sitemap URL Extractor

automation-lab/sitemap-url-extractor

This actor parses XML sitemaps and extracts all URLs with their metadata. It handles both regular sitemaps and sitemap indexes (recursively follows child sitemaps up to 3 levels deep). For each URL, it captures the last modified date, change frequency, priority, and whether the entry...

Stas Persiianenko

XML Sitemap Scraper & URL Extractor API - SEO Crawler

pink_comic/sitemap-url-extractor

Extract URLs from XML sitemaps and robots.txt for SEO crawls, audits, content migrations, and RAG indexing. Auto-discovers sitemap files, parses nested sitemap indexes, and exports URL, lastmod, priority, changefreq, and image metadata in bulk.

Ava Torres

Sitemap URL Extractor

mikolabs/sitemap-url-extractor

Extract every URL and its metadata from any sitemap.xml in seconds. Paste one or more sitemap URLs, run the Actor, and get a clean, structured dataset with url, lastmod, changefreq, priority, and more — ready to export as CSV, JSON, or Excel.

mikolabs

XML Sitemap URL Extractor

andok/sitemap-extractor

Recursively crawl and extract every single URL from a website’s sitemap.xml. Automate your SEO audits and scraping queues.

Andok

Sitemap URL Extractor

automationagents/web-sitemap

Extract every URL from a website via sitemap.xml, robots.txt, or crawl discovery. Feed clean URL lists straight into your scrapers.

Alex Jordan

Sitemap Url Extractor

scrapers-hub/sitemap-url-extractor

Sitemap URL extractor to extract all URLs from XML sitemaps quickly and efficiently 🌐📄 Ideal for SEO audits, site analysis, and indexing workflows. Fast, accurate, and easy to use.

Scrapers Hub

Sitemap Finder & URL Extractor · Crawl Any XML Sitemap

corent1robert/sitemap-detector

Find and crawl XML sitemaps from any website. Follows sitemap indexes, handles gzip, and exports every page URL with source file and lastmod into a clean dataset. No config needed.

Corentin Robert

Sitemap & URL Extractor — Get Every URL of a Website

dataquarry/sitemap-url-extractor

Get every URL of a website: parses sitemap.xml and sitemap-indexes (discovered via robots.txt or the default location), with a same-site crawl fallback when there's no sitemap. Returns each URL + lastmod. No API key.