Pricing

from $1.00 / 1,000 sitemap lookups

XML Sitemap Finder & Extractor API

Find and extract all XML sitemaps for any domain. Automatically parses robots.txt, scans HTML tags, and recursively follows indexes. Perfect for SEO & web scraping.

Rating

0.0

(0)

Developer

Andok

Last modified

3 months ago

Sitemap Finder

Discover XML sitemaps for any website. Provide one or more URLs, and the actor looks for sitemaps in three places: common file paths, the site's robots.txt, and sitemap references in the homepage HTML.

Features

Multi-source discovery — checks 15+ common sitemap paths, robots.txt directives, and HTML <a> / <link> tags
Batch processing — process multiple websites in a single run with configurable concurrency
Recursive index traversal — follows sitemap index files to discover all nested child sitemaps
Gzip support — handles .xml.gz compressed sitemaps automatically
XML validation — verifies sitemaps contain valid XML and classifies them as index or urlset
Rich metadata — reports URL count per sitemap, last modified date, discovery source, and validation status
Pay-per-event pricing — only pay for what you use at $0.001 per URL lookup
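
The gzip handling and index/urlset classification listed above can be illustrated with a minimal sketch. It is not the actor's actual code: `classify_sitemap` is a hypothetical helper, and the returned keys simply mirror the `type`, `urlCount`, and `isValid` output fields described in this README.

```python
import gzip
import xml.etree.ElementTree as ET
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def local_name(tag: str) -> str:
    """Strip an XML namespace prefix such as '{http://...}urlset'."""
    return tag.rsplit("}", 1)[-1]


def classify_sitemap(payload: bytes) -> dict:
    """Hypothetical helper: classify raw sitemap bytes (plain or gzipped)."""
    invalid = {"type": "unknown", "urlCount": 0, "isValid": False}
    if payload[:2] == GZIP_MAGIC:
        try:
            payload = gzip.decompress(payload)
        except (OSError, EOFError, zlib.error):
            return invalid
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        return invalid

    root_name = local_name(root.tag)
    if root_name == "sitemapindex":
        kind, child_name = "index", "sitemap"
    elif root_name == "urlset":
        kind, child_name = "urlset", "url"
    else:
        # Well-formed XML, but not a sitemap root element.
        return {"type": "unknown", "urlCount": 0, "isValid": True}

    count = sum(1 for el in root if local_name(el.tag) == child_name)
    return {"type": kind, "urlCount": count, "isValid": True}
```

Checking the magic bytes rather than the `.gz` file extension is a design choice of this sketch; it also covers servers that return compressed bodies under a plain `.xml` path.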

Input

Field	Type	Default	Description
`urls`	`string[]`	—	Website URLs to check (e.g., `["https://example.com"]`)
`url`	`string`	—	Single URL for backward compatibility. Merged into `urls` if both are set.
`findAll`	`boolean`	`true`	If `true`, find all sitemaps; if `false`, stop after the first one found
`followIndexes`	`boolean`	`true`	Recursively follow sitemap index files to discover child sitemaps
`verify`	`boolean`	`true`	Verify sitemaps are valid XML and extract metadata
`timeout`	`integer`	`10`	HTTP request timeout in seconds
`concurrency`	`integer`	`3`	Max concurrent website processing (1–20)

Example Input

{
    "urls": ["https://example.com", "https://crawlee.dev"],
    "findAll": true,
    "followIndexes": true,
    "verify": true,
    "timeout": 10,
    "concurrency": 3
}
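
If you build the input programmatically, a small pre-flight check can catch mistakes before a run is started. The sketch below applies the defaults and the 1–20 concurrency range from the table above; `normalize_input` is a hypothetical client-side helper, and the merge behaviour for `url` (append it to `urls`, skipping duplicates) is an assumption about what "merged" means.

```python
def normalize_input(raw: dict) -> dict:
    """Hypothetical helper: apply documented defaults and basic range checks."""
    urls = list(raw.get("urls") or [])
    single = raw.get("url")
    # Assumption: the legacy `url` field is appended unless already listed.
    if single and single not in urls:
        urls.append(single)
    if not urls:
        raise ValueError("provide at least one URL via 'urls' or 'url'")

    concurrency = int(raw.get("concurrency", 3))
    if not 1 <= concurrency <= 20:
        raise ValueError("concurrency must be between 1 and 20")

    timeout = int(raw.get("timeout", 10))
    if timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")

    return {
        "urls": urls,
        "findAll": bool(raw.get("findAll", True)),
        "followIndexes": bool(raw.get("followIndexes", True)),
        "verify": bool(raw.get("verify", True)),
        "timeout": timeout,
        "concurrency": concurrency,
    }
```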

Output

Results are stored in the default dataset. Each record represents a discovered sitemap:

{
    "websiteUrl": "https://crawlee.dev",
    "sitemapUrl": "https://crawlee.dev/sitemap.xml",
    "type": "index",
    "urlCount": 4,
    "lastModified": "2024-12-15T10:30:00Z",
    "isValid": true,
    "source": "common-location"
}

Field	Description
`websiteUrl`	The input website URL
`sitemapUrl`	Full URL of the discovered sitemap
`type`	Sitemap type: `index` (contains other sitemaps), `urlset` (contains page URLs), or `unknown`
`urlCount`	Number of entries in the sitemap (child sitemaps for indexes, page URLs for urlsets)
`lastModified`	Most recent `<lastmod>` date found in the sitemap
`isValid`	Whether the sitemap contains valid XML
`source`	How the sitemap was discovered: `common-location`, `robots.txt`, `html-content`, or `index:<parent-url>`
`error`	Error message if the lookup failed (only present on error records)

When no sitemaps are found for a URL, a single record is returned for it with `sitemapUrl` set to `null` and an `error` message describing what went wrong.
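
Because successful and failed lookups share one dataset, it can help to split them after download. The following sketch groups records by `websiteUrl` and treats a missing or `null` `sitemapUrl` as a failed lookup; `group_by_site` is a hypothetical helper, and the fallback message is an assumption for records that carry no `error` field.

```python
from collections import defaultdict


def group_by_site(records: list[dict]) -> tuple[dict, dict]:
    """Hypothetical helper: split dataset items into found sitemaps and failures."""
    found = defaultdict(list)
    failed = {}
    for rec in records:
        site = rec["websiteUrl"]
        if rec.get("sitemapUrl") is None:
            # Record for a site where no sitemap was located.
            failed[site] = rec.get("error", "no sitemap found")
        else:
            found[site].append(rec)
    return dict(found), failed


# Usage with records shaped like the example above:
items = [
    {"websiteUrl": "https://crawlee.dev", "sitemapUrl": "https://crawlee.dev/sitemap.xml",
     "type": "index", "urlCount": 4, "isValid": True, "source": "common-location"},
    {"websiteUrl": "https://example.com", "sitemapUrl": None, "error": "No sitemaps found"},
]
found, failed = group_by_site(items)
```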

API Usage

Run the actor synchronously via the API; the response body contains the items from the default dataset. Replace `YOUR_USERNAME` and `YOUR_TOKEN` with your own values:

curl "https://api.apify.com/v2/acts/YOUR_USERNAME~find-sitemap-from-url/run-sync-get-dataset-items?token=YOUR_TOKEN" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"urls": ["https://example.com"]}'
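
The same call can be made from Python using only the standard library. This is a sketch mirroring the curl command above rather than an official client: `build_run_request` and `run_actor` are hypothetical names, the 300-second timeout is an arbitrary choice, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts"


def build_run_request(actor_id: str, token: str, urls: list[str]) -> urllib.request.Request:
    """Build the POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    endpoint = f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?{query}"
    body = json.dumps({"urls": urls}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(actor_id: str, token: str, urls: list[str], timeout: float = 300.0) -> list[dict]:
    """Send the request and decode the response (assumed to be a JSON array)."""
    request = build_run_request(actor_id, token, urls)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example call (performs a real, billable run):
# items = run_actor("YOUR_USERNAME~find-sitemap-from-url", "YOUR_TOKEN", ["https://example.com"])
```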

Pricing

This actor uses pay-per-event (PPE) pricing:

Event	Cost
`sitemap-lookup`	$0.001 per URL processed

You are charged once per input URL, regardless of how many sitemaps are discovered for that URL. There are no additional platform fees beyond the per-event charge.
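
Since the charge depends only on the number of input URLs, the cost of a run can be estimated up front. A minimal sketch, assuming the $0.001 rate in the table above; `estimate_cost` is a hypothetical helper, and `Decimal` is used to avoid floating-point rounding in currency amounts.

```python
from decimal import Decimal

PRICE_PER_LOOKUP = Decimal("0.001")  # USD per input URL, from the pricing table


def estimate_cost(input_urls: int) -> Decimal:
    """Estimated charge in USD for a run with the given number of input URLs."""
    if input_urls < 0:
        raise ValueError("input_urls must not be negative")
    return PRICE_PER_LOOKUP * input_urls
```

For example, `estimate_cost(1000)` equals one dollar, which matches the "from $1.00 / 1,000 sitemap lookups" headline price.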

Use Cases

SEO auditing — verify sitemap coverage and freshness across your sites
Web scraping — discover all available sitemaps before crawling to plan efficient scraping
Site monitoring — track sitemap changes, URL counts, and last modified dates over time
Competitor analysis — map out a competitor's site structure via their sitemaps
Migration validation — confirm sitemaps are correctly set up after a site migration
Content indexing — find all content endpoints for search engine optimization

Discovery Methods

The actor uses three complementary discovery strategies:

Common paths — checks 15+ well-known sitemap file locations (/sitemap.xml, /wp-sitemap.xml, /sitemap_index.xml, etc.)
robots.txt — parses Sitemap: directives from the site's robots.txt file
HTML scanning — searches the homepage HTML for <a> and <link> tags referencing sitemaps

When followIndexes is enabled, any discovered sitemap index is recursively expanded to reveal all child sitemaps.
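
The discovery steps above can be sketched in a few functions. This is an illustration under stated assumptions, not the actor's implementation: all function names are hypothetical, `COMMON_PATHS` lists only the three paths named in this README (the full list of 15+ is not documented here), and `fetch_children` stands in for whatever downloads a sitemap and returns its child sitemap URLs.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urljoin

# Only the paths named in this README; the actor checks more.
COMMON_PATHS = ["/sitemap.xml", "/wp-sitemap.xml", "/sitemap_index.xml"]


def candidate_urls(site: str) -> list[str]:
    """Resolve each well-known path against the site's root."""
    return [urljoin(site, path) for path in COMMON_PATHS]


def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the values of 'Sitemap:' directives (case-insensitive key)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def expand_indexes(start_urls: Iterable[str],
                   fetch_children: Callable[[str], list[str]]) -> list[str]:
    """Breadth-first traversal of sitemap indexes, visiting each URL once."""
    seen, order = set(), []
    queue = deque(start_urls)
    while queue:
        url = queue.popleft()
        if url in seen:
            continue  # guards against indexes that reference each other
        seen.add(url)
        order.append(url)
        queue.extend(fetch_children(url))
    return order
```

The `seen` set is what keeps the traversal finite if two index files point at each other; passing the fetcher in as a parameter keeps the traversal logic testable without network access.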

XML Sitemap Scraper & URL Extractor API - SEO Crawler

pink_comic/sitemap-url-extractor

Extract URLs from XML sitemaps and robots.txt for SEO crawls, audits, content migrations, and RAG indexing. Auto-discovers sitemap files, parses nested sitemap indexes, and exports URL, lastmod, priority, changefreq, and image metadata in bulk.

Ava Torres

Sitemap URL Extractor

automation-lab/sitemap-url-extractor

This actor parses XML sitemaps and extracts all URLs with their metadata. It handles both regular sitemaps and sitemap indexes (recursively follows child sitemaps up to 3 levels deep). For each URL, it captures the last modified date, change frequency, priority, and whether the entry...

Stas Persiianenko

Sitemap URL Discovery (sitemap.xml + robots.txt → all URLs)

gochujang/sitemap-url-discovery

Given a domain, finds sitemap.xml / sitemap_index.xml (also via robots.txt), recursively expands sitemap indexes, returns one row per discovered URL with lastmod / changefreq / priority. SEO audits, crawl-target prep, content cataloging. $0.0001/URL + $0.01 site fee.

Hojun Lee

Sitemap Finder & URL Extractor · Crawl Any XML Sitemap

corent1robert/sitemap-detector

Find and crawl XML sitemaps from any website. Follows sitemap indexes, handles gzip, and exports every page URL with source file and lastmod into a clean dataset. No config needed.

Corentin Robert

Sitemap Extractor

automationagents/web-sitemap

Extract all URLs from a website's sitemap (XML, robots.txt, or crawl discovery).

Alex Jordan

XML Sitemap URL Extractor

andok/sitemap-extractor

Recursively crawl and extract every single URL from a website’s sitemap.xml. Automate your SEO audits and scraping queues.

Andok

Sitemap URL Extractor — robots.txt + sitemap.xml Crawl

v0iddo/sitemap-url-extractor

Discover every URL a site exposes via its public sitemap chain. Reads robots.txt, follows Sitemap declarations, recursively descends sitemap-index files, extracts URLs with lastmod, changefreq, priority.

vøiddo

Sitemap URL Extractor - List All URLs in a Sitemap

dltik/sitemap-url-extractor

Extract every URL from any XML sitemap, with lastmod, changefreq and priority. Resolves sitemap indexes recursively. Pass a sitemap.xml or just a site root to auto-discover its sitemaps. Pure HTTP, no browser — fast and cheap.

Walid

Find Sitemap from url

eesti/find-sitemap-from-url

A powerful Apify Actor that finds sitemap URLs for any website. This Actor helps you discover XML sitemaps by checking common locations, robots.txt files, and analyzing HTML content for sitemap links.

ando

210

1.0

Sitemap URL Extractor

wiry_kingdom/sitemap-url-extractor

Extract every URL from any website's sitemap.xml with lastmod, changefreq, priority. Recursively expands sitemap index files, reads robots.txt, handles gzipped sitemaps. SEO audits, content migration, site inventory, competitor research.