Pricing

from $5.99 / 1,000 results

Sitemap Scraper

Sitemap Scraper extracts URLs and last-modified dates from XML sitemaps and sitemap indexes. Ideal for SEO audits, website analysis, content discovery, indexing validation, competitor research, and large-scale web data collection.

Developer

ScrapeVanta

Last modified

a month ago

Sitemap Scraper ⚡

If you need to extract URLs from a website's sitemap, downloading and parsing the XML by hand is tedious copy-paste work. Sitemap Scraper crawls sitemap URLs automatically (including sitemap indexes) and saves every discovered URL to an Apify dataset. Use it as a sitemap parser for SEO audits, link extraction, and scheduled sitemap crawling. It is built for SEO specialists, data analysts, and researchers who need large URL lists quickly. A single run can process multiple sitemap URLs and return structured results without any code.

See the Data: Sample Output

Here's a real record from a single run:

{
  "url": "https://blog.apify.com/sitemap-articles.xml",
  "lastMod": "2026-05-28"
}

The actor writes extracted URL records to the dataset as objects like the one above.

Field	Type	What It Tells You
`url`	string	The URL extracted from the sitemap or sitemap index.
`lastMod`	string | null	The sitemap's `lastmod` value, truncated to its first 10 characters (YYYY-MM-DD) when present. Useful for freshness checks.
`status`	string | null	Optional (only if your dataset export includes it). Whether data was produced for the parsed sitemap.
`error_message`	string | null	Optional. Any fetching or parsing error reported for the sitemap.
`baseUrl`	string | null	Optional. The input sitemap URL the record came from.
`source`	string | null	Optional. The extraction context reported by your export tooling.
`timestamp`	string | null	Optional. When the record was produced.
`runId`	string | null	Optional. The actor run that produced the record.
`taskId`	string | null	Optional. The task or processing unit the record belongs to.
`success`	boolean | null	Optional. Whether extraction for the sitemap succeeded.
`warning`	string | null	Optional. Any warning text captured by your export layer.
`raw`	object | null	Optional. Raw XML-derived fields, if your export includes them.

Export your full dataset as JSON, CSV, or Excel from the Apify dashboard.
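If you load exported records in Python, a small typed wrapper keeps the two core fields explicit. This is a minimal sketch assuming the record shape from the sample output above; the `SitemapRecord` class is hypothetical and not part of the actor.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class SitemapRecord:
    """Hypothetical wrapper for one dataset record (url + optional lastMod)."""
    url: str
    lastMod: Optional[str] = None  # name mirrors the dataset key; None when absent

    @classmethod
    def from_dict(cls, data: dict) -> "SitemapRecord":
        # Optional export-layer fields (status, runId, ...) are ignored here.
        return cls(url=data["url"], lastMod=data.get("lastMod"))


raw = '{"url": "https://blog.apify.com/sitemap-articles.xml", "lastMod": "2026-05-28"}'
record = SitemapRecord.from_dict(json.loads(raw))
print(record)
```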

Setting It Up

Drop this into your input.json and you're ready to go:

{
  "startUrls": [
    { "url": "https://blog.apify.com/sitemap.xml" }
  ]
}

Parameter	Required	What It Does
`startUrls`	✅	A list of sitemap URLs to crawl (it can include a sitemap index, which the actor will handle recursively).
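If you generate the input programmatically, for example from a list of client sites, the only structure to get right is that each `startUrls` item is an object with a `url` key. A minimal Python sketch; the second sitemap URL is a hypothetical placeholder.

```python
import json


def build_input(sitemap_urls):
    """Wrap plain sitemap URLs in the startUrls shape shown above."""
    return {"startUrls": [{"url": url} for url in sitemap_urls]}


payload = build_input([
    "https://blog.apify.com/sitemap.xml",
    "https://example.com/sitemap_index.xml",  # hypothetical second sitemap
])
print(json.dumps(payload, indent=2))  # save as input.json or send as the run input
```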

What It Does

Sitemap Scraper downloads sitemap XML content from your provided sitemap URLs, parses it, and saves extracted URLs into a dataset.

Extract at Scale with Sitemap Scraper

Provide one or more sitemap URLs in startUrls, and the actor processes each one to extract all url entries. If a sitemap is actually a sitemap index, it automatically fetches and parses the nested sitemaps as well—so you get a complete URL list instead of partial results.
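The recursion described above can be pictured with a short Python sketch. It is not the actor's source code: `fetch` is a stand-in for an HTTP request (a dictionary lookup here, so the example runs offline), and the `seen` set is an added safeguard against sitemap indexes that reference each other.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # standard sitemap namespace


def crawl(sitemap_url, fetch, seen=None):
    """Yield (url, lastmod) pairs, recursing into <sitemapindex> children."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # skip sitemaps already visited (loop guard)
        return
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag == NS + "sitemapindex":
        # Each <sitemap><loc> points at a nested sitemap: parse it too.
        for loc in root.iter(NS + "loc"):
            if loc.text:
                yield from crawl(loc.text.strip(), fetch, seen)
    elif root.tag == NS + "urlset":
        for entry in root.findall(NS + "url"):
            loc = entry.findtext(NS + "loc")
            lastmod = entry.findtext(NS + "lastmod")
            if loc:
                yield loc.strip(), (lastmod.strip()[:10] if lastmod else None)


pages = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap></sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/post-1</loc><lastmod>2026-05-28T10:00:00+00:00</lastmod></url>"
        "<url><loc>https://example.com/post-2</loc></url></urlset>"
    ),
}
records = list(crawl("https://example.com/sitemap.xml", pages.__getitem__))
print(records)
```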

Works for Sitemap Indexes (Not Just Simple Sitemaps)

Many websites publish sitemap indexes that point to multiple sub-sitemaps. Sitemap Scraper is designed to recognize that structure and continue crawling through sub-sitemaps until it reaches regular urlset content.
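Recognizing the structure comes down to the root element: `<sitemapindex>` lists other sitemaps, while `<urlset>` lists pages. A minimal Python sketch of that check, which also logs a warning for any other root, as the reliability table describes. Matching on the local tag name and ignoring the namespace is a lenient assumption of this sketch, not necessarily what the actor does.

```python
import logging
import xml.etree.ElementTree as ET


def sitemap_kind(xml_text):
    """Return "sitemapindex", "urlset", or None for an unrecognized root element."""
    root = ET.fromstring(xml_text)
    local_name = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix, if any
    if local_name in ("sitemapindex", "urlset"):
        return local_name
    logging.warning("Unrecognized sitemap root tag: %s", root.tag)
    return None


print(sitemap_kind('<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'))  # sitemapindex
```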

Clean URL Output for SEO Audits

Each dataset record includes url and lastMod (the sitemap's lastmod value, truncated to its first 10 characters when present). That makes the output useful for SEO audits, content-freshness checks, and link extraction.
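Sitemaps express lastmod in W3C Datetime format, so keeping the first 10 characters reduces a full timestamp to its date. A minimal sketch of that rule; note that a year-only or year-month value (also valid W3C Datetime) passes through shorter than 10 characters.

```python
from typing import Optional


def normalize_lastmod(raw: Optional[str]) -> Optional[str]:
    """Truncate a sitemap <lastmod> value to its first 10 characters (YYYY-MM-DD)."""
    if raw is None or not raw.strip():
        return None  # no usable lastmod in the sitemap entry
    return raw.strip()[:10]


print(normalize_lastmod("2026-05-28T14:03:00+00:00"))  # 2026-05-28
print(normalize_lastmod(None))  # None
```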

Resilient Fetching with Retries

When fetching sitemap XML, the actor retries a failed request up to 3 times and handles common HTTP failure modes, so occasional network errors on public websites are less likely to cost you results.
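Retrying with increasing delays is a standard way to ride out transient network errors. The sketch below treats "3 retries" as one initial attempt plus three retries and uses exponential backoff; both the interpretation and the delays are assumptions, and `fetch` is any callable that downloads a URL.

```python
import time


def fetch_with_retries(fetch, url, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); on a network error, retry up to max_retries times with backoff."""
    for attempt in range(max_retries + 1):  # one initial attempt + max_retries retries
        try:
            return fetch(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            if attempt == max_retries:
                raise  # out of retries: let the caller log it and move on
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s with the defaults
```

In practice, `fetch` could wrap `urllib.request.urlopen(url, timeout=30).read()`. Because `HTTPError` is also an `OSError`, this sketch retries 404s as well; a stricter version would retry only timeouts and 5xx responses.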

URL Extraction from Public Web Data

Sitemap Scraper focuses on publicly available sitemap XML content. It’s a straightforward sitemap scraper tool when you need automated sitemap crawling without manual downloads and parsing.

Overall, Sitemap Scraper turns sitemap URL extraction into a one-click dataset you can export and analyze immediately.

Why Sitemap Scraper?

There are plenty of ways to pull data from sitemap XML files—here’s why Sitemap Scraper stands out.

Handles Sitemap Indexes Automatically

Instead of stopping at the first sitemap file, the actor follows nested sitemap indexes. That means fewer gaps in your extracted URL list, whether you're auditing a blog or a large multi-section site.

Ready-to-Analyze Output

The actor saves structured records to a dataset, including url and lastMod, so you can plug the extracted URLs straight into downstream workflows such as SEO audits and reporting.

Built for Bulk, Not Busywork

You can supply multiple sitemap URLs in startUrls and let the actor do the heavy lifting. For teams running bulk URL extraction, this replaces hours of manual parsing and keeps the process repeatable.

Real-World Use Cases

Here's how different teams put Sitemap Scraper to work:

SEO Teams
When an SEO audit needs a complete inventory of pages, you can run Sitemap Scraper with your site’s main sitemap URL(s) and get a clean dataset of discovered URLs. The lastMod field helps you spot freshness patterns quickly for sitemap-based crawling and auditing.
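One way to use lastMod in an audit is to flag pages that have not changed for a while. A minimal sketch, assuming lastMod holds a full YYYY-MM-DD date (a year-only value would raise `ValueError` in `date.fromisoformat`); the one-year threshold is an arbitrary example.

```python
from datetime import date, timedelta


def stale_urls(records, today, max_age_days=365):
    """Return URLs whose lastMod is older than max_age_days; records without lastMod are skipped."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        r["url"]
        for r in records
        if r.get("lastMod") and date.fromisoformat(r["lastMod"]) < cutoff
    ]


records = [
    {"url": "https://example.com/old-post", "lastMod": "2024-01-15"},
    {"url": "https://example.com/new-post", "lastMod": "2026-05-28"},
    {"url": "https://example.com/no-date", "lastMod": None},
]
print(stale_urls(records, today=date(2026, 6, 1)))  # ['https://example.com/old-post']
```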

Content & Publishing Ops
For blogs and content sites, you often want visibility into which sections exist and how often they update. Use Sitemap Scraper to extract URLs from the full sitemap structure (including indexes) and keep your internal content lists aligned.
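To see which sections a site publishes, you can group the extracted URLs by their first path segment. A minimal sketch; treating the first segment as the "section" is an assumption that fits many blogs but not every URL scheme.

```python
from collections import Counter
from urllib.parse import urlsplit


def section_counts(urls):
    """Count URLs by their first path segment, e.g. /blog/post-1 -> "blog"."""
    counts = Counter()
    for url in urls:
        segments = [s for s in urlsplit(url).path.split("/") if s]
        counts[segments[0] if segments else "/"] += 1  # "/" stands for the home page
    return counts


urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/post-2",
    "https://example.com/docs/setup",
    "https://example.com/",
]
print(section_counts(urls).most_common())
```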

Data Analysts
If you’re correlating URLs with performance metrics, you need a reliable baseline list of links. Sitemap Scraper gives you an export-friendly dataset that you can join with analytics data—no custom sitemap parser needed.
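A join like this can be sketched as a left join from the sitemap list onto an analytics export keyed by URL. The `pageviews` mapping is hypothetical; in real data you may also need to normalize trailing slashes or query strings before URLs match.

```python
def join_with_metrics(records, pageviews):
    """Left-join sitemap records with a {url: pageviews} mapping; unmatched URLs get 0."""
    return [
        {"url": r["url"], "lastMod": r.get("lastMod"), "pageviews": pageviews.get(r["url"], 0)}
        for r in records
    ]


records = [
    {"url": "https://example.com/a", "lastMod": "2026-05-28"},
    {"url": "https://example.com/b", "lastMod": None},
]
pageviews = {"https://example.com/a": 1200}  # hypothetical analytics export
joined = join_with_metrics(records, pageviews)
for row in joined:
    print(row)
```

Sitemap URLs that end up with 0 pageviews are a natural starting point for finding under-linked or unindexed pages.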

Automation & Developer Workflows
When you want to schedule automated sitemap crawling, you can integrate the actor into your pipeline and treat the output as a consistent source of truth. The dataset output works well as input to ETL jobs, monitoring scripts, and regular SEO refresh cycles.
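For monitoring scripts, a common pattern is to compare the URL lists from two scheduled runs. A minimal sketch of that diff; how you fetch each run's URLs (export file or API) is left out here.

```python
def diff_runs(previous_urls, current_urls):
    """Return (added, removed) URLs between two runs, each sorted."""
    previous, current = set(previous_urls), set(current_urls)
    return sorted(current - previous), sorted(previous - current)


added, removed = diff_runs(
    ["https://example.com/a", "https://example.com/b"],
    ["https://example.com/b", "https://example.com/c"],
)
print("added:", added)
print("removed:", removed)
```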

How to Run It

No code required. Here's how to get your first results in under 5 minutes:

Open the actor on Apify — go to the actor page on console.apify.com.
Enter your inputs — add your sitemap URLs under startUrls (each item should contain a url).
Configure proxy settings (optional) — if your setup requires it, enable the provided proxy configuration options for better reliability.
Start the run and watch the live log — track sitemap fetching progress as it processes each start URL.
Open the Dataset tab — extracted url (and lastMod when available) records appear as they’re pushed.
Export in your preferred format — download from the Apify dataset tab as JSON, CSV, or Excel.

Results start appearing in the dataset within seconds of launch.

Export & Integration Options

Once your data is collected, Sitemap Scraper fits directly into your existing workflow.

You can export results from the Apify dataset tab as JSON, CSV, or Excel for quick sharing and analysis. If you’re building a dashboard or running scripts, JSON is a convenient format for programmatic ingestion.
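A JSON export is an array of objects, so pulling the URLs out takes only a few lines of Python. A minimal sketch; the `dataset.json` filename in the comment is hypothetical.

```python
import json


def urls_from_export(json_text):
    """Extract the url field from a dataset exported as JSON (an array of objects)."""
    items = json.loads(json_text)
    return [item["url"] for item in items if item.get("url")]


# With a downloaded export (hypothetical filename):
# with open("dataset.json", encoding="utf-8") as f:
#     urls = urls_from_export(f.read())
sample = '[{"url": "https://blog.apify.com/sitemap-articles.xml", "lastMod": "2026-05-28"}]'
print(urls_from_export(sample))
```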

For integrations, you can use Apify’s API access to pull results into your systems, or connect to automation tools like Zapier / Make to push extracted URLs into your next step. You can also schedule runs so automated sitemap crawling happens regularly, without manual effort.
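For the API route, Apify's `run-sync-get-dataset-items` endpoint starts a run and returns its dataset items in one call. The sketch below only builds the request with Python's standard library. The actor ID `scrapevanta~sitemap-scraper` is a guess based on the developer name (check the real ID on the actor page), and `YOUR_API_TOKEN` is a placeholder.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, sitemap_urls):
    """Prepare (but do not send) a synchronous run that returns dataset items."""
    # actor_id uses the "username~actor-name" form; the value used below is hypothetical.
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items?" + urlencode({"token": token})
    body = json.dumps({"startUrls": [{"url": u} for u in sitemap_urls]}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": "application/json"}
    )


req = build_run_request(
    "scrapevanta~sitemap-scraper",  # hypothetical actor ID
    "YOUR_API_TOKEN",               # placeholder: use your own API token
    ["https://blog.apify.com/sitemap.xml"],
)
print(req.get_method(), req.full_url)
# To run it for real: json.load(urllib.request.urlopen(req)) returns the dataset items.
```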

Pricing

Sitemap Scraper runs on Apify, which has a free tier; no credit card is needed to start. The free plan includes $5 in monthly platform credits, which is typically enough for several real test runs. Beyond that, the actor is priced per result (from $5.99 per 1,000 results), so you only pay when a run produces data. For heavier workloads and ongoing monitoring, compare Apify's plans on the pricing page.

Start free at apify.com — scale up when you need to.

Reliability & Limitations

What We Handle	How
Retries for fetching sitemaps	Up to 3 retries with error handling and backoff logic
Redirects	Follow redirects enabled for sitemap fetching
Sitemap indexes	Recursively parses sitemap indexes until it reaches URL sets
Unknown or unexpected XML	Logs warnings when the root tag is not recognized
Partial failures	If a sitemap fails to fetch, processing continues for other provided start URLs
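The partial-failure behavior in the table can be sketched as a loop that logs each failing start URL and moves on. This is an illustration, not the actor's code; `fake_process` is a hypothetical stand-in for fetching and parsing one sitemap.

```python
import logging


def process_all(start_urls, process):
    """Run process(url) for every start URL; log failures instead of aborting."""
    results, failed = [], []
    for url in start_urls:
        try:
            results.extend(process(url))
        except Exception as exc:  # broad on purpose: one bad sitemap must not stop the rest
            logging.error("Skipping %s: %s", url, exc)
            failed.append(url)
    return results, failed


def fake_process(url):  # hypothetical stand-in for fetch + parse
    if "broken" in url:
        raise ValueError("malformed XML")
    return ["https://example.com/page-1"]


results, failed = process_all(
    ["https://example.com/sitemap.xml", "https://broken.example.com/sitemap.xml"],
    fake_process,
)
print(results, failed)
```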

Limitations: Sitemap Scraper works with publicly accessible sitemap XML content. It does not bypass authentication or process private, login-gated, or otherwise restricted resources. If a sitemap returns malformed XML or is inaccessible due to server-side restrictions, you may see missing outputs for those specific inputs.

For enterprise-scale needs or custom configurations, reach out and we'll help.

Frequently Asked Questions

Is there a free plan?

Yes, Apify offers a free tier so you can run Sitemap Scraper and test the output before scaling up.

Do I need to log in or create an account on Apify to use this?

Yes. Running any actor requires an Apify account, and the free tier is enough to get started. To trigger the actor via the Apify API, authenticate with your Apify API token.

How accurate is the extracted data?

The actor extracts URLs that are present in the sitemap XML you provide. It parses urlset entries and handles sitemap indexes recursively, so accuracy depends on the sitemap content published by the website owner.

How many results can I get per run?

The input schema sets no limit on the number of results. How many records you get depends on how many URLs the sitemap(s) in your startUrls contain.

How fresh is the data?

Freshness depends on when the website updates its sitemap and the time you run Sitemap Scraper. The optional lastMod field helps you understand the sitemap’s own reported update date.

Is this legal? Does it comply with GDPR / CCPA?

Sitemap Scraper focuses on publicly available data inside sitemap XML files that can be accessed without special credentials. You’re responsible for ensuring your use complies with GDPR, CCPA, and applicable laws.

Can I export to Google Sheets or Excel?

Yes. You can export from the Apify dataset tab as JSON, CSV, or Excel, then move the data into Google Sheets or Excel workflows.

Can I schedule this to run automatically?

Yes. You can schedule actor runs on Apify for automated sitemap crawling so your URL lists stay up to date.

Can I access results via the API?

Yes. You can trigger runs and retrieve results programmatically via the Apify API.

What happens when the actor encounters an error?

When sitemap fetching or parsing fails, the actor logs errors and warnings and continues processing other provided inputs. For specific sitemap URLs that can’t be fetched after retries, you may see fewer or no extracted records for those inputs.

Get Help & Use Responsibly

Got a question about Sitemap Scraper or a feature you'd like added? Reach out at dataforleads@gmail.com. We’re happy to help with setup questions and are open to ideas like adding richer dataset metadata or supporting additional export-friendly structures.

Sitemap Scraper works with publicly available data from sitemap XML files. It does not access private accounts, login-gated pages, or password-protected content. You’re responsible for compliance with GDPR, CCPA, and any relevant platform terms. For data-removal requests, contact dataforleads@gmail.com. Use responsibly, ethically, and only for lawful purposes.

You might also like

Sitemap URL Extractor - XML Sitemap Scraper

benthepythondev/sitemap-url-extractor

Extract URLs from XML sitemaps and sitemap indexes. Get URL, lastmod, changefreq, priority and source sitemap.

Ben

Sitemap Scraper

pvillalva/sitemap-scraper

The Sitemap Scraper extracts and outputs all URLs from a given sitemap.

Percival Villalva

Sitemap Url Extractor

scrapers-hub/sitemap-url-extractor

Sitemap URL extractor to extract all URLs from XML sitemaps quickly and efficiently 🌐📄 Ideal for SEO audits, site analysis, and indexing workflows. Fast, accurate, and easy to use.

Scrapers Hub

Sitemap to URL Crawler — Extract Sitemap.xml URLs

logiover/sitemap-to-url-crawler

Extract all URLs from any sitemap.xml recursively. Export sitemap URLs to CSV/JSON for RAG pipelines, SEO audits, and LLM training datasets.

Logiover

Sitemap API

vivid_astronaut/sitemap

Fabio Suizu

Sitemap Scraper

scrapedrift/sitemap-scraper

Sitemap Scraper extracts URLs, pages, images, and structured data from XML sitemaps. Quickly discover website content, audit site structure, monitor updates, support SEO analysis, conduct competitor research, and gather valuable website data at scale.

ScrapeDrift

Sitemap Extractor: Website → All URLs (sitemap.xml parser)

boxbox10/sitemap-extractor

Give it a website. Get every URL from its sitemap — loc, lastmod, changefreq, priority — as one clean record per URL. Auto-discovers sitemap.xml, robots.txt Sitemap: directives, and nested sitemap indexes. Perfect for SEO audits, crawl seeding, and URL discovery.

Marvin Eguilos

Sitemap Generator - Crawl Website & Create XML Sitemap

scrappy_garden/sitemap-generator

Generate an XML sitemap for any website. Crawls internal pages from start URLs (with depth + page limits), deduplicates URLs, and stores a ready-to-submit sitemap.xml plus a structured dataset and summary for SEO audits.

Bikram Adhikari

Sitemap Detector

coder_zoro/sitemap-detector

Find sitemap URLs fast with our free Sitemap Finder tool. Instantly detect sitemaps from any website for SEO audits, indexing checks, and crawl planning. Improve visibility, site structure insights, and search engine performance in just seconds

Zoro

XML Sitemap Scraper & URL Extractor API - SEO Crawler

pink_comic/sitemap-url-extractor

Extract URLs from XML sitemaps and robots.txt for SEO crawls, audits, content migrations, and RAG indexing. Auto-discovers sitemap files, parses nested sitemap indexes, and exports URL, lastmod, priority, changefreq, and image metadata in bulk.

Ava Torres
