Sitemap Scraper

Pricing

from $0.01 / 1,000 results

🔎 Sitemap Scraper extracts URLs from XML sitemaps fast and accurately. 🚀 Perfect for SEO audits, link building, content discovery, and crawling planning. 📈 Get organized site maps in minutes—save time, boost rankings!

Rating: 0.0 (0)

Developer: Scraperoka (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 2 days ago

Sitemap Scraper ⚡

Manually hunting down every page URL across a website takes hours and often misses important sections. Sitemap Scraper extracts all URLs from a sitemap and saves them to an Apify dataset, which suits marketers, SEO specialists, and researchers who need URLs in bulk. It handles both regular XML sitemaps and sitemap indexes, turning sitemap parsing into a repeatable workflow that can produce thousands of extracted URLs in a single run.


What You Get: Sample Output

Here's a sample record from a single run:

{
  "url": "https://example.com/blog/technical-seo-checklist",
  "lastMod": "2025-05-21"
}

| Field | Type | What It Tells You |
| --- | --- | --- |
| url | string | The extracted page URL from the sitemap (including sitemap indexes) |
| lastMod | string or null | The sitemap's lastmod date (YYYY-MM-DD format) when available |
| success | (not present in output) | No extra success flag is added per record by this actor |
| error_message | (not present in output) | Errors are logged during the run; records pushed contain url and lastMod only |
| charged_event_name | (not present in output) | The actor pushes extracted URL batches to charged_event_name="result" |

Export your dataset as JSON, CSV, or Excel — straight from the Apify dashboard.


Why Sitemap Scraper?

There are many ways to pull data from sitemaps. Here's what sets Sitemap Scraper apart.

Handles sitemap indexes automatically

If your sitemap is an index that points to many sub-sitemaps, Sitemap Scraper recursively fetches and parses them. That means you can supply a single index URL and still get complete coverage of the URLs the site lists.
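To illustrate what recursive index handling involves, here is a minimal Python sketch. It is not the actor's source code: the function name `iter_page_urls` and the injectable `fetch` callable are assumptions made for the example, and only the sitemaps.org XML namespace is taken from the public protocol.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Set

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def iter_page_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],  # hypothetical: returns the XML text for a URL
    seen: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Yield page URLs, following <sitemapindex> entries recursively."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against indexes that reference each other
        return
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    if root.tag.endswith("sitemapindex"):
        # An index lists further sitemaps; descend into each one.
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            if loc.text:
                yield from iter_page_urls(loc.text.strip(), fetch, seen)
    else:
        # A regular <urlset> lists page URLs directly.
        for loc in root.findall("sm:url/sm:loc", NS):
            if loc.text:
                yield loc.text.strip()
```

Passing `fetch` in as a parameter keeps the traversal logic separate from networking, so the same function can be exercised against in-memory XML.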

Extracts clean URL records from urlset

For regular sitemap files, it extracts each <loc> as a URL and captures <lastmod> when present. This makes it a practical tool for building SEO lists, such as all URLs for a content audit or the published pages of a competitor's site.
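The mapping from a <urlset> to the url and lastMod fields can be sketched as follows. This is an illustrative approximation using Python's standard library, not the actor's implementation; `parse_urlset` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_urlset(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Turn a <urlset> document into records shaped like the sample output."""
    records = []
    for url_el in ET.fromstring(xml_text).findall("sm:url", NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue  # skip entries without a usable <loc>
        lastmod = url_el.findtext("sm:lastmod", default=None, namespaces=NS)
        records.append({
            "url": loc,
            # lastMod is None when the sitemap omits <lastmod> or leaves it empty.
            "lastMod": lastmod.strip() if lastmod else None,
        })
    return records
```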

Resilient fetching with retries

When a sitemap request fails, the actor retries and backs off between attempts. This helps when hosting servers throttle or intermittently block requests during bulk sitemap URL extraction jobs.

Output is written in batches for efficiency

Extracted URL records are pushed to the dataset in batches rather than one at a time, which reduces overhead and keeps larger runs moving smoothly.
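Batching of this kind is commonly done with a small generator like the one below. The helper name `batched` and any batch size you choose are assumptions for illustration; the actor's actual batch size is not documented here.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each yielded list could then be handed to whatever function writes records to storage, so a run with thousands of URLs makes far fewer write calls than one call per record.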


Configuring Your Run

Drop this into your input.json to get started:

{
  "startUrls": [
    { "url": "https://example.com/sitemap.xml" },
    { "url": "https://example.com/sitemap_index.xml" }
  ]
}

| Parameter | Required | What It Does |
| --- | --- | --- |
| startUrls | | List of sitemap URLs to crawl (supports both sitemap files and sitemap indexes) |
| startUrls[].url | | The actual sitemap URL to fetch and parse |

Note: The actor also reads proxyConfiguration from the run input (if you provide it). If proxy settings are present, it will use them to fetch sitemaps; otherwise it runs without proxy support.
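If you generate the run input programmatically, a sketch like the following may help. The `build_input` helper is hypothetical, and the `{"useApifyProxy": true}` shape is the form Apify's standard proxy input commonly takes; this actor's exact proxy schema is not shown on this page, so treat that part as an assumption.

```python
import json
from typing import Any, Dict, List


def build_input(sitemap_urls: List[str], use_apify_proxy: bool = False) -> Dict[str, Any]:
    """Build a run input with startUrls and, optionally, proxyConfiguration."""
    run_input: Dict[str, Any] = {"startUrls": [{"url": u} for u in sitemap_urls]}
    if use_apify_proxy:
        # Assumed shape; check the actor's input schema before relying on it.
        run_input["proxyConfiguration"] = {"useApifyProxy": True}
    return run_input


if __name__ == "__main__":
    print(json.dumps(build_input(["https://example.com/sitemap.xml"], True), indent=2))
```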


Core Capabilities

Sitemap crawling for complete URL coverage

Sitemap Scraper fetches your provided sitemap URLs and parses the XML to extract URLs. If a sitemap is a sitemap index, it follows through to the underlying sub-sitemaps to find all URLs.

URL extraction with optional lastmod

For each URL entry, it outputs url and, when available, a lastMod value derived from the sitemap's lastmod. This is useful when you're prioritizing pages for SEO work by how recently they changed.
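As one example of using lastMod downstream, the sketch below orders exported records from most to least recently modified and puts records without a date last. It assumes records shaped like the sample output and dates in YYYY-MM-DD form; `newest_first` is a hypothetical helper name.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def newest_first(records: List[Record]) -> List[Record]:
    """Sort by lastMod descending; records without lastMod go last."""
    dated = [r for r in records if r.get("lastMod")]
    undated = [r for r in records if not r.get("lastMod")]
    # ISO-style dates (YYYY-MM-DD) sort chronologically as plain strings.
    return sorted(dated, key=lambda r: r["lastMod"], reverse=True) + undated
```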

Recursive sitemap parsing

Sitemap Scraper recursively handles both sitemap index structures and standard URL sets, so you get consistent results regardless of sitemap format.

Resilience for real-world endpoints

It retries failed requests (up to 3 attempts) with exponential backoff. This helps keep long runs stable when endpoints are temporarily unavailable or rate-limited.
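A retry loop with exponential backoff can be sketched as below. Only the limit of 3 attempts comes from the description above; the 1-second base delay, the doubling factor, the function names, and the choice of which errors to retry are assumptions for illustration.

```python
import time
import urllib.request
from typing import Callable, Optional


def http_get(url: str, timeout: float = 30.0) -> bytes:
    """Plain standard-library GET; stands in for the actor's real fetcher."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def fetch_with_retries(
    url: str,
    fetch: Callable[[str], bytes] = http_get,
    attempts: int = 3,
    base_delay: float = 1.0,  # assumed value, doubled after each failure
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Try `fetch` up to `attempts` times, waiting 1x, 2x, 4x... base_delay."""
    last_error: Optional[BaseException] = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError as exc:  # urllib's URLError and timeouts derive from OSError
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"giving up on {url} after {attempts} attempts") from last_error
```

With the defaults, a request that fails twice waits 1 second, then 2 seconds, and succeeds or gives up on the third try.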

Dataset-ready output for automation

Extracted results are pushed into your Apify dataset as they’re parsed. You can then connect the output to your downstream pipeline for reporting, auditing, or research without manual copying.


Who Gets the Most Out of This

Sitemap Scraper is ideal for SEO specialists who need a reliable way to audit what a site actually publishes. It's also a strong fit for competitive research teams, who can build URL datasets of competitor sites faster than by manual browsing.

Marketing and growth analysts use the output to segment content catalogs, estimate crawl scope, and validate campaign landing pages. Data researchers benefit from datasets with consistent fields (url and lastMod) for analysis and downstream enrichment.

If you’re an automation-focused technical user, Sitemap Scraper works as a clean “URL ingestion” step in a larger pipeline, turning sitemap parsing into a repeatable job you can trigger and export programmatically.


Step-by-Step: How to Use It

No coding needed. Here's how to run Sitemap Scraper from start to finish:

  1. Open the actor on Apify — go to console.apify.com and search for Sitemap Scraper.
  2. Enter your inputs — provide your sitemap(s) in startUrls using the url values from your own site.
  3. Configure proxy settings (optional) — if your environment needs it, set the run’s proxy configuration options.
  4. Hit Run and watch the live log — confirm it’s fetching and parsing your sitemap(s).
  5. View results in the dataset tab — you’ll see extracted URL records as the actor pushes them.
  6. Export as JSON, CSV, or Excel — download your dataset directly from the Apify dashboard.

The whole process takes under 5 minutes to set up.


Integrations & Export Options

Once your data is collected, Sitemap Scraper plugs directly into your existing workflow.

You can export your Apify dataset from the dashboard in common formats like JSON, CSV, or Excel, which makes the extracted URLs easy to share with stakeholders.

You can also access the results via the Apify API for programmatic pipelines, and use webhooks and automation tools (such as Zapier or Make) to trigger downstream actions when runs complete. For setup details, refer to the Apify documentation at https://apify.com/docs/api.
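For programmatic access, the sketch below prepares (but does not send) the two HTTP requests a typical pipeline needs: one to start a run and one to download dataset items. The paths follow Apify's public API v2, but the actor ID `someuser~sitemap-scraper`, the token, and the dataset ID are placeholders, and the helper names are hypothetical; confirm the details against the Apify API documentation before use.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"


def start_run_request(actor_id: str, token: str, run_input: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST that starts an actor run with the given input."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{urllib.parse.quote(actor_id, safe='~')}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def dataset_items_url(dataset_id: str, fmt: str = "json") -> str:
    """Build the URL for downloading a dataset's items in a given format."""
    return f"{API_BASE}/datasets/{dataset_id}/items?" + urllib.parse.urlencode({"format": fmt})


if __name__ == "__main__":
    # Placeholders only: substitute your own actor ID and API token.
    req = start_run_request(
        "someuser~sitemap-scraper",
        "YOUR_API_TOKEN",
        {"startUrls": [{"url": "https://example.com/sitemap.xml"}]},
    )
    print(req.get_method(), req.full_url)
    print(dataset_items_url("YOUR_DATASET_ID", "csv"))
```

Sending the prepared request with `urllib.request.urlopen(req)` would start a run; the response's run object includes the ID of the dataset to download from.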

For recurring workflows (for example, frequent sitemap checks), schedule the actor to run automatically on a cron schedule through Apify.


Pricing & Free Trial

Sitemap Scraper runs on the Apify platform, which offers a free tier — no credit card required to get started.

Apify provides initial free platform credits on sign-up, which is typically enough for several test runs. For production usage, billing is generally based on Apify platform compute (CU), and you can choose from Apify’s available starter/scale plans depending on your workload. Start for free at apify.com and scale when you're ready.


Reliability & Performance

| What We Handle | How |
| --- | --- |
| Rate-limited / blocked sitemap requests | Retries and backoff to improve fetch success |
| Proxy needs | Optional proxy support if you configure it in your run input |
| Large sitemap indexes | Recursive parsing to reach all sub-sitemaps |
| Error resilience | Failures during fetch or parse are logged so you can inspect run logs |
| Output readiness | Extracted URLs are pushed to your dataset for immediate use |

Limitations: If a sitemap endpoint is inaccessible or returns invalid/unparseable XML, extraction can be incomplete. Sitemap Scraper only extracts what’s present in the provided sitemap files; it cannot invent URLs that aren’t listed.

For enterprise-scale runs, contact us to discuss custom configurations.


Frequently Asked Questions

Is there a free plan or trial?

Yes—Apify offers a free tier so you can test Sitemap Scraper without needing a credit card.

Do I need to log in to use Sitemap Scraper?

No. Sitemap Scraper only fetches and parses sitemap content from the sitemap URLs you provide.

How accurate is the data?

The output is as accurate as the XML in the sitemap. It extracts url values from the sitemap entries and includes lastMod when the sitemap provides a lastmod.

How many results can I get per run?

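The number of URLs per run depends on how large the provided sitemaps are and what the host server allows during your job window. The sitemap protocol caps a single sitemap file at 50,000 URLs, so larger sites are usually covered through sitemap indexes, which the actor follows automatically.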

How often is the data updated / how fresh is it?

Freshness depends on when you run the actor. The extracted data includes lastMod values from the sitemap, but the actor only reflects what’s available at the time of fetching.

Is it legal to scrape sitemaps?

Sitemap Scraper works with publicly available data from sitemaps. You're responsible for ensuring your use complies with applicable regulations (including GDPR/CCPA) and the website's terms for accessing and using that information.

Can I export results to Google Sheets or Excel?

Yes. You can export your Apify dataset from the dashboard in formats like JSON and CSV, and import into tools like Excel or set up integrations for spreadsheets.

Can I run this on a schedule automatically?

Yes. You can schedule Apify actor runs on a cron schedule so your sitemap parsing happens automatically at whatever frequency you choose.

Can I access this via API?

Yes. You can use the Apify API to trigger runs and retrieve results programmatically. See https://apify.com/docs/api for details.

What happens if the actor hits an error?

If a sitemap fetch fails, the actor logs the failure and retries with backoff. Parsing errors are also logged, and whatever URLs can be extracted will still be pushed to the dataset.
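The "log the problem and keep what can be extracted" behaviour can be sketched as follows. This is an illustration under assumptions, not the actor's code: `collect_urls` is a hypothetical helper that takes already-fetched XML keyed by source URL, and it expects regular URL sets (on a sitemap index it would return the sub-sitemap URLs instead).

```python
import logging
import xml.etree.ElementTree as ET
from typing import Dict, List

log = logging.getLogger("sitemap-sketch")

# Fully qualified <loc> tag in the sitemaps.org namespace.
LOC = "{http://www.sitemaps.org/schemas/sitemap/0.9}loc"


def collect_urls(documents: Dict[str, str]) -> List[str]:
    """Extract <loc> values from each document, skipping any that fail to parse."""
    urls: List[str] = []
    for source, xml_text in documents.items():
        try:
            root = ET.fromstring(xml_text)
        except ET.ParseError as exc:
            # One broken sitemap should not discard results from the others.
            log.warning("Skipping %s: invalid XML (%s)", source, exc)
            continue
        urls.extend(el.text.strip() for el in root.iter(LOC) if el.text)
    return urls
```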


Need Help or Have a Request?

Got a question about Sitemap Scraper or want a new feature added? Reach out at dataforleads@gmail.com. We welcome requests like enhanced export options and webhook notifications on completion. We actively maintain this actor based on user feedback.


Disclaimer & Responsible Use

Start a free run to try Sitemap Scraper on your own sitemaps.

Sitemap Scraper uses publicly available data from the sitemap URLs you provide. It does not access private accounts, login-gated content, or password-protected pages. You are responsible for complying with GDPR, CCPA, and any relevant platform terms. For data-removal requests, contact dataforleads@gmail.com. Use responsibly, ethically, and only for lawful purposes.