Sitemap Scraper
Pricing
from $0.01 / 1,000 results
🔎 Sitemap Scraper extracts URLs from XML sitemaps fast and accurately. 🚀 Perfect for SEO audits, link building, content discovery, and crawling planning. 📈 Get organized site maps in minutes—save time, boost rankings!
Developer
Scraperoka
Maintained by Community
Sitemap Scraper ⚡
Manually hunting down every page URL across a website takes hours and often misses important sections. Sitemap Scraper extracts all URLs from a sitemap (including sitemap indexes) and saves them to an Apify dataset, which suits marketers, SEO specialists, and researchers who need sitemap URLs in bulk. It works on standard XML sitemaps and sitemap indexes alike, turning sitemap parsing into a repeatable workflow that can produce thousands of extracted URLs in a single run.
What You Get: Sample Output
Here's a sample record from a single run:
{"url": "https://example.com/blog/technical-seo-checklist","lastMod": "2025-05-21"}
| Field | Type | What It Tells You |
|---|---|---|
| url | string | The extracted page URL from the sitemap (including sitemap indexes) |
| lastMod | string or null | The sitemap’s lastmod date (YYYY-MM-DD format) when available |
| success | (not present in output) | No extra success flag is added per record by this actor |
| error_message | (not present in output) | Errors are logged during the run; records pushed contain url and lastMod only |
| charged_event_name | (not present in output) | The actor pushes extracted URL batches to charged_event_name="result" |
Export your dataset as JSON, CSV, or Excel — straight from the Apify dashboard.
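Once exported, records in this shape are easy to post-process. The sketch below is illustrative only: it assumes a JSON export containing the two documented fields, and the helper name `modified_since` and the sample data are made up for the example.

```python
import json
from datetime import date

# Hypothetical export: two records shaped like the sample above.
exported = """[
  {"url": "https://example.com/blog/technical-seo-checklist", "lastMod": "2025-05-21"},
  {"url": "https://example.com/about", "lastMod": null}
]"""


def modified_since(records, cutoff):
    """Return records whose lastMod is on or after cutoff; skip null lastMod."""
    kept = []
    for record in records:
        last_mod = record.get("lastMod")
        if last_mod is None:
            continue
        # lastMod is described as YYYY-MM-DD; slice to the date part in case
        # a sitemap supplies a longer timestamp.
        if date.fromisoformat(last_mod[:10]) >= cutoff:
            kept.append(record)
    return kept


records = json.loads(exported)
recent = modified_since(records, date(2025, 1, 1))
print([record["url"] for record in recent])
```

Records without a lastMod are skipped here; whether to treat them as "unknown" or "stale" is a choice for your own pipeline.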
Why Sitemap Scraper?
There are a lot of ways to pull data from sitemaps. Here’s what sets Sitemap Scraper apart.
Handles sitemap indexes automatically
If your sitemap is an index that points to many sub-sitemaps, Sitemap Scraper recursively fetches and parses them. That means you can feed in a single index URL and still get complete coverage.
Extracts clean URL records from urlset
For regular sitemap files, it extracts each <loc> as a URL and captures lastmod when present. This makes it practical for building SEO lists, such as all URLs for a content audit, and for competitor research.
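To make the urlset case concrete, here is a minimal sketch of how `<loc>` and `<lastmod>` could be read with Python's standard library. It is not the actor's source code; the function name `extract_urlset` and the sample XML are assumptions for illustration. The namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/blog/technical-seo-checklist</loc>
    <lastmod>2025-05-21</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""


def extract_urlset(xml_text):
    """Return one {"url", "lastMod"} dict per <url> entry that has a <loc>."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue  # an entry without a location carries no URL to record
        last_mod = entry.findtext("sm:lastmod", default=None, namespaces=NS)
        records.append({"url": loc, "lastMod": last_mod.strip() if last_mod else None})
    return records


records = extract_urlset(SAMPLE)
print(records)
```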
Resilient fetching with retries
When a sitemap request fails, the actor retries and backs off between attempts to improve reliability. This helps when hosting servers throttle or intermittently block requests while you’re running bulk sitemap URL extraction jobs.
Output is written in batches for efficiency
Extracted URL records are pushed to the dataset in batches for faster processing during larger runs. The result is smoother execution when extracting links at scale.
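Batched writing can be pictured roughly as follows. This is a sketch under assumptions: the batch size of 500 is arbitrary (the actor's real batch size is not documented here), and `push` stands in for whatever call writes a list of records to the dataset.

```python
def batches(records, size):
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def push_in_batches(records, push, size=500):
    """Call push(batch) once per batch and return the number of calls made."""
    calls = 0
    for batch in batches(records, size):
        push(batch)
        calls += 1
    return calls


# Stand-in for a dataset: a plain list that receives each batch.
sink = []
records = [{"url": f"https://example.com/page-{i}", "lastMod": None} for i in range(1200)]
calls = push_in_batches(records, sink.extend, size=500)
print(calls, len(sink))
```

Fewer, larger writes reduce per-call overhead, which is the usual reason for batching on larger runs.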
Configuring Your Run
Drop this into your input.json to get started:
{"startUrls": [{ "url": "https://example.com/sitemap.xml" },{ "url": "https://example.com/sitemap_index.xml" }]}
| Parameter | Required | What It Does |
|---|---|---|
| startUrls | ✅ | List of sitemap URLs to crawl (supports both sitemap files and sitemap indexes) |
| ↳ startUrls[].url | ✅ | The actual sitemap URL to fetch and parse |
Note: The actor also reads proxyConfiguration from the run input (if you provide it). If proxy settings are present, it will use them to fetch sitemaps; otherwise it runs without proxy support.
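If you generate the input programmatically, a small builder can catch malformed URLs before a run starts. The sketch below is an assumption-laden example: `build_input` is a hypothetical helper, and the `{"useApifyProxy": True}` value is only a commonly seen shape for Apify proxy input. This actor's exact proxy schema is not shown on this page, so check the input tab before relying on it.

```python
import json
from urllib.parse import urlparse


def build_input(sitemap_urls, proxy_configuration=None):
    """Build a run input dict with startUrls and an optional proxyConfiguration."""
    start_urls = []
    for url in sitemap_urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {url!r}")
        start_urls.append({"url": url})
    if not start_urls:
        raise ValueError("startUrls needs at least one sitemap URL")
    run_input = {"startUrls": start_urls}
    if proxy_configuration is not None:
        # Assumed shape; the actor's accepted proxy fields are not documented here.
        run_input["proxyConfiguration"] = proxy_configuration
    return run_input


run_input = build_input(
    ["https://example.com/sitemap.xml", "https://example.com/sitemap_index.xml"],
    proxy_configuration={"useApifyProxy": True},
)
print(json.dumps(run_input, indent=2))
```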
Core Capabilities
Sitemap crawling for complete URL coverage
Sitemap Scraper fetches your provided sitemap URLs and parses the XML to extract URLs. If a sitemap is a sitemap index, it follows through to the underlying sub-sitemaps to find all URLs.
URL extraction with optional lastmod
For each URL entry, it outputs url and, when available, a lastMod value derived from the sitemap’s lastmod. This is useful when you’re building datasets for SEO prioritization.
Recursive sitemap parsing
Sitemap Scraper recursively handles both sitemap index structures and standard URL sets. That makes it well suited to URL extraction workflows that need consistent results regardless of sitemap format.
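The recursion described here can be sketched in a few lines. This is an illustration, not the actor's implementation: `walk_sitemap` is a hypothetical name, `fetch` is any callable that returns the XML text for a URL, and the `seen` set (a guard against indexes that reference each other) is an added assumption. The in-memory `DOCS` dictionary replaces real HTTP requests so the example runs offline.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'

# Hypothetical site: one index pointing at two ordinary sitemaps.
DOCS = {
    "https://example.com/sitemap_index.xml": f"""<sitemapindex {XMLNS}>
      <sitemap><loc>https://example.com/posts.xml</loc></sitemap>
      <sitemap><loc>https://example.com/pages.xml</loc></sitemap>
    </sitemapindex>""",
    "https://example.com/posts.xml": f"""<urlset {XMLNS}>
      <url><loc>https://example.com/blog/a</loc></url>
      <url><loc>https://example.com/blog/b</loc></url>
    </urlset>""",
    "https://example.com/pages.xml": f"""<urlset {XMLNS}>
      <url><loc>https://example.com/about</loc></url>
    </urlset>""",
}


def walk_sitemap(url, fetch, seen=None):
    """Yield page URLs reachable from `url`, following sitemap indexes."""
    seen = set() if seen is None else seen
    if url in seen:  # skip sitemaps that were already visited
        return
    seen.add(url)
    root = ET.fromstring(fetch(url))
    if root.tag.endswith("sitemapindex"):
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            child = (loc.text or "").strip()
            if child:
                yield from walk_sitemap(child, fetch, seen)
    else:
        for loc in root.findall("sm:url/sm:loc", NS):
            page = (loc.text or "").strip()
            if page:
                yield page


urls = list(walk_sitemap("https://example.com/sitemap_index.xml", DOCS.__getitem__))
print(urls)
```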
Resilience for real-world endpoints
It includes retry logic (up to 3 attempts) and uses exponential backoff for improved resilience. This helps keep long runs stable when endpoints are temporarily unavailable or rate-limited.
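As a rough picture of "up to 3 attempts with exponential backoff", consider the sketch below. The base delay of one second and the doubling factor are assumptions (the actor's real timings are not documented here), and `fetch_with_retries` is a hypothetical helper. The `sleep` parameter is injectable so the example runs instantly.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url); after each failure wait base_delay * 2**n, then retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:  # a real client would catch narrower errors
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error


# Simulated endpoint that fails twice, then succeeds.
calls = {"count": 0}


def flaky_fetch(url):
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "<urlset/>"


delays = []
body = fetch_with_retries(flaky_fetch, "https://example.com/sitemap.xml", sleep=delays.append)
print(body, delays)
```

No wait follows the final attempt: once the attempts are used up, the last error is re-raised so the caller can log it.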
Dataset-ready output for automation
Extracted results are pushed into your Apify dataset as they’re parsed. You can then connect the output to your downstream pipeline for reporting, auditing, or research without manual copying.
Who Gets the Most Out of This
Sitemap Scraper is ideal for SEO specialists who need a reliable way to audit what a site actually publishes. It’s also a strong fit for competitive research teams that want to build URL datasets faster than manual browsing allows.
Marketing and growth analysts use the output to segment content catalogs, estimate crawl scope, and validate campaign landing pages. Data researchers benefit from complete URL datasets with consistent fields (url and lastMod) for analysis and downstream enrichment.
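One simple way to segment a content catalog is to group URLs by their first path segment. The sketch below assumes records with a `url` field as documented above; the helper names `section_of` and `count_sections` and the sample URLs are illustrative.

```python
from collections import Counter
from urllib.parse import urlparse


def section_of(url):
    """Return the first path segment, or "/" for the site root."""
    segments = [part for part in urlparse(url).path.split("/") if part]
    return segments[0] if segments else "/"


def count_sections(records):
    """Count how many records fall under each top-level section."""
    return Counter(section_of(record["url"]) for record in records)


records = [
    {"url": "https://example.com/", "lastMod": None},
    {"url": "https://example.com/blog/technical-seo-checklist", "lastMod": "2025-05-21"},
    {"url": "https://example.com/blog/link-building", "lastMod": None},
    {"url": "https://example.com/products/widget", "lastMod": None},
]
sections = count_sections(records)
print(sections.most_common())
```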
If you’re an automation-focused technical user, Sitemap Scraper works as a clean “URL ingestion” step in a larger pipeline, turning sitemap parsing into a repeatable job you can trigger and export programmatically.
Step-by-Step: How to Use It
No coding needed. Here's how to run Sitemap Scraper from start to finish:
- Open the actor on Apify: go to console.apify.com and search for Sitemap Scraper.
- Enter your inputs: provide your sitemap(s) in startUrls using the url values from your own site.
- Configure proxy settings (optional): if your environment needs it, set the run’s proxy configuration options.
- Hit Run and watch the live log: confirm it’s fetching and parsing your sitemap(s).
- View results in the dataset tab: you’ll see extracted URL records as the actor pushes them.
- Export as JSON, CSV, or Excel: download your dataset directly from the Apify dashboard.
The whole process takes under 5 minutes to set up.
Integrations & Export Options
Once your data is collected, Sitemap Scraper plugs directly into your existing workflow.
You can export your Apify dataset from the dashboard in common formats like JSON, CSV, or Excel, which makes the extracted URL lists easy to share with stakeholders.
You can also access the results via the Apify API for programmatic pipelines, and use webhooks and automation tools (such as Zapier or Make) to trigger downstream actions when runs complete. For setup details, refer to the Apify documentation at https://apify.com/docs/api.
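For programmatic pipelines, the two URLs you typically need are the one that starts a run and the one that downloads dataset items. The sketch below builds both, following the public Apify API v2 layout (`/v2/acts/{actorId}/runs` and `/v2/datasets/{datasetId}/items`). Treat the details as assumptions to confirm against the API reference: the actor ID `scraperoka~sitemap-scraper` and the dataset ID `abc123` are placeholders, and authentication is left out.

```python
from urllib.parse import quote, urlencode

API_BASE = "https://api.apify.com/v2"


def run_actor_url(actor_id):
    """URL to POST a run input to, for an actor ID such as 'username~actor-name'."""
    return f"{API_BASE}/acts/{quote(actor_id, safe='~')}/runs"


def dataset_items_url(dataset_id, fmt="json", **params):
    """URL to GET dataset items from, in the requested export format."""
    query = urlencode({"format": fmt, **params})
    return f"{API_BASE}/datasets/{quote(dataset_id, safe='')}/items?{query}"


# Placeholder identifiers; substitute the real actor and dataset IDs.
print(run_actor_url("scraperoka~sitemap-scraper"))
print(dataset_items_url("abc123", fmt="csv", limit=100))
```

Building URLs separately from sending requests keeps the example offline and makes it easy to plug into whichever HTTP client your pipeline already uses.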
For recurring workflows (for example, frequent sitemap checks), schedule the actor to run automatically on a cron schedule through Apify.
Pricing & Free Trial
Sitemap Scraper runs on the Apify platform, which offers a free tier — no credit card required to get started.
Apify provides initial free platform credits on sign-up, which are typically enough for several test runs. For production usage, billing is generally based on Apify platform compute (CU), and you can choose from Apify’s available starter/scale plans depending on your workload. Start for free at apify.com and scale when you're ready.
Reliability & Performance
| What We Handle | How |
|---|---|
| Rate-limited / blocked sitemap requests | Retries and backoff to improve fetch success |
| Proxy needs | Optional proxy support if you configure it in your run input |
| Large sitemap indexes | Recursive parsing to reach all sub-sitemaps |
| Error resilience | Failures during fetch or parse are logged so you can inspect run logs |
| Output readiness | Extracted URLs are pushed to your dataset for immediate use |
Limitations: If a sitemap endpoint is inaccessible or returns invalid/unparseable XML, extraction can be incomplete. Sitemap Scraper only extracts what’s present in the provided sitemap files; it cannot invent URLs that aren’t listed.
For enterprise-scale runs, contact us to discuss custom configurations.
Frequently Asked Questions
Is there a free plan or trial?
Yes—Apify offers a free tier so you can test Sitemap Scraper without needing a credit card.
Do I need to log in to use Sitemap Scraper?
No. Sitemap Scraper only fetches and parses sitemap content from the sitemap URLs you provide.
How accurate is the data?
The output is as accurate as the XML in the sitemap. It extracts url values from the sitemap entries and includes lastMod when the sitemap provides a lastmod.
How many results can I get per run?
A single run can produce thousands of URLs. The total depends on how large the provided sitemaps are and what the host server allows during your job window.
How often is the data updated / how fresh is it?
Freshness depends on when you run the actor. The extracted data includes lastMod values from the sitemap, but the actor only reflects what’s available at the time of fetching.
Is this legal? Does it comply with GDPR / CCPA?
Sitemap Scraper works with publicly available data from sitemaps. You’re responsible for ensuring your use complies with applicable regulations (including GDPR/CCPA) and the website’s terms for accessing and using that information.
Can I export results to Google Sheets or Excel?
Yes. You can export your Apify dataset from the dashboard as JSON, CSV, or Excel, then import the file into Google Sheets or set up an integration that feeds your spreadsheet.
Can I run this on a schedule automatically?
Yes. You can schedule Apify actor runs on a cron schedule so your sitemap parsing happens automatically at whatever frequency you choose.
Can I access this via API?
Yes. You can use the Apify API to trigger runs and retrieve results programmatically. See https://apify.com/docs/api for details.
What happens if the actor hits an error?
If a sitemap fetch fails, the actor logs the failure and retries with backoff. Parsing errors are also logged, and whatever URLs can be extracted will still be pushed to the dataset.
Need Help or Have a Request?
Got a question about Sitemap Scraper or want a new feature added? Reach out at dataforleads@gmail.com. We welcome requests like enhanced export options and webhook notifications on completion. We actively maintain this actor based on user feedback.
Sitemap Scraper is a fast, reliable way to extract URLs from sitemaps. Start your free run today.
Disclaimer & Responsible Use
Sitemap Scraper uses publicly available data from the sitemap URLs you provide. It does not access private accounts, login-gated content, or password-protected pages. You are responsible for complying with GDPR, CCPA, and any relevant platform terms. For data-removal requests, contact dataforleads@gmail.com. Use responsibly, ethically, and only for lawful purposes.