Sitemap URL Extractor
Pricing
from $2.99 / 1,000 results
🔎 Extract URLs from any sitemap quickly and accurately. Sitemap URL Extractor helps you discover, audit, and optimize website links for SEO, crawling, and migrations. It is ideal for webmasters, marketers, and developers. 🚀⚙️
Developer: SolidScraper
Maintained by Community
Sitemap URL Extractor 🔍
Sitemap URL Extractor automatically extracts URLs from a sitemap (including sitemap indexes) and saves them to an Apify dataset. Whether you’re doing website research, SEO auditing, or building a bulk URL list from an existing sitemap, the actor turns a single root_sitemap_url into structured output you can use at scale, sparing you the work of collecting URLs by hand.
Why choose Sitemap URL Extractor?
| Feature | Benefit |
|---|---|
| ✅ All-in-one sitemap parsing | Extracts URLs from both direct sitemaps and sitemap indexes (recursively) |
| ✅ Reliability-first fetching | Includes residential proxy support for more dependable data collection |
| ✅ Structured output saving | Writes extracted records directly to the output dataset as they’re collected |
| ✅ URL-focused results | Produces a clean table of URL, lastmod, and changefreq you can export and analyze |
| ✅ Scales from one sitemap to many | Handles multiple sub-sitemaps when the root is a sitemap index |
| ✅ Easy workflow integration | Output dataset is ready for downstream processing in your pipeline |
Key features
- 🧾 Sitemap URL extraction (urlset): Parses <urlset> files and extracts each entry’s url, lastmod, and changefreq
- 🗂️ Sitemap index support: Detects <sitemapindex> files and processes the listed sub-sitemaps to extract their links as well
- 🔁 Recursive sitemap parsing: Automatically walks through sitemap indexes to gather URLs from the included sitemaps
- 🌐 XML sitemap parsing: Works with standard sitemap XML structures (<urlset> and <sitemapindex>)
- 💾 Live dataset saving: Pushes each extracted URL record to the dataset immediately (so you don’t lose progress)
- 🛡️ Residential proxy support: Designed to support reliable scraping for public web data
- 📦 Simple, analyst-friendly output: Saves fields in a consistent structure for easy export and review
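The parsing flow described above can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the actor's source code: it assumes sitemaps use the standard sitemaps.org namespace, and `fetch` is a hypothetical callable you supply that returns the XML text for a URL. Proxies, retries, and gzipped sitemaps are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator, Optional, Set

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

Record = Dict[str, Optional[str]]


def _child_text(elem: ET.Element, tag: str) -> Optional[str]:
    """Return the stripped text of a namespaced child element, or None if absent."""
    child = elem.find(f"sm:{tag}", SITEMAP_NS)
    if child is None or child.text is None:
        return None
    return child.text.strip()


def extract_records(
    url: str,
    fetch: Callable[[str], str],  # hypothetical: returns XML text for a URL
    seen: Optional[Set[str]] = None,
) -> Iterator[Record]:
    """Yield url/lastmod/changefreq records, recursing into sitemap indexes."""
    if seen is None:
        seen = set()
    if url in seen:  # skip sitemaps already visited (guards against cycles)
        return
    seen.add(url)

    root = ET.fromstring(fetch(url))
    kind = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix

    if kind == "sitemapindex":
        # Each <sitemap><loc> names a sub-sitemap: walk it recursively.
        for sitemap in root.findall("sm:sitemap", SITEMAP_NS):
            loc = _child_text(sitemap, "loc")
            if loc:
                yield from extract_records(loc, fetch, seen)
    elif kind == "urlset":
        for entry in root.findall("sm:url", SITEMAP_NS):
            yield {
                "url": _child_text(entry, "loc"),
                "lastmod": _child_text(entry, "lastmod"),
                # Fall back to "weekly", matching the documented default.
                "changefreq": _child_text(entry, "changefreq") or "weekly",
            }
    # Any other root element yields no records.
```

Passing the fetcher as a parameter keeps the recursion testable offline: a dictionary lookup such as `pages.__getitem__` can stand in for real HTTP requests.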
Input
Provide input as JSON, either through the Console input form or an input.json file. Example structure:
{"root_sitemap_url": "https://onescales.com/sitemap.xml"}
Input Fields
| Field | Required | Description |
|---|---|---|
| root_sitemap_url | Yes | The URL of the sitemap or sitemap index to start from. This is the root entry point for the extraction. |
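If you generate input.json from a script, a small check like the following can catch relative or scheme-less URLs before a run starts. build_run_input is a hypothetical helper, not part of the actor; it only checks the URL's shape and does not confirm that a sitemap exists at that address.

```python
import json
from urllib.parse import urlparse


def build_run_input(root_sitemap_url: str) -> str:
    """Serialize the actor input as JSON after a basic shape check on the URL."""
    url = root_sitemap_url.strip()
    parsed = urlparse(url)
    # Only checks that the URL is absolute http(s); nothing is fetched here.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Expected an absolute http(s) URL, got {root_sitemap_url!r}")
    return json.dumps({"root_sitemap_url": url})
```

Writing the returned string to input.json produces the same structure as the example input shown earlier in this section.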
Output
The actor saves each extracted URL record to the Apify dataset as JSON items.
{"url": "https://example.com/page-1","lastmod": "2024-01-15","changefreq": "weekly"}
Output Fields
| Field | Type | Description |
|---|---|---|
| url | string or null | The extracted <loc> value for each sitemap entry |
| lastmod | string or null | The extracted <lastmod> value (if present in the sitemap) |
| changefreq | string | The extracted <changefreq> value; if missing, it defaults to "weekly" |
Note: The output dataset view is configured to display url as a link, and lastmod and changefreq as text.
How to use Sitemap URL Extractor (via Apify Console)
1. Open Apify Console: Log in at https://console.apify.com and go to the Actors tab.
2. Find the actor: Search for Sitemap URL Extractor and open the actor page.
3. Go to the INPUT section: Use the built-in form (or switch to editing input.json directly) to provide the required input.
4. Set root_sitemap_url: Paste a direct sitemap URL (XML) or a sitemap index URL. This is where the extraction starts.
5. Run the actor: Click Run. During the run, the logs show fetching progress and whether the actor detected a direct urlset or a sitemap index.
6. Monitor progress: If the root is a sitemap index, the actor processes each sub-sitemap and pushes results to the dataset as it goes.
7. Open the OUTPUT dataset: After completion, open the dataset named Sitemap URLs to view the extracted URL records.
8. Export your results: Export the dataset in the format your workflow needs (for example, JSON or CSV) using Apify’s standard dataset export options.
No coding required: you can get URLs from XML sitemap files with Sitemap URL Extractor entirely from the Console. ✅
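If you post-process a JSON export in code, note that the same page can appear in more than one sub-sitemap, and this README does not say whether the actor deduplicates. A minimal sketch, assuming the exported file is a JSON array of the records described under Output (load it with json.load), keeps one record per URL and prefers the latest lastmod:

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def dedupe_by_url(items: List[Record]) -> List[Record]:
    """Keep one record per URL, preferring the entry with the latest lastmod."""
    best: Dict[str, Record] = {}
    for item in items:
        url = item.get("url")
        if not url:
            continue  # url may be null per the output schema
        current = best.get(url)
        # ISO 8601 dates in the same format compare correctly as strings;
        # a missing lastmod is treated as the oldest possible value.
        if current is None or (item.get("lastmod") or "") > (current.get("lastmod") or ""):
            best[url] = item
    return list(best.values())
```

String comparison is a simplification: it assumes lastmod values share one format and time zone. Parse them into datetimes if your sitemaps mix formats.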
Advanced features & SEO optimization
- 🔍 Engineered for sitemap URL extraction: Built specifically to extract URLs from sitemaps and sitemap indexes in a single run
- 🗂️ Handles sitemap indexes automatically: Perfect for extracting sitemap links across multiple nested sitemaps
- 📊 Built for website sitemap parsing: Produces a consistent structure ideal for SEO audits and crawl preparation
- 💾 Real-time saving to dataset: Extracted records are pushed as they’re collected, which is helpful for large sites
- 🛡️ Residential proxy support for public web data: Designed to improve reliability when collecting from external hosts
Best use cases
- 📈 SEO teams auditing a website: Build a complete URL list from an XML sitemap to verify coverage and indexation expectations
- 🧭 Content strategists planning site-wide updates: Quickly get URLs, last modification dates, and change frequency signals for prioritization
- 🔎 Digital marketers running large-scale URL research: Create bulk lists for analysis without manually opening sitemap files
- 🧪 Data analysts preparing datasets: Turn the extracted records into spreadsheets, BI dashboards, or inputs for downstream models
- 🌐 Web developers building crawling pipelines: Use the URLs extracted from a sitemap index as the seed list for your own crawler
- 🧑‍💻 Engineering teams automating reporting: Incorporate the extracted URL lists into scheduled workflows and exports
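For the content-prioritization use case, the lastmod field can flag pages that have not changed in a while. A minimal sketch, assuming exported records in the Output format; the one-year threshold is an arbitrary example, and entries whose lastmod is missing or too short to hold a full date are skipped:

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def stale_urls(
    items: List[Dict[str, Optional[str]]],
    today: date,
    max_age_days: int = 365,
) -> List[str]:
    """Return URLs whose lastmod date is older than max_age_days before today."""
    cutoff = today - timedelta(days=max_age_days)
    result: List[str] = []
    for item in items:
        url, lastmod = item.get("url"), item.get("lastmod")
        # Sitemaps may use a bare year or year-month; skip values without a full date.
        if not url or not lastmod or len(lastmod) < 10:
            continue
        try:
            # A full W3C datetime starts with YYYY-MM-DD, so keep the date part.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # malformed date: skip rather than fail the whole report
        if modified < cutoff:
            result.append(url)
    return result
```

Passing today explicitly keeps the function deterministic, which makes scheduled reports reproducible and easy to test.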
Technical specifications
- Supported Input Formats
  - ✅ root_sitemap_url as a string pointing to a sitemap or sitemap index URL
- Proxy Support
  - ✅ Residential proxy support is used to improve reliability when fetching public web data
- Retry Mechanism
  - ⚠️ Not specified in the available actor source metadata
- Dataset Structure
  - ✅ Outputs JSON records with url, lastmod, and changefreq
- Rate Limits & Performance
  - ⚠️ Processing speed and limits are not specified in the available actor documentation
- Limitations
  - ⚠️ If the sitemap cannot be fetched or parsed, results may be incomplete (the actor logs errors and stops processing in those cases)
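Because an unparseable sitemap stops processing, it can help to check a sitemap yourself before starting a run. A minimal sketch using Python's standard XML parser; sitemap_kind is a hypothetical helper that takes XML text you have already downloaded with any HTTP client:

```python
import xml.etree.ElementTree as ET


def sitemap_kind(xml_text: str) -> str:
    """Classify sitemap XML as 'urlset', 'sitemapindex', 'other', or 'invalid'."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "invalid"  # not well-formed XML
    kind = root.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" prefix
    return kind if kind in ("urlset", "sitemapindex") else "other"
```

A result of "other" often means the URL serves an HTML page (for example, a login or error page) instead of a sitemap.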
FAQ
Does Sitemap URL Extractor handle both sitemap indexes and direct sitemaps?
✅ Yes. It detects whether the root is a sitemap index or a direct urlset, then extracts accordingly. For sitemap indexes, it fetches and processes sub-sitemaps to extract sitemap links across the full structure.
What does the actor extract from each sitemap entry?
✅ It extracts the entry’s url (from <loc>), lastmod (from <lastmod>, if present), and changefreq (from <changefreq>). If changefreq is missing, it defaults to "weekly".
Where do the results go after the run?
✅ The actor saves extracted items to the Apify dataset configured as Sitemap URLs, with fields url, lastmod, and changefreq.
Do I need to write any code to use this tool?
✅ No. You can provide input via Apify Console and then export the dataset after the actor finishes.
Is this meant for private websites or authenticated pages?
❌ No. This tool is intended for publicly available sitemap XML content. It does not target private, authenticated, or password-protected resources.
Can I export the extracted URLs for use in other tools?
✅ Yes. Since the actor outputs to a dataset, you can export it in standard dataset formats (for example, JSON/CSV) using Apify’s dataset export features.
How do I request data removal?
If you need data removal for outputs produced by this actor, contact dataforleads@gmail.com.
Support & feature requests
Want to improve your sitemap url extraction workflow with Sitemap URL Extractor? We’d love your feedback. 💡
- 💡 Feature Requests: Tell us what would make Sitemap URL Extractor fit your pipeline better, for example additional sitemap fields, alternate output formats, or more dataset controls.
- 📧 Contact: Reach out via dataforleads@gmail.com.
Your input helps shape what we build next for Sitemap URL Extractor.
If you’re looking for a tool that turns XML sitemaps into usable datasets, Sitemap URL Extractor is built for exactly that.
Run it on a sitemap or a sitemap index and extract URLs at scale with confidence.
Disclaimer
This tool only accesses publicly accessible sources (public sitemap XML). It does not access private profiles, authenticated data, or password-protected pages.
It’s your responsibility to ensure your use complies with applicable laws and regulations (including GDPR and CCPA where relevant), as well as each website’s terms of service and any applicable anti-abuse or rate-limit requirements.
For data removal requests, contact dataforleads@gmail.com. Please use Sitemap URL Extractor responsibly, ethically, and for legitimate purposes only.