Sitemap Url Extractor

Pricing

from $2.99 / 1,000 results

🔎 Extract URLs from any sitemap fast and accurately. Sitemap Url Extractor helps you discover, audit, and optimize website links for SEO, crawling, and migrations—ideal for webmasters, marketers, and developers. 🚀⚙️

Developer

SolidScraper

Maintained by Community

Sitemap URL Extractor 🔍

Sitemap URL Extractor extracts URLs from a sitemap, including sitemap indexes, and saves them to an Apify dataset. Whether you’re doing website research, auditing SEO, or building a bulk URL list from an existing sitemap, it turns a single root_sitemap_url into structured output you can use at scale, without opening sitemap files by hand.


Why choose Sitemap URL Extractor?

| Feature | Benefit |
| --- | --- |
| All-in-one sitemap parsing | Extracts URLs from both direct sitemaps and sitemap indexes (recursively) |
| Reliability-first fetching | Includes residential proxy support for more dependable data collection |
| Structured output saving | Writes extracted records directly to the output dataset as they’re collected |
| URL-focused results | Produces a clean table of URL, lastmod, and changefreq you can export and analyze |
| Scales from one sitemap to many | Handles multiple sub-sitemaps when the root is a sitemap index |
| Easy workflow integration | Output dataset is ready for downstream processing in your pipeline |

Key features

  • 🧾 Sitemap URL extraction (urlset): Parses <urlset> files and extracts each entry’s url, lastmod, and changefreq
  • 🗂️ Sitemap index support: Detects <sitemapindex> files and follows the sub-sitemaps they list
  • 🔁 Recursive sitemap parsing: Walks through sitemap indexes to gather URLs from every included sitemap
  • 🌐 XML sitemap parsing: Works with the standard sitemap XML structures (<urlset> and <sitemapindex>)
  • 💾 Live dataset saving: Pushes each extracted URL record to the dataset immediately, so you don’t lose progress
  • 🛡️ Residential proxy support: Uses residential proxy support to make fetching public web data more reliable
  • 📦 Simple, analyst-friendly output: Saves fields in a consistent structure for easy export and review
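The index detection and recursive walk described above can be sketched roughly as follows. This is an illustration only, not the actor's source: the names iter_page_urls, local_name, and FAKE_SITE are hypothetical, and the fetch step is replaced by an in-memory dictionary so the example runs without network access.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Optional, Set


def local_name(tag: str) -> str:
    """Drop the namespace part of a tag such as '{http://...}urlset'."""
    return tag.rsplit("}", 1)[-1]


def iter_page_urls(
    sitemap_url: str,
    fetch: Callable[[str], str],
    seen: Optional[Set[str]] = None,
) -> Iterator[str]:
    """Yield page URLs, following sitemap indexes recursively."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against indexes that reference each other
        return
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    # <loc> elements that sit directly under each <url> or <sitemap> entry
    locs = [
        (el.text or "").strip()
        for entry in root
        for el in entry
        if local_name(el.tag) == "loc"
    ]
    kind = local_name(root.tag)
    if kind == "sitemapindex":
        for sub_sitemap in locs:
            yield from iter_page_urls(sub_sitemap, fetch, seen)
    elif kind == "urlset":
        yield from locs
    else:
        raise ValueError(f"Unexpected root element: {kind}")


# Hypothetical in-memory "site" standing in for real HTTP responses.
NS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
FAKE_SITE = {
    "https://example.com/sitemap.xml": (
        f"<sitemapindex {NS}>"
        "<sitemap><loc>https://example.com/pages.xml</loc></sitemap>"
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/pages.xml": (
        f"<urlset {NS}><url><loc>https://example.com/about</loc></url></urlset>"
    ),
    "https://example.com/posts.xml": (
        f"<urlset {NS}><url><loc>https://example.com/blog/hello</loc></url></urlset>"
    ),
}

urls = list(iter_page_urls("https://example.com/sitemap.xml", FAKE_SITE.__getitem__))
print(urls)
```

Passing the fetcher in as a parameter keeps the parsing logic separate from networking, so the same walk could be pointed at a real HTTP client or a proxy-aware one.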

Input

Provide input via an input.json file. Example structure:

{
  "root_sitemap_url": "https://onescales.com/sitemap.xml"
}

Input Fields

| Field | Required | Description |
| --- | --- | --- |
| root_sitemap_url | Yes | The URL of the sitemap or sitemap index to start from. This is the root entry point for extraction. |
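If you generate input.json from a script, a small pre-check can catch malformed URLs before a run starts. The sketch below is an assumption about what a caller might do; build_input is a hypothetical helper, and the actor's own input validation is not documented here.

```python
import json
from urllib.parse import urlparse


def build_input(root_sitemap_url: str) -> dict:
    """Return the input object, rejecting values that are not http(s) URLs."""
    url = root_sitemap_url.strip()
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not an absolute http(s) URL: {root_sitemap_url!r}")
    return {"root_sitemap_url": url}


actor_input = build_input("https://onescales.com/sitemap.xml")
print(json.dumps(actor_input, indent=2))
```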

Output

The actor saves each extracted URL record to the Apify dataset as JSON items.

{
  "url": "https://example.com/page-1",
  "lastmod": "2024-01-15",
  "changefreq": "weekly"
}

Output Fields

| Field | Type | Description |
| --- | --- | --- |
| url | string or null | The extracted <loc> value for each sitemap entry |
| lastmod | string or null | The extracted <lastmod> value (if present in the sitemap) |
| changefreq | string | The extracted <changefreq> value; if missing, it defaults to "weekly" |

Note: The output dataset view is configured to display url as a link, and lastmod and changefreq as text.
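To make the field mapping concrete, here is a minimal sketch of how a <urlset> could be turned into records with these three fields, including the "weekly" fallback. It is not the actor's code; records_from_urlset and SAMPLE are hypothetical names, and the sketch assumes the standard sitemaps.org namespace.

```python
import xml.etree.ElementTree as ET

# Standard sitemap namespace; "sm" is just a local prefix for lookups.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def records_from_urlset(xml_text: str) -> list:
    """Map each <url> entry to a dict with url, lastmod, and changefreq."""
    root = ET.fromstring(xml_text)
    records = []
    for url_el in root.findall("sm:url", NS):
        records.append(
            {
                "url": url_el.findtext("sm:loc", default=None, namespaces=NS),
                "lastmod": url_el.findtext("sm:lastmod", default=None, namespaces=NS),
                # A missing <changefreq> falls back to "weekly", as documented above.
                "changefreq": url_el.findtext(
                    "sm:changefreq", default="weekly", namespaces=NS
                ),
            }
        )
    return records


SAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/page-1</loc>"
    "<lastmod>2024-01-15</lastmod><changefreq>daily</changefreq></url>"
    "<url><loc>https://example.com/page-2</loc></url>"
    "</urlset>"
)
records = records_from_urlset(SAMPLE)
print(records)
```

Because of the fallback, a changefreq of "weekly" in the output does not by itself tell you whether the sitemap declared that value.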


How to use Sitemap URL Extractor (via Apify Console)

  1. Open Apify Console
    Log in at https://console.apify.com and go to the Actors tab.

  2. Find the actor
    Search for Sitemap URL Extractor and open the actor page.

  3. Go to the INPUT section
    Use the built-in form (or switch to editing input.json directly) to provide the required input.

  4. Set root_sitemap_url
    Paste a direct sitemap URL (XML) or a sitemap index URL. The actor starts extraction from this URL.

  5. Run the actor
    Click Run. During the run, you’ll see logs about fetching and whether it detected a direct urlset or a sitemap index.

  6. Monitor progress
    If the root is a sitemap index, the actor works through each sub-sitemap and pushes results to the dataset as it goes.

  7. Open the OUTPUT dataset
    After completion, open the dataset named Sitemap URLs to view the extracted URL records.

  8. Export your results
    Export the dataset as JSON or CSV using Apify’s standard dataset export options.

No coding required—get URLs from XML sitemap files in minutes with Sitemap URL Extractor. ✅


Advanced features & SEO optimization

  • 🔍 Purpose-built for sitemap URL extraction: Extracts URLs from direct sitemaps and sitemap indexes in a single run
  • 🗂️ Handles sitemap indexes automatically: Extracts URLs across multiple nested sitemaps
  • 📊 Consistent output structure: Every record has the same fields, which suits SEO audits and crawl preparation
  • 💾 Real-time saving to dataset: Extracted records are pushed as they’re collected, which is helpful for large sites
  • 🛡️ Residential proxy support for public web data: Designed to improve reliability when collecting from external hosts

Best use cases

  • 📈 SEO teams auditing a website: Build a complete URL list from an XML sitemap to verify coverage and indexation expectations
  • 🧭 Content strategists planning site-wide updates: Quickly get URLs, last modification dates, and change frequency signals for prioritization
  • 🔎 Digital marketers running large-scale URL research: Create bulk lists for analysis without manually opening sitemap files
  • 🧪 Data analysts preparing datasets: Load the extracted records into spreadsheets, BI dashboards, or downstream models
  • 🌐 Web developers building crawling pipelines: Use the extracted URLs as the seed list for your own crawler
  • 🧑‍💻 Engineering teams automating reporting: Incorporate the extracted URL lists into scheduled workflows and exports
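As one illustration of the audit and prioritization use cases above, the sketch below counts URLs per top-level site section and orders dated records from newest to oldest. The helper names and sample records are hypothetical. It assumes records shaped like the Output section and compares only the date part of lastmod, on the assumption that values follow the usual YYYY-MM-DD prefix.

```python
from collections import Counter
from urllib.parse import urlparse


def section_counts(records: list) -> Counter:
    """Count records by first path segment, e.g. '/blog' or '/' for the root."""
    counts = Counter()
    for rec in records:
        path = urlparse(rec.get("url") or "").path
        first_segment = path.strip("/").split("/")[0]
        counts["/" + first_segment] += 1
    return counts


def most_recent_first(records: list) -> list:
    """Return records that have a lastmod, newest first (date part only)."""
    dated = [rec for rec in records if rec.get("lastmod")]
    return sorted(dated, key=lambda rec: rec["lastmod"][:10], reverse=True)


# Hypothetical records in the same shape as the actor's output items.
sample_records = [
    {"url": "https://example.com/", "lastmod": None, "changefreq": "weekly"},
    {"url": "https://example.com/blog/a", "lastmod": "2024-01-15", "changefreq": "weekly"},
    {"url": "https://example.com/blog/b", "lastmod": "2024-03-02", "changefreq": "daily"},
]

print(section_counts(sample_records))
print([rec["url"] for rec in most_recent_first(sample_records)])
```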

Technical specifications

  • Supported Input Formats

    • root_sitemap_url as a string pointing to a sitemap or sitemap index URL
  • Proxy Support

    • ✅ Residential proxy support is used to improve reliability when fetching public web data
  • Retry Mechanism

    • ⚠️ Not specified in the available actor source metadata
  • Dataset Structure

    • ✅ Outputs JSON records with url, lastmod, and changefreq
  • Rate Limits & Performance

    • ⚠️ Processing speed and limits are not specified in the available actor documentation
  • Limitations

    • ⚠️ If the sitemap cannot be fetched or parsed, results may be incomplete (the actor logs errors and stops processing in those cases)

FAQ

Does Sitemap URL Extractor handle both sitemap indexes and direct sitemaps?

✅ Yes. It detects whether the root is a sitemap index or a direct urlset, then extracts accordingly. For sitemap indexes, it fetches and processes each sub-sitemap to collect URLs across the full structure.

What does the actor extract from each sitemap entry?

✅ It extracts the entry’s url (from <loc>), lastmod (from <lastmod>, if present), and changefreq (from <changefreq>). If changefreq is missing, it defaults to "weekly".

Where do the results go after the run?

✅ The actor saves extracted items to an Apify dataset named Sitemap URLs, with the fields url, lastmod, and changefreq.

Do I need to write any code to use this tool?

✅ No. You can provide input via Apify Console and then export the dataset after the actor finishes.

Is this meant for private websites or authenticated pages?

❌ No. This tool is intended for publicly available sitemap XML content. It does not target private, authenticated, or password-protected resources.

Can I export the extracted URLs for use in other tools?

✅ Yes. Since the actor outputs to a dataset, you can export it in standard dataset formats (for example, JSON/CSV) using Apify’s dataset export features.
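For example, a JSON export could be reduced to a de-duplicated URL list before it is handed to another tool. This is a minimal sketch, assuming the export is a JSON array of objects with the url field described in the Output section; unique_urls and the sample string are hypothetical.

```python
import json


def unique_urls(exported_json: str) -> list:
    """Return URLs in first-seen order, skipping nulls and duplicates."""
    items = json.loads(exported_json)
    seen = set()
    urls = []
    for item in items:
        url = item.get("url")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hypothetical export content; a real file would be read from disk instead.
exported = json.dumps(
    [
        {"url": "https://example.com/page-1", "lastmod": "2024-01-15", "changefreq": "weekly"},
        {"url": None, "lastmod": None, "changefreq": "weekly"},
        {"url": "https://example.com/page-1", "lastmod": None, "changefreq": "weekly"},
        {"url": "https://example.com/page-2", "lastmod": None, "changefreq": "weekly"},
    ]
)

url_list = unique_urls(exported)
print("\n".join(url_list))
```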

How do I request a dataset data removal?

If you need data removal for outputs produced by this actor, contact dataforleads@gmail.com.


Support & feature requests

Want to improve your sitemap URL extraction workflow? We’d love your feedback. 💡

  • 💡 Feature Requests: Tell us what would make this actor fit your pipeline better, for example additional sitemap fields, alternate output formats, or more dataset controls.
  • 📧 Contact: Reach out via dataforleads@gmail.com.

Your input helps shape what we build next for Sitemap URL Extractor.


If you need to turn XML sitemaps into usable datasets, Sitemap URL Extractor is built for exactly that.
Run it on a direct sitemap or a sitemap index and extract URLs from sitemap structures at scale.


Disclaimer

This tool only accesses publicly accessible sources (public sitemap XML). It does not access private profiles, authenticated data, or password-protected pages.

It’s your responsibility to ensure your use complies with applicable laws and regulations (including GDPR and CCPA where relevant), as well as each website’s terms of service and any applicable anti-abuse or rate-limit requirements.

For data removal requests, contact dataforleads@gmail.com. Please use Sitemap URL Extractor responsibly, ethically, and for legitimate purposes only.