Sitemap Generator - Creates sitemap.xml for any domain
Pricing
from $5.00 / 1,000 results
Developer: Chris Xavier
Last modified: 5 days ago
🗺️ Sitemap Generator (Apify Actor)
Generate a clean, standards-compliant sitemap.xml for a website — automatically, reliably, and without manual cleanup.
This actor crawls a single website, discovers all indexable pages, and produces:
- ✅ A ready-to-submit sitemap.xml (Google-compliant)
- ✅ A structured JSON dataset of discovered URLs (for auditing, reporting, and billing)
Built for SEO professionals, agencies, and site owners who want accuracy, transparency, and results they can trust.
✅ What This Actor Does
- Crawls one website per run (no mixed domains, no confusion)
- Discovers internal pages by following links
- Excludes junk/system URLs automatically (e.g. Cloudflare, admin endpoints)
- Respects robots.txt (optional)
- Removes duplicate URLs and URL fragments
- Optionally strips query strings to prevent sitemap bloat
- Extracts real <lastmod> dates when available:
  - From HTTP Last-Modified headers
  - From blog/article meta tags when headers are missing
- Outputs a fully valid sitemap.xml
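The fragment removal, query stripping, and de-duplication steps listed above can be sketched roughly as follows. This is an illustration only, not the actor's source code: the function names `normalize_url` and `dedupe_internal` and the `strip_query` flag are hypothetical, and the real actor may normalize URLs differently.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str, strip_query: bool = True) -> str:
    """Drop the #fragment and, optionally, the ?query part of a URL."""
    parts = urlsplit(url)
    query = "" if strip_query else parts.query
    # Lowercase scheme and host; treat an empty path as "/".
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )


def dedupe_internal(urls, start_url, strip_query=True):
    """Keep unique normalized URLs on the same host as start_url, in first-seen order."""
    host = urlsplit(start_url).netloc.lower()
    seen = set()
    result = []
    for url in urls:
        clean = normalize_url(url, strip_query)
        # Skip other hosts (assumed rule: one site per run) and repeats.
        if urlsplit(clean).netloc != host or clean in seen:
            continue
        seen.add(clean)
        result.append(clean)
    return result
```

With query stripping on, `https://example.com/a#top` and `https://example.com/a?utm=1` collapse into a single entry, which is the kind of sitemap bloat the option is meant to prevent.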
📦 Outputs (Where to Find Your Files)
The sitemap is saved to the run's key-value store. In the Apify UI, open:
Run → Storage → Key-value store → sitemap.xml
This file is:
- Ready to upload to Google Search Console
- Ready to host at /sitemap.xml
- Standards-compliant (no reconstruction required)
🟢 JSON Results (Dataset)
Every discovered page is also saved to the Dataset.
Each row includes:
- url – discovered page URL
- depth – crawl depth from the homepage
- lastmod – modification date (when available)
- lastmodSource – "header", "meta", or null
This dataset is useful for:
- Auditing and QA
- URL counts and reporting
- Monetization and billing logic
- Previewing results before download
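To make the relationship between the dataset rows and the XML output concrete, here is a minimal sketch that turns rows with the fields described above (url, lastmod) into a sitemap document. It assumes the rows are already loaded as Python dictionaries; `rows_to_sitemap` is a hypothetical helper, not part of the actor.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def rows_to_sitemap(rows):
    """Build a sitemap.xml string from dataset-style rows."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for row in rows:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = row["url"]
        # Only emit <lastmod> when the row carries a date.
        if row.get("lastmod"):
            ET.SubElement(entry, "lastmod").text = row["lastmod"]
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


rows = [
    {"url": "https://example.com/", "depth": 0,
     "lastmod": "2024-01-15", "lastmodSource": "header"},
    {"url": "https://example.com/about", "depth": 1,
     "lastmod": None, "lastmodSource": None},
]
sitemap_xml = rows_to_sitemap(rows)
```

Rebuilding the XML like this is not needed for normal use, since the actor already writes sitemap.xml. It can be handy for auditing, for example to regenerate a sitemap from a filtered subset of rows.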
🔒 Important Design Decisions (On Purpose)
One Website per Run
This actor enforces a single start URL.
Why?
- A sitemap must not mix domains
- One site = one sitemap = one clean result
- Prevents invalid or rejected sitemaps
- Enables clear pricing per site
Honest <lastmod> Values
The actor does not fake modification dates.
- Uses real server headers when available
- Falls back to article metadata for blog posts
- Omits <lastmod> when no trustworthy source exists
This avoids misleading search engines and protects SEO integrity.
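The fallback order described above (server header first, then article metadata, otherwise nothing) might look something like this. It is a sketch under assumptions: `pick_lastmod` is a hypothetical name, the headers and meta tags are assumed to arrive as plain dictionaries, and `article:modified_time` is only one plausible meta tag. The source does not say which tags the actor actually reads.

```python
from email.utils import parsedate_to_datetime


def pick_lastmod(headers, meta):
    """Return (lastmod, source), or (None, None) when nothing trustworthy exists."""
    raw = headers.get("Last-Modified")
    if raw:
        try:
            # HTTP dates look like "Wed, 21 Oct 2015 07:28:00 GMT".
            return parsedate_to_datetime(raw).date().isoformat(), "header"
        except (TypeError, ValueError):
            pass  # Unparseable header: fall through to metadata.
    modified = meta.get("article:modified_time")  # assumed meta tag
    if modified:
        return modified, "meta"
    # No reliable source: the caller should omit <lastmod> entirely.
    return None, None
```

The returned pair maps directly onto the lastmod and lastmodSource fields of the dataset rows.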
⚙️ Inputs
Required
- Start URL: the root URL of the website (example: https://example.com)
Optional
- Max crawl depth
- Max number of pages
- Concurrency
- Headless browser (for JavaScript-heavy sites)
- Strip query strings
- Respect robots.txt
- Advanced include/exclude URL patterns (regex)
Most users can run the actor with just a Start URL.
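For the advanced include/exclude patterns, the sketch below shows one way such regex filters could be applied to a discovered URL. The function name `is_allowed` and the rule that exclude patterns take priority over include patterns are assumptions for illustration. The actor's actual input field names and precedence rules are not documented here.

```python
import re


def is_allowed(url, include=None, exclude=None):
    """Decide whether a URL passes optional include/exclude regex lists."""
    # Assumed precedence: any exclude match rejects the URL outright.
    if exclude and any(re.search(pattern, url) for pattern in exclude):
        return False
    # With include patterns set, at least one of them must match.
    if include:
        return any(re.search(pattern, url) for pattern in include)
    # No patterns configured: everything is allowed.
    return True
```

For example, an include list of `[r"/blog/"]` would limit the sitemap to blog pages, while an exclude list of `[r"/wp-admin/"]` would drop admin endpoints.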
🧠 Who This Is For
- SEO professionals
- Agencies managing multiple client sites
- Developers who need clean sitemaps programmatically
- Site owners preparing for Google Search Console
- AI-first websites optimizing crawlability
💡 Why Use This Actor Instead of Online Sitemap Tools?
- No URL limits
- No fake results
- No mixed domains
- No guessing which pages were included
- Full transparency (XML + JSON)
- Automation-ready and API-friendly
🔐 PPE (Paid / Private / Enterprise)
This actor is designed for PPE use:
- Consistent, auditable outputs
- Dataset always populated (even if XML is downloaded)
- Clear value per run
- Suitable for client-facing and internal workflows
Run it. Download sitemap.xml. Submit. Done.
🟢 sitemap.xml (Primary Output)
Your sitemap is written as a real XML file.
Location in Apify UI: