Sitemap Generator - Creates sitemap.xml for any domain

Pricing

from $5.00 / 1,000 results

Developer

Chris Xavier

Maintained by Community

Last modified

5 days ago

🗺️ Sitemap Generator (Apify Actor)

Generate a clean, standards-compliant sitemap.xml for a website — automatically, reliably, and without manual cleanup.

This actor crawls a single website, discovers all indexable pages, and produces:

  • ✅ A ready-to-submit sitemap.xml (Google-compliant)
  • ✅ A structured JSON dataset of discovered URLs (for auditing, reporting, and billing)

Built for SEO professionals, agencies, and site owners who want accuracy, transparency, and results they can trust.

✅ What This Actor Does

  • Crawls one website per run (no mixed domains, no confusion)
  • Discovers internal pages by following links
  • Excludes junk/system URLs automatically (e.g. Cloudflare, admin endpoints)
  • Respects robots.txt (optional)
  • Removes duplicate URLs and URL fragments
  • Optionally strips query strings to prevent sitemap bloat
  • Extracts real <lastmod> dates when available:
    • From HTTP Last-Modified headers
    • From blog/article meta tags when headers are missing
  • Outputs a fully valid sitemap.xml
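
The fragment removal, optional query stripping, and de-duplication described above can be sketched in a few lines. This is a minimal illustration, not the actor's own code: the function names are hypothetical, and lowercasing the host and treating an empty path as "/" are assumptions made here.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str, strip_query: bool = True) -> str:
    """Drop the fragment and, optionally, the query string from a URL."""
    parts = urlsplit(url)
    query = "" if strip_query else parts.query
    # Assumption: scheme and host are lowercased; the path is left as-is
    # because paths can be case-sensitive.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, "")
    )


def dedupe(urls, strip_query: bool = True) -> list[str]:
    """Return normalized URLs in first-seen order with duplicates removed."""
    seen = set()
    result = []
    for url in urls:
        normalized = normalize_url(url, strip_query)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)
    return result
```

With query stripping enabled, `https://example.com/a#top` and `https://example.com/a?utm=1` both collapse to `https://example.com/a`, which is how stripping keeps tracking parameters from bloating the sitemap.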

📦 Outputs (Where to Find Your Files)

The sitemap is saved to the run's key-value store. In the Apify UI, open:

Run → Storage → Key-value store → sitemap.xml

This file is:

  • Ready to upload to Google Search Console
  • Ready to host at /sitemap.xml
  • Standards-compliant (no reconstruction required)
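
If you want a quick sanity check on the downloaded file before submitting it, something like the following may help. It is a sketch using only the Python standard library; `summarize_sitemap` is a hypothetical helper, and it checks only the root element and counts entries rather than performing full schema validation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def summarize_sitemap(xml_text: str) -> dict:
    """Parse a sitemap and report URL count, lastmod coverage, and duplicates."""
    root = ET.fromstring(xml_text)
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        raise ValueError(f"unexpected root element: {root.tag}")
    ns = {"sm": SITEMAP_NS}
    entries = root.findall("sm:url", ns)
    locs = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", ns)
        if loc.text
    ]
    with_lastmod = sum(1 for e in entries if e.find("sm:lastmod", ns) is not None)
    return {
        "urls": len(locs),
        "with_lastmod": with_lastmod,
        "duplicates": len(locs) - len(set(locs)),
    }
```

A non-zero `duplicates` count or an unexpected root element would suggest the file is not the one produced by the run.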

🟢 JSON Results (Dataset)

Every discovered page is also saved to the Dataset.

Each row includes:

  • url – discovered page URL
  • depth – crawl depth from the homepage
  • lastmod – modification date (when available)
  • lastmodSource – "header", "meta", or null

This dataset is useful for:

  • Auditing and QA
  • URL counts and reporting
  • Monetization and billing logic
  • Previewing results before download
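
For auditing, the rows can be summarized with a few lines of Python once exported. The sketch below assumes rows shaped like the four fields listed above; the sample values are invented for illustration, and the exact `lastmod` string format the actor emits is not specified here.

```python
from collections import Counter

# Illustrative rows only: the field names follow the list above,
# the values are made up.
rows = [
    {"url": "https://example.com/", "depth": 0,
     "lastmod": "2024-05-01", "lastmodSource": "header"},
    {"url": "https://example.com/blog/post", "depth": 1,
     "lastmod": "2024-04-20", "lastmodSource": "meta"},
    {"url": "https://example.com/blog/post/comments", "depth": 2,
     "lastmod": None, "lastmodSource": None},
]


def summarize_rows(rows):
    """Count pages overall, per lastmod source, and report the deepest level."""
    by_source = Counter(row.get("lastmodSource") or "none" for row in rows)
    max_depth = max((row["depth"] for row in rows), default=0)
    return {"total": len(rows), "by_source": dict(by_source), "max_depth": max_depth}


print(summarize_rows(rows))
```

The `by_source` breakdown shows at a glance how many pages carry a header-derived date, a meta-derived date, or none at all.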

🔒 Important Design Decisions (On Purpose)

One Website per Run

This actor enforces a single start URL.

Why?

  • A sitemap must not mix domains
  • One site = one sitemap = one clean result
  • Prevents invalid or rejected sitemaps
  • Enables clear pricing per site
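
The single-site rule amounts to a host comparison against the start URL. A minimal sketch follows; `is_same_site` is a hypothetical name, and exact hostname matching is an assumption (how the actor treats `www.` variants or subdomains is not stated here).

```python
from urllib.parse import urlsplit


def is_same_site(url: str, start_url: str) -> bool:
    """True when url has exactly the same hostname as start_url."""
    host = (urlsplit(url).hostname or "").lower()
    start_host = (urlsplit(start_url).hostname or "").lower()
    # Assumption: strict equality, so subdomains count as different sites.
    return host != "" and host == start_host
```

Under this strict rule, `https://blog.example.com/` would be skipped when the start URL is `https://example.com`.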

Honest <lastmod> Values

The actor does not fake modification dates.

  • Uses real server headers when available
  • Falls back to article metadata for blog posts
  • Omits <lastmod> when no trustworthy source exists

This avoids misleading search engines and protects SEO integrity.
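
The header-first, metadata-second, otherwise-omit order can be sketched as below. This is an illustration rather than the actor's implementation: `resolve_lastmod` is a hypothetical name, `article:modified_time` is assumed as the meta tag (the text above only says "article metadata"), and timestamps without a timezone are assumed to be UTC.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _to_utc_iso(dt: datetime) -> str:
    # Assumption: a timestamp without timezone information is treated as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).isoformat()


def resolve_lastmod(headers: dict, meta: dict):
    """Return (lastmod, source), or (None, None) when nothing is trustworthy."""
    lowered = {key.lower(): value for key, value in headers.items()}

    raw = lowered.get("last-modified")
    if raw:
        try:
            return _to_utc_iso(parsedate_to_datetime(raw)), "header"
        except (TypeError, ValueError):
            pass  # Unparseable header: fall through to metadata.

    raw = meta.get("article:modified_time")  # assumed tag name
    if raw:
        try:
            parsed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
            return _to_utc_iso(parsed), "meta"
        except ValueError:
            pass

    # No reliable source, so the caller should omit <lastmod> entirely.
    return None, None
```

Returning `(None, None)` instead of falling back to the crawl time is what keeps the value honest: a page with no reliable date simply gets no `<lastmod>` element.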

⚙️ Inputs

Required

  • Start URL
    The root URL of the website (example: https://example.com)

Optional

  • Max crawl depth
  • Max number of pages
  • Concurrency
  • Headless browser (for JavaScript-heavy sites)
  • Strip query strings
  • Respect robots.txt
  • Advanced include/exclude URL patterns (regex)

Most users can run the actor with just a Start URL.
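
To get a feel for the advanced include/exclude patterns, the sketch below shows one plausible way such regex filters could combine. The function name is hypothetical, and the precedence used here (exclude wins, and an empty include list allows everything) is an assumption rather than documented behavior of the actor.

```python
import re


def url_allowed(url: str, include=(), exclude=()) -> bool:
    """Apply exclude patterns first, then require an include match if any are given."""
    if any(re.search(pattern, url) for pattern in exclude):
        return False
    # Assumption: with no include patterns, every non-excluded URL passes.
    return not include or any(re.search(pattern, url) for pattern in include)
```

For example, `exclude=[r"/wp-admin/", r"/cdn-cgi/"]` would drop admin and Cloudflare endpoints, while `include=[r"^https://example\.com/blog/"]` would restrict the crawl to the blog section.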

🧠 Who This Is For

  • SEO professionals
  • Agencies managing multiple client sites
  • Developers who need clean sitemaps programmatically
  • Site owners preparing for Google Search Console
  • AI-first websites optimizing crawlability

💡 Why Use This Actor Instead of Online Sitemap Tools?

  • No URL limits
  • No fake results
  • No mixed domains
  • No guessing which pages were included
  • Full transparency (XML + JSON)
  • Automation-ready and API-friendly

🔐 PPE (Paid / Private / Enterprise)

This actor is designed for PPE use:

  • Consistent, auditable outputs
  • Dataset always populated (even if XML is downloaded)
  • Clear value per run
  • Suitable for client-facing and internal workflows

Run it. Download sitemap.xml. Submit. Done.

🟢 sitemap.xml (Primary Output)

Your sitemap is written as a real XML file.

Location in Apify UI: