Sitemap Generator - Crawl Website & Create XML Sitemap

Pricing

Pay per usage

Generate an XML sitemap for any website. Crawls internal pages from start URLs (with depth + page limits), deduplicates URLs, and stores a ready-to-submit sitemap.xml plus a structured dataset and summary for SEO audits.

Developer

Bikram Adhikari

Maintained by Community

Generate an XML sitemap (sitemap.xml) for any website by crawling internal pages from one or more start URLs.

This Actor is designed for:

  • SEO audits (discover missing pages)
  • Creating/refreshing sitemaps for search engines
  • QA / monitoring of site URL coverage

What it does

  • Crawls internal links (same hostname as the provided start URLs)
  • Deduplicates URLs
  • Stores sitemap.xml in the default key-value store
  • If more than 50,000 URLs are discovered, writes multiple sitemap-*.xml parts plus a sitemap-index.xml; in that case sitemap.xml contains the index, for compatibility
  • Writes a dataset item for each crawled page (included/excluded + reason)
  • Writes a SUMMARY JSON report (counts, settings, sitemap URL count)
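
The splitting behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the function names, the `base_url` parameter, and the sorted output order are assumptions made for the example. The 50,000 figure and the file names come from the description above.

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_HEADER = '<?xml version="1.0" encoding="UTF-8"?>\n'
MAX_URLS_PER_SITEMAP = 50_000  # threshold stated in the list above


def build_urlset(urls):
    """Serialize a list of URLs as a single <urlset> document."""
    root = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(root, "url"), "loc").text = url
    return XML_HEADER + ET.tostring(root, encoding="unicode")


def build_sitemaps(urls, base_url, limit=MAX_URLS_PER_SITEMAP):
    """Return a {key: xml} mapping; base_url (hypothetical) is where the parts are hosted."""
    unique = sorted(set(urls))  # deduplicate; sorting is a choice made for this sketch
    if len(unique) <= limit:
        return {"sitemap.xml": build_urlset(unique)}

    files = {}
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for part, start in enumerate(range(0, len(unique), limit), start=1):
        key = f"sitemap-{part}.xml"
        files[key] = build_urlset(unique[start:start + limit])
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = f"{base_url}/{key}"

    index_xml = XML_HEADER + ET.tostring(index, encoding="unicode")
    files["sitemap-index.xml"] = index_xml
    files["sitemap.xml"] = index_xml  # sitemap.xml mirrors the index for large sites
    return files
```

Passing a small `limit` (for example `limit=2`) makes it easy to see the part files and the index without generating 50,000 URLs.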

Input

  • startUrls (required): Start URLs (request list)
  • maxPages: Max pages to crawl (limits total requests)
  • maxDepth: Max link depth from the start URLs
  • ignoreUrlPatterns: Array of regex strings to exclude URLs
  • includeQueryParams: Include ?query=params in sitemap URLs
  • includeFragments: Include #fragments in sitemap URLs (usually disabled)
  • includeLastModified: If enabled, uses the HTTP Last-Modified header for <lastmod> when available
  • respectRobotsTxt: If enabled, skips URLs disallowed by robots.txt for User-agent: * (best-effort)
  • robotsTxtTimeoutSecs: Timeout for downloading robots.txt
  • changefreq, priority: Optional sitemap hints applied to all URLs
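
To make the URL-related options more concrete, here is a hedged sketch of how includeQueryParams, includeFragments, and ignoreUrlPatterns could be applied, together with the same-hostname check. The helper names, the default values, and the exact normalization rules are assumptions for illustration; the Actor may behave differently.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url, include_query_params=False, include_fragments=False):
    """Lowercase scheme and host, and drop query/fragment unless asked to keep them."""
    parts = urlsplit(url)
    query = parts.query if include_query_params else ""
    fragment = parts.fragment if include_fragments else ""
    path = parts.path or "/"  # treat a bare host as its root path
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, query, fragment))


def is_ignored(url, ignore_url_patterns):
    """True if any regex string from ignoreUrlPatterns matches anywhere in the URL."""
    return any(re.search(pattern, url) for pattern in ignore_url_patterns)


def is_internal(url, start_url):
    """True if the URL has the same hostname as the start URL."""
    return urlsplit(url).hostname == urlsplit(start_url).hostname
```

For example, patterns such as `/tag/` or `\.pdf$` would exclude tag archives and PDF files, respectively.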

Output

Key-value store

  • sitemap.xml (XML)
  • sitemap-index.xml (XML, only for large sites)
  • sitemap-1.xml, sitemap-2.xml, ... (XML parts, only for large sites)
  • SUMMARY (JSON)

Dataset

Each item contains:

  • url, normalizedUrl, statusCode, contentType
  • depth, discoveredFrom
  • includedInSitemap, exclusionReason
  • lastModified, crawledAt
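
As a rough example of using these fields in an SEO audit, the sketch below counts included pages and groups excluded pages by exclusionReason. The field names come from the list above; the sample items and the reason strings are made up for illustration and are not taken from the Actor.

```python
from collections import Counter


def summarize(items):
    """Count included pages and tally excluded pages by their exclusionReason."""
    included = [item["normalizedUrl"] for item in items if item.get("includedInSitemap")]
    reasons = Counter(
        item.get("exclusionReason") or "unknown"
        for item in items
        if not item.get("includedInSitemap")
    )
    return {"included": len(included), "excluded": dict(reasons)}


# Hypothetical dataset items; real reason strings may differ.
sample_items = [
    {"normalizedUrl": "https://example.com/", "includedInSitemap": True, "exclusionReason": None},
    {"normalizedUrl": "https://example.com/old", "includedInSitemap": False, "exclusionReason": "non-200 status"},
    {"normalizedUrl": "https://example.com/tmp", "includedInSitemap": False, "exclusionReason": "non-200 status"},
    {"normalizedUrl": "https://example.com/x", "includedInSitemap": False, "exclusionReason": None},
]

report = summarize(sample_items)
```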

SEO keywords

sitemap generator, xml sitemap generator, website sitemap crawler, generate sitemap.xml, seo sitemap tool, internal link crawler

Quick start

Store page: https://apify.com/scrappy_garden/sitemap-generator

Paste this into Input and click Run:

{
  "startUrls": [
    {
      "url": "https://example.com/"
    }
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
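
The same input can also be sent programmatically. The sketch below uses only the Python standard library and the Apify API's synchronous run endpoint, which starts a run and returns its dataset items. The Actor ID is inferred from the store URL above (with "~" in place of "/"), and the helper names, token handling, and timeout value are assumptions for this example; check the Apify API reference before relying on it.

```python
import json
import urllib.request

ACTOR_ID = "scrappy_garden~sitemap-generator"  # inferred from the store URL; "~" replaces "/"
API_URL = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"

RUN_INPUT = {
    "startUrls": [{"url": "https://example.com/"}],
    "proxyConfiguration": {"useApifyProxy": False},
}


def build_run_request(token, run_input):
    """Build a POST request that starts a run and asks for its dataset items."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(token, run_input, timeout=300):
    """Send the request and return the parsed JSON response (network call)."""
    with urllib.request.urlopen(build_run_request(token, run_input), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `run_actor("<YOUR_API_TOKEN>", RUN_INPUT)` should return the dataset items as a list if the run finishes in time. Longer crawls are likely better served by an asynchronous run whose outputs are fetched afterwards.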

Outputs (what you get)

  • Dataset: items typically include url, statusCode, includedInSitemap, exclusionReason, depth, lastModified, and crawledAt.
  • Key-value store: SUMMARY, sitemap.xml

Tips (trust + predictable results)

  • Start with 1–3 URLs to validate behavior, then scale up.
  • If a target blocks requests, enable Proxy and/or slow down concurrency in Input.
  • Use the SUMMARY / REPORT keys (when present) for automation pipelines and monitoring.
