Sitemap Generator - Crawl Website & Create XML Sitemap

Pricing

$4.99/month + usage

Generate an XML sitemap for any website. Crawls internal pages from start URLs (with depth + page limits), deduplicates URLs, and stores a ready-to-submit sitemap.xml plus a structured dataset and summary for SEO audits.

Developer

Bikram Adhikari

Maintained by Community

Generate an XML sitemap (sitemap.xml) for any website by crawling internal pages from one or more start URLs.

This Actor is designed for:

  • SEO audits (discover missing pages)
  • Creating/refreshing sitemaps for search engines
  • QA / monitoring of site URL coverage
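As a rough illustration of the audit and monitoring use cases above, the sketch below compares a site's published sitemap with a freshly generated one and reports the differences. It is not part of the Actor; the function names `sitemap_urls` and `coverage_diff` are hypothetical, and both inputs are assumed to be standard sitemap XML already loaded as strings.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_urls(xml_text: str) -> set[str]:
    """Return every <loc> value found in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text
    }


def coverage_diff(published_xml: str, crawled_xml: str) -> dict[str, list[str]]:
    """Compare a published sitemap with one produced by a fresh crawl."""
    published = sitemap_urls(published_xml)
    crawled = sitemap_urls(crawled_xml)
    return {
        # Found by the crawl but absent from the published sitemap.
        "missing_from_published": sorted(crawled - published),
        # Listed in the published sitemap but not reached by the crawl.
        "not_reached_by_crawl": sorted(published - crawled),
    }
```

URLs in `not_reached_by_crawl` are not necessarily broken: they may simply sit beyond the configured depth or page limits.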

What it does

  • Crawls internal links (same hostname as the provided start URLs)
  • Deduplicates URLs
  • Stores sitemap.xml in the default key-value store
  • Splits the output when more than 50,000 URLs are discovered (the per-file limit in the sitemap protocol): creates multiple sitemap-*.xml parts plus a sitemap-index.xml, and sitemap.xml then contains the index for compatibility
  • Writes a dataset item for each crawled page (included/excluded + reason)
  • Writes a SUMMARY JSON report (counts, settings, sitemap URL count)
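The deduplication and 50,000-URL split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Actor's actual code: `build_sitemap_files` and `_render` are hypothetical names, and `base_url` (the location where the part files would be published) is an assumption, since the listing does not say what URLs the index entries point to.

```python
import xml.etree.ElementTree as ET

SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit from the sitemap protocol


def _render(root_tag: str, child_tag: str, locs: list[str]) -> str:
    """Serialize a <urlset> or <sitemapindex> holding one <loc> per entry."""
    root = ET.Element(root_tag, xmlns=SITEMAP_XMLNS)
    for loc in locs:
        child = ET.SubElement(root, child_tag)
        ET.SubElement(child, "loc").text = loc  # special characters are escaped on output
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def build_sitemap_files(urls, base_url: str,
                        max_per_file: int = MAX_URLS_PER_FILE) -> dict[str, str]:
    """Return a mapping of file name to XML text for the given URLs."""
    unique = list(dict.fromkeys(urls))  # deduplicate, keeping first-seen order
    if len(unique) <= max_per_file:
        return {"sitemap.xml": _render("urlset", "url", unique)}

    files: dict[str, str] = {}
    part_locs: list[str] = []
    for start in range(0, len(unique), max_per_file):
        name = f"sitemap-{len(part_locs) + 1}.xml"
        files[name] = _render("urlset", "url", unique[start:start + max_per_file])
        part_locs.append(f"{base_url.rstrip('/')}/{name}")

    index = _render("sitemapindex", "sitemap", part_locs)
    files["sitemap-index.xml"] = index
    files["sitemap.xml"] = index  # same content, kept under the conventional name
    return files
```

Passing a small `max_per_file` makes the split easy to try on a handful of URLs before relying on it for a large site.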

Input

  • startUrls (required): Start URLs (request list)
  • maxPages: Max pages to crawl (limits total requests)
  • maxDepth: Max link depth from the start URLs
  • ignoreUrlPatterns: Array of regex strings to exclude URLs
  • includeQueryParams: Include ?query=params in sitemap URLs
  • includeFragments: Include #fragments in sitemap URLs (usually disabled)
  • includeLastModified: If enabled, uses the HTTP Last-Modified header for <lastmod> when available
  • respectRobotsTxt: If enabled, skips URLs disallowed by robots.txt for User-agent: * (best-effort)
  • robotsTxtTimeoutSecs: Timeout for downloading robots.txt
  • changefreq, priority: Optional sitemap hints applied to all URLs
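To make the effect of includeQueryParams, includeFragments, ignoreUrlPatterns and respectRobotsTxt more concrete, here is a small sketch using only the Python standard library. The Actor's exact normalization and matching rules are not documented on this page, so the behavior below (lowercasing the host, using `re.search` for patterns, parsing robots.txt with `urllib.robotparser`) is an assumption, and the function names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def normalize_url(url: str, include_query_params: bool = False,
                  include_fragments: bool = False) -> str:
    """Drop the query string and/or fragment unless explicitly kept."""
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path or "/",  # treat an empty path as the site root
        parts.query if include_query_params else "",
        parts.fragment if include_fragments else "",
    ))


def is_ignored(url: str, ignore_url_patterns: list[str]) -> bool:
    """True if any regex string matches anywhere in the URL."""
    return any(re.search(pattern, url) for pattern in ignore_url_patterns)


def allowed_by_robots(url: str, robots_txt: str) -> bool:
    """Check a URL against robots.txt rules for User-agent: *."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)
```

Because the patterns are regular expressions, characters such as `.` and `?` need escaping when they are meant literally, for example `\.pdf$` to skip PDF files.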

Output

Key-value store

  • sitemap.xml (XML)
  • sitemap-index.xml (XML, only for large sites)
  • sitemap-1.xml, sitemap-2.xml, ... (XML parts, only for large sites)
  • SUMMARY (JSON)

Dataset

Each item contains:

  • url, normalizedUrl, statusCode, contentType
  • depth, discoveredFrom
  • includedInSitemap, exclusionReason
  • lastModified, crawledAt
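For an SEO audit, the dataset items can be condensed into a short report. The sketch below assumes the items have already been downloaded as a list of dictionaries with the field names listed above; the `summarize_exclusions` helper, the "unknown" fallback label, and the choice to flag status codes of 400 and above are illustrative assumptions, and the listing does not document the possible exclusionReason values.

```python
from collections import Counter


def summarize_exclusions(items: list[dict]) -> dict:
    """Count included/excluded pages and group exclusions by reason."""
    excluded = [item for item in items if not item.get("includedInSitemap")]
    reasons = Counter(item.get("exclusionReason") or "unknown" for item in excluded)
    return {
        "total": len(items),
        "included": len(items) - len(excluded),
        "excluded_by_reason": dict(reasons),
        # Pages that answered with a client or server error status.
        "error_urls": sorted(
            item["url"] for item in items if (item.get("statusCode") or 0) >= 400
        ),
    }
```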

SEO keywords

sitemap generator, xml sitemap generator, website sitemap crawler, generate sitemap.xml, seo sitemap tool, internal link crawler

Quick start

Store page: https://apify.com/scrappy_garden/sitemap-generator

Paste this into Input and click Run:

{
  "startUrls": [
    { "url": "https://example.com/" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
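The same input can also be built and submitted from a script. The sketch below is an assumption-laden example, not an official integration: it uses the `apify-client` Python package, takes the Actor ID `scrappy_garden/sitemap-generator` from the Store URL above, assumes maxPages and maxDepth accept integers, and assumes the sitemap.xml record comes back as text or bytes. The helper names are hypothetical.

```python
ACTOR_ID = "scrappy_garden/sitemap-generator"  # assumed from the Store page URL


def build_run_input(start_urls, max_pages=None, max_depth=None,
                    use_apify_proxy=False) -> dict:
    """Build an input object using the field names from the Input section."""
    run_input = {
        "startUrls": [{"url": url} for url in start_urls],
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
    if max_pages is not None:
        run_input["maxPages"] = max_pages
    if max_depth is not None:
        run_input["maxDepth"] = max_depth
    return run_input


def run_sitemap_actor(token: str, run_input: dict):
    """Run the Actor and return the stored sitemap.xml value, or None."""
    # Imported here so the input builder works without the package installed
    # (pip install apify-client).
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if not run:
        return None
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record("sitemap.xml")
    return record["value"] if record else None
```

Calling `build_run_input(["https://example.com/"])` yields the same object as the JSON above, which makes it a convenient starting point before adding limits such as maxPages.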

Outputs (what you get)

  • Dataset: one item per crawled page, typically with fields such as url, statusCode, includedInSitemap, exclusionReason, depth, lastModified, and crawledAt.
  • Key-value store: SUMMARY, sitemap.xml

Tips (trust + predictable results)

  • Start with 1–3 URLs to validate behavior, then scale up.
  • If a target site blocks requests, enable Apify Proxy (proxyConfiguration.useApifyProxy) in Input and/or reduce concurrency.
  • Use the SUMMARY / REPORT keys (when present) for automation pipelines and monitoring.
