Sitemap Generator

Pricing: from $0.01 / actor start
Crawl websites and generate XML sitemaps with configurable depth and page limits. Discover all pages, extract metadata, and output a ready-to-use sitemap.xml.


Developer: Monkey Coder (Maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago


Generate XML sitemaps by crawling websites with configurable depth and page limits.

What it does

Sitemap Generator starts from one or more URLs, crawls internal links using breadth-first traversal, and produces:

  • Per-page crawl metadata in the Apify dataset
  • A complete XML sitemap string for each start URL

This actor is designed for SEO discovery, site inventory checks, and quick sitemap generation from live websites.
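The traversal above can be sketched as a bounded breadth-first search. This is an illustrative reimplementation, not the actor's actual source; link fetching is injected as a callable so the control flow stays independent of any HTTP client:

```python
from collections import deque

def bfs_crawl(start_url, get_links, max_depth=3, max_pages=100):
    """Visit pages breadth-first, bounded by depth and total page count.

    get_links(url) returns the internal links found on a page; injecting
    it keeps the traversal logic separate from network concerns.
    """
    visited = {start_url}
    order = []                        # (url, depth) in discovery order
    queue = deque([(start_url, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth >= max_depth:
            continue                  # don't expand links past the depth limit
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order

# Toy link graph standing in for live HTTP fetches.
site = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/x"],
}
pages = bfs_crawl("https://example.com", lambda u: site.get(u, []), max_depth=1)
```

With max_depth=1 the crawl records the start page and its direct children, but never expands the children's own links.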

Features

  • Crawls internal links only (same domain/subdomain family)
  • Breadth-first traversal with max_depth and max_pages
  • Handles relative URLs, fragments, query strings, redirects, and timeouts
  • Skips common non-HTML/static resources (images, CSS, JS, PDFs, archives, media)
  • Extracts page title, approximate word count, link counts, and HTTP metadata
  • Outputs XML sitemap in standard urlset format
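Several of these filters map directly onto the standard urllib.parse helpers. The sketch below is a guess at the shape of that logic, not the actor's code: the extension list is abbreviated, and the exact-host comparison is stricter than the actor's "same domain/subdomain family" rule.

```python
from urllib.parse import urldefrag, urljoin, urlparse

# Abbreviated; the actor's real skip list is longer.
SKIP_EXTENSIONS = (".png", ".jpg", ".gif", ".css", ".js", ".pdf", ".zip", ".mp4")

def normalize_link(base_url, href):
    """Resolve a raw href against the page URL and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    return absolute

def is_crawlable(start_url, link):
    """Keep only same-host HTTP(S) links that aren't static resources."""
    parsed = urlparse(link)
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc != urlparse(start_url).netloc:
        return False  # a subdomain-aware check would compare registered domains
    return not parsed.path.lower().endswith(SKIP_EXTENSIONS)
```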

How to use

  1. Provide one or more Start URLs.
  2. Set Maximum Crawl Depth (default 3).
  3. Set Maximum Pages per start URL (default 100).
  4. Run the actor.

The actor writes one dataset item per discovered page with crawl metrics. For each start URL, the first dataset item includes sitemap_xml for all pages discovered in that crawl.
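Assuming the layout just described — sitemap_xml set only on the first item of each crawl and null elsewhere — pulling the sitemaps out of downloaded dataset items could look like:

```python
def extract_sitemaps(items):
    """Collect the sitemap_xml strings carried by each crawl's first item."""
    return [item["sitemap_xml"] for item in items if item.get("sitemap_xml")]

# Toy items mimicking the documented output schema.
items = [
    {"url": "https://example.com", "sitemap_xml": "<urlset>...</urlset>"},
    {"url": "https://example.com/docs", "sitemap_xml": None},
]
sitemaps = extract_sitemaps(items)
```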

Input

  • start_urls (array, requestListSources editor)
  • max_depth (integer, default 3)
  • max_pages (integer, default 100)
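A complete input object matching these fields might look like this (start_urls entries use the usual requestListSources shape):

```json
{
  "start_urls": [{ "url": "https://example.com" }],
  "max_depth": 3,
  "max_pages": 100
}
```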

Sample output JSON

{
  "url": "https://example.com/docs",
  "depth": 1,
  "status_code": 200,
  "content_type": "text/html; charset=utf-8",
  "title": "Documentation | Example",
  "last_modified": "Tue, 12 Mar 2024 09:12:11 GMT",
  "word_count": 842,
  "internal_links_count": 34,
  "external_links_count": 6,
  "sitemap_xml": null,
  "total_pages_found": 57,
  "crawl_started_at": "2026-03-18T12:34:56.000000+00:00"
}

The first dataset item of each crawl additionally carries the generated sitemap_xml, for example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com</loc>
    <lastmod>2026-03-18</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
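A urlset document like the one above can be produced with the standard library alone. This is a minimal sketch under assumed field names (loc, lastmod, priority), not the actor's internals:

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """Render a list of page dicts as a standard urlset sitemap string."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page["loc"]
        if page.get("lastmod"):
            ET.SubElement(url_el, "lastmod").text = page["lastmod"]
        ET.SubElement(url_el, "priority").text = page.get("priority", "0.5")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(urlset, encoding="unicode"))

sitemap = build_sitemap([{"loc": "https://example.com",
                          "lastmod": "2026-03-18", "priority": "1.0"}])
```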

Notes about limits

  • max_depth and max_pages are safety limits; higher values increase run time and request volume.
  • Only HTTP/HTTPS pages are crawled.
  • Some sites block crawlers or require JavaScript rendering; this actor performs pure HTTP crawling.
  • word_count is an approximation derived from visible page text.
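For instance, a visible-text approximation like word_count can be mimicked with the standard html.parser module by counting whitespace-separated tokens outside script and style blocks (an illustrative sketch, not the actor's exact method):

```python
from html.parser import HTMLParser

class _VisibleText(HTMLParser):
    """Count whitespace-separated tokens outside <script>/<style>."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.words = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
    def handle_data(self, data):
        if not self.skip_depth:           # ignore text inside skipped tags
            self.words += len(data.split())

def approximate_word_count(html):
    parser = _VisibleText()
    parser.feed(html)
    return parser.words

# Script content is excluded, so only "hello" and "world" are counted.
count = approximate_word_count("<p>hello world</p><script>var x = 1;</script>")
```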