Sitemap Generator

Pricing: from $0.01 / actor start
Crawl websites and generate XML sitemaps with configurable depth and page limits. Discover all pages, extract metadata, and output a ready-to-use sitemap.xml.


Developer: Monkey Coder (Maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 2 days ago


Generate XML sitemaps by crawling websites with configurable depth and page limits.

What it does

Sitemap Generator starts from one or more URLs, crawls internal links using breadth-first traversal, and produces:

  • Per-page crawl metadata in the Apify dataset
  • A complete XML sitemap string for each start URL

This actor is designed for SEO discovery, site inventory checks, and quick sitemap generation from live websites.
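The traversal above can be sketched as a bounded breadth-first search. This is an illustrative reimplementation, not the actor's actual source; link fetching is injected as a callable so the control flow stays independent of any HTTP client:

```python
from collections import deque

def bfs_crawl(start_url, get_links, max_depth=3, max_pages=100):
    """Visit pages breadth-first, bounded by depth and total page count.

    get_links(url) returns the internal links found on a page; injecting
    it keeps the traversal logic separate from network concerns.
    """
    visited = {start_url}
    order = []                        # (url, depth) in discovery order
    queue = deque([(start_url, 0)])
    while queue and len(order) < max_pages:
        url, depth = queue.popleft()
        order.append((url, depth))
        if depth >= max_depth:
            continue                  # don't expand links past the depth limit
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return order

# Toy link graph standing in for live HTTP fetches.
site = {
    "https://example.com": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/x"],
}
pages = bfs_crawl("https://example.com", lambda u: site.get(u, []), max_depth=1)
```

With max_depth=1 the crawl records the start page and its direct children, but never expands the children's own links.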

Features

  • Crawls internal links only (same domain/subdomain family)
  • Breadth-first traversal with max_depth and max_pages
  • Handles relative URLs, fragments, query strings, redirects, and timeouts
  • Skips common non-HTML/static resources (images, CSS, JS, PDFs, archives, media)
  • Extracts page title, approximate word count, link counts, and HTTP metadata
  • Outputs XML sitemap in standard urlset format
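Several of these filters map directly onto the standard urllib.parse helpers. The sketch below is a guess at the shape of that logic, not the actor's code: the extension list is abbreviated, and the exact-host comparison is stricter than the actor's "same domain/subdomain family" rule.

```python
from urllib.parse import urldefrag, urljoin, urlparse

# Abbreviated; the actor's real skip list is longer.
SKIP_EXTENSIONS = (".png", ".jpg", ".gif", ".css", ".js", ".pdf", ".zip", ".mp4")

def normalize_link(base_url, href):
    """Resolve a raw href against the page URL and drop any #fragment."""
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    return absolute

def is_crawlable(start_url, link):
    """Keep only same-host HTTP(S) links that aren't static resources."""
    parsed = urlparse(link)
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc != urlparse(start_url).netloc:
        return False  # a subdomain-aware check would compare registered domains
    return not parsed.path.lower().endswith(SKIP_EXTENSIONS)
```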

How to use

  1. Provide one or more Start URLs.
  2. Set Maximum Crawl Depth (default 3).
  3. Set Maximum Pages per start URL (default 100).
  4. Run the actor.

The actor writes one dataset item per discovered page with crawl metrics. For each start URL, the first dataset item includes sitemap_xml for all pages discovered in that crawl.
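Assuming the layout just described — sitemap_xml set only on the first item of each crawl and null elsewhere — pulling the sitemaps out of downloaded dataset items could look like:

```python
def extract_sitemaps(items):
    """Collect the sitemap_xml strings carried by each crawl's first item."""
    return [item["sitemap_xml"] for item in items if item.get("sitemap_xml")]

# Toy items mimicking the documented output schema.
items = [
    {"url": "https://example.com", "sitemap_xml": "<urlset>...</urlset>"},
    {"url": "https://example.com/docs", "sitemap_xml": None},
]
sitemaps = extract_sitemaps(items)
```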

Input

  • start_urls (array, requestListSources editor)
  • max_depth (integer, default 3)
  • max_pages (integer, default 100)
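A complete input object matching these fields might look like this (start_urls entries use the usual requestListSources shape):

```json
{
  "start_urls": [{ "url": "https://example.com" }],
  "max_depth": 3,
  "max_pages": 100
}
```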

Sample output JSON

{
  "url": "https://example.com/docs",
  "depth": 1,
  "status_code": 200,
  "content_type": "text/html; charset=utf-8",
  "title": "Documentation | Example",
  "last_modified": "Tue, 12 Mar 2024 09:12:11 GMT",
  "word_count": 842,
  "internal_links_count": 34,
  "external_links_count": 6,
  "sitemap_xml": null,
  "total_pages_found": 57,
  "crawl_started_at": "2026-03-18T12:34:56.000000+00:00"
}

The first dataset item of each crawl additionally carries the generated sitemap_xml, for example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com</loc>
    <lastmod>2026-03-18</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
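A urlset document like the one above can be produced with the standard library alone. This is a minimal sketch under assumed field names (loc, lastmod, priority), not the actor's internals:

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """Render a list of page dicts as a standard urlset sitemap string."""
    urlset = ET.Element("urlset",
                        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        url_el = ET.SubElement(urlset, "url")
        ET.SubElement(url_el, "loc").text = page["loc"]
        if page.get("lastmod"):
            ET.SubElement(url_el, "lastmod").text = page["lastmod"]
        ET.SubElement(url_el, "priority").text = page.get("priority", "0.5")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            + ET.tostring(urlset, encoding="unicode"))

sitemap = build_sitemap([{"loc": "https://example.com",
                          "lastmod": "2026-03-18", "priority": "1.0"}])
```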

Notes about limits

  • max_depth and max_pages are safety limits; higher values increase run time and request volume.
  • Only HTTP/HTTPS pages are crawled.
  • Some sites block crawlers or require JavaScript rendering; this actor performs pure HTTP crawling.
  • word_count is an approximation derived from visible page text.
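For instance, a visible-text approximation like word_count can be mimicked with the standard html.parser module by counting whitespace-separated tokens outside script and style blocks (an illustrative sketch, not the actor's exact method):

```python
from html.parser import HTMLParser

class _VisibleText(HTMLParser):
    """Count whitespace-separated tokens outside <script>/<style>."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.words = 0
    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
    def handle_data(self, data):
        if not self.skip_depth:           # ignore text inside skipped tags
            self.words += len(data.split())

def approximate_word_count(html):
    parser = _VisibleText()
    parser.feed(html)
    return parser.words

# Script content is excluded, so only "hello" and "world" are counted.
count = approximate_word_count("<p>hello world</p><script>var x = 1;</script>")
```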