Sitemap Generator
Pricing: from $0.01 / actor start
Crawl websites and generate XML sitemaps with configurable depth and page limits. Discover all pages, extract metadata, and output a ready-to-use sitemap.xml.
Developer: Monkey Coder
Generate XML sitemaps by crawling websites with configurable depth and page limits.
What it does
Sitemap Generator starts from one or more URLs, crawls internal links using breadth-first traversal, and produces:
- Per-page crawl metadata in the Apify dataset
- A complete XML sitemap string for each start URL
This actor is designed for SEO discovery, site inventory checks, and quick sitemap generation from live websites.
Features
- Crawls internal links only (same domain/subdomain family)
- Breadth-first traversal with `max_depth` and `max_pages` limits
- Handles relative URLs, fragments, query strings, redirects, and timeouts
- Skips common non-HTML/static resources (images, CSS, JS, PDFs, archives, media)
- Extracts page title, approximate word count, link counts, and HTTP metadata
- Outputs XML sitemap in the standard `urlset` format
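The breadth-first traversal and link filtering listed above can be sketched roughly as follows. This is an illustrative sketch, not the actor's actual implementation: `crawl`, `is_internal`, and `should_skip` are assumed names, and `fetch_links` stands in for the real HTTP fetch-and-parse step.

```python
from collections import deque
from urllib.parse import urljoin, urldefrag, urlparse

# A few of the static-resource extensions the actor skips (illustrative subset).
SKIP_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".js", ".pdf", ".zip", ".mp4"}

def is_internal(url: str, root: str) -> bool:
    """Treat the same domain/subdomain family as internal (e.g. blog.example.com)."""
    host = urlparse(url).hostname or ""
    root_host = urlparse(root).hostname or ""
    return host == root_host or host.endswith("." + root_host)

def should_skip(url: str) -> bool:
    """Skip common non-HTML static resources by file extension."""
    path = urlparse(url).path.lower()
    return any(path.endswith(ext) for ext in SKIP_EXTENSIONS)

def crawl(start_url, fetch_links, max_depth=3, max_pages=100):
    """Breadth-first traversal; fetch_links(url) returns the hrefs found on a page."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    pages = []
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        pages.append({"url": url, "depth": depth})
        if depth >= max_depth:
            continue
        for href in fetch_links(url):
            # Resolve relative URLs against the current page and drop #fragments.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute in seen or not is_internal(absolute, start_url):
                continue
            if should_skip(absolute) or urlparse(absolute).scheme not in ("http", "https"):
                continue
            seen.add(absolute)
            queue.append((absolute, depth + 1))
    return pages
```

Because the queue is FIFO, every page at depth *d* is visited before any page at depth *d + 1*, and `max_pages` caps the total regardless of depth.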
How to use
- Provide one or more Start URLs.
- Set Maximum Crawl Depth (default
3). - Set Maximum Pages per start URL (default
100). - Run the actor.
The actor writes one dataset item per discovered page with crawl metrics. For each start URL, the first dataset item includes sitemap_xml for all pages discovered in that crawl.
Input
- `start_urls` (array, requestListSources editor)
- `max_depth` (integer, default `3`)
- `max_pages` (integer, default `100`)
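Putting those fields together, a typical input (shape assumed from the parameter list above) looks like:

```json
{
  "start_urls": [{ "url": "https://example.com" }],
  "max_depth": 3,
  "max_pages": 100
}
```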
Sample output JSON
```json
{
  "url": "https://example.com/docs",
  "depth": 1,
  "status_code": 200,
  "content_type": "text/html; charset=utf-8",
  "title": "Documentation | Example",
  "last_modified": "Tue, 12 Mar 2024 09:12:11 GMT",
  "word_count": 842,
  "internal_links_count": 34,
  "external_links_count": 6,
  "sitemap_xml": null,
  "total_pages_found": 57,
  "crawl_started_at": "2026-03-18T12:34:56.000000+00:00"
}
```
Example first item for a crawl includes sitemap_xml:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com</loc>
    <lastmod>2026-03-18</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
```
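As a rough illustration (not the actor's code), a `urlset` document like the one above can be produced from a list of discovered pages with the Python standard library; `build_sitemap` and the page-dict shape are assumptions for the sketch:

```python
import xml.etree.ElementTree as ET

def build_sitemap(pages):
    """pages: list of {"url": ..., "lastmod": optional YYYY-MM-DD string}."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
        if page.get("lastmod"):
            ET.SubElement(url, "lastmod").text = page["lastmod"]
    # ET.tostring(..., encoding="unicode") omits the XML declaration, so prepend it.
    return '<?xml version="1.0" encoding="UTF-8"?>' + ET.tostring(urlset, encoding="unicode")
```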
Notes about limits
- `max_depth` and `max_pages` are safety limits; higher values increase run time and request volume.
- Only HTTP/HTTPS pages are crawled.
- Some sites block crawlers or require JavaScript rendering; this actor performs pure HTTP crawling.
- `word_count` is an approximation derived from visible page text.
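For instance, a tag-stripping pass like the following (a hypothetical sketch, not the actor's exact method) yields such an approximation: text inside `<script>`/`<style>` is discarded and the remaining whitespace-separated tokens are counted.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring content inside <script> and <style>."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)

def approximate_word_count(html: str) -> int:
    parser = _TextExtractor()
    parser.feed(html)
    return len(" ".join(parser.chunks).split())
```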