Sitemap Generator
Pricing: Pay per event
Crawl any website and automatically generate a standards-compliant XML sitemap. Discovers all internal pages, extracts last-modified dates and page titles, and lets you download sitemap.xml directly. Configurable crawl depth and URL filters. Ideal for SEO audits and site migrations.
Developer: Stas Persiianenko
Generate a complete XML sitemap for any website in minutes — no technical setup required.
Automatically crawl every page of a website and produce a standards-compliant XML sitemap ready for Google Search Console, Bing Webmaster Tools, and any other SEO platform. Each URL is also saved to a structured dataset for further analysis or integration.
What Does It Do?
Sitemap Generator crawls your website starting from the URL you provide. It follows every internal link it discovers, collects page metadata (title, content type, last-modified date), and compiles all of this into a well-formed XML sitemap that follows the Sitemaps protocol.
Key capabilities:
- Crawls up to 10,000 pages per run
- Generates XML sitemap saved directly to the key-value store for download
- Optionally includes image URLs as `<image:image>` elements for Google Image Search
- Extracts page titles and last-modified dates from HTTP headers
- Tracks crawl depth per URL
- Stays strictly within the same hostname (never follows external links)
- Normalizes URLs to avoid duplicates
- Skips non-page assets: PDFs, images, CSS, JS, zip files, and more
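The normalization and asset-skipping behavior above can be sketched in Python. This is an illustrative approximation, not the actor's actual implementation; the extension list and helper names are assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Illustrative list of non-page asset extensions (not the actor's exact set).
SKIPPED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".gif", ".css", ".js", ".zip"}

def normalize_url(url: str) -> str:
    """Lowercase scheme and host, drop the fragment, and trim trailing
    slashes so equivalent URLs collapse to one canonical form."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))

def is_crawlable(url: str, start_host: str) -> bool:
    """Keep only same-host URLs that don't look like static assets."""
    parts = urlsplit(url)
    if parts.netloc.lower() != start_host:
        return False  # never follow external links
    dot = parts.path.rfind(".")
    ext = parts.path[dot:].lower() if dot != -1 else ""
    return ext not in SKIPPED_EXTENSIONS
```

With these rules, `https://Example.COM/about/#team` and `https://example.com/about` would be treated as the same page.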
Who Is It For?
SEO professionals who need a fresh sitemap to submit to search engines after a site rebuild or migration.
Developers who want to generate a sitemap for a new site quickly without installing any software.
Content managers who want an inventory of all pages on their site for auditing purposes.
Digital agencies managing multiple client websites who need to automate sitemap generation at scale.
Why Use This Actor?
- Zero setup — no software to install, no command line required
- Fast — uses lightweight HTTP crawling (no browser), processes up to 10 parallel requests
- Standards-compliant — produces valid XML following the Sitemaps 0.9 protocol
- Downloadable output — sitemap.xml is saved to the key-value store for direct download
- Structured data — every URL is also saved to a dataset for analysis in spreadsheets or databases
- Pay per use — charged per page crawled, not a flat fee. Small sites cost fractions of a cent.
- Image sitemaps — optional Google Image Sitemap extension support
Data Fields
Each URL found during the crawl produces one record in the dataset:
| Field | Type | Description |
|---|---|---|
| `url` | string | The full, normalized URL of the page |
| `title` | string / null | The HTML `<title>` of the page |
| `statusCode` | number | HTTP status code returned (e.g., 200) |
| `depth` | number | Crawl depth from the start URL (0 = start page) |
| `changefreq` | string | The change frequency set for this URL |
| `priority` | number | The priority value (start URL always 1.0, others configurable) |
| `lastmod` | string / null | Date in YYYY-MM-DD format, taken from the Last-Modified header or defaulting to today's date |
| `contentType` | string / null | MIME type of the page (e.g., text/html) |
| `images` | string[] | List of image URLs found on the page (only if includeImages is enabled) |
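Put together, a single dataset record might look like the following (values are illustrative, not taken from a real run):

```python
# Illustrative dataset record for one crawled page.
record = {
    "url": "https://example.com/about",
    "title": "About Us",
    "statusCode": 200,
    "depth": 1,                   # one link away from the start URL
    "changefreq": "weekly",
    "priority": 0.5,
    "lastmod": "2024-01-15",
    "contentType": "text/html",
    "images": [],                 # populated only when includeImages is enabled
}
```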
Pricing
This actor uses Pay Per Event billing:
| Event | Cost |
|---|---|
| Run started | $0.01 (one-time per run) |
| Per page crawled | $0.001 |
Example costs:
- 50-page site: $0.01 + 50 × $0.001 = $0.06
- 500-page site: $0.01 + 500 × $0.001 = $0.51
- 5,000-page site: $0.01 + 5,000 × $0.001 = $5.01
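The examples above follow directly from the event schedule and can be checked with a tiny helper (a hypothetical estimator, useful for budgeting before a large crawl):

```python
def run_cost(pages: int, start_fee: float = 0.01, per_page: float = 0.001) -> float:
    """Estimated cost of one run: a one-time start fee plus a per-page charge."""
    return round(start_fee + pages * per_page, 2)
```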
How to Use
Step 1: Enter your website URL
In the Website URL field, enter the full URL of the site you want to crawl, including the protocol:
https://example.com
Step 2: Set the page limit
The Max pages to crawl field controls how many pages the actor visits. The default is 100. For large sites, increase this up to 10,000.
Step 3: Configure sitemap options
- Include image URLs — Enable this if you want Google Image Search indexing support. Each page's images will be listed in the sitemap.
- Change frequency — Choose how often your pages typically change. This hint helps search engines schedule re-crawling.
- URL priority — Set the relative importance of pages (0.0–1.0). The start URL is always set to 1.0 automatically.
Step 4: Run the actor
Click Start. The actor will begin crawling immediately. You can watch progress in the logs.
Step 5: Download your sitemap
When the run completes:
- Click Storage in the run details
- Go to Key-value store
- Find `sitemap.xml` and click to download
You can also access it directly via the Apify API.
Input Configuration
```json
{
  "startUrl": "https://example.com",
  "maxPages": 100,
  "includeImages": false,
  "changefreq": "weekly",
  "priority": 0.5
}
```
Input fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `startUrl` | string | Yes | — | Full website URL to crawl |
| `maxPages` | integer | No | 100 | Maximum pages to crawl (1–10,000) |
| `includeImages` | boolean | No | false | Include image URLs in sitemap |
| `changefreq` | string | No | weekly | Change frequency for all URLs |
| `priority` | number | No | 0.5 | Default URL priority (0–1) |
changefreq options: always, hourly, daily, weekly, monthly, yearly, never
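The constraints in the table can be expressed as a small pre-flight check. This is a hypothetical helper mirroring the input schema above, not part of the actor itself:

```python
ALLOWED_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def validate_input(run_input: dict) -> dict:
    """Apply defaults and reject out-of-range values before starting a run."""
    if not run_input.get("startUrl"):
        raise ValueError("startUrl is required")
    changefreq = run_input.get("changefreq", "weekly")
    if changefreq not in ALLOWED_CHANGEFREQ:
        raise ValueError(f"invalid changefreq: {changefreq}")
    max_pages = int(run_input.get("maxPages", 100))
    if not 1 <= max_pages <= 10_000:
        raise ValueError("maxPages must be between 1 and 10,000")
    priority = float(run_input.get("priority", 0.5))
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0 and 1")
    return {**run_input, "changefreq": changefreq,
            "maxPages": max_pages, "priority": priority}
```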
Output
Key-value store: sitemap.xml
A complete XML sitemap following the Sitemaps 0.9 protocol:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>
```
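For context, a file in this shape can be assembled from crawl records with nothing but Python's standard library. This sketch is an illustration of the format, not the actor's actual implementation:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(records: list) -> str:
    """Assemble a Sitemaps 0.9 <urlset> document from crawl records."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for rec in records:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = rec["url"]
        if rec.get("lastmod"):
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = rec["lastmod"]
        ET.SubElement(url, f"{{{SITEMAP_NS}}}changefreq").text = rec.get("changefreq", "weekly")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}priority").text = str(rec.get("priority", 0.5))
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```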
Dataset
Every URL is also stored as a structured record in the dataset. You can export this to JSON, CSV, or Excel directly from the Apify platform.
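If you prefer to post-process the records yourself rather than use the platform export, flattening them to CSV is straightforward. The fetch shown in the comment uses the `apify-client` library as in the API examples below; the helper itself is a hypothetical sketch:

```python
import csv
import io

def records_to_csv(records: list) -> str:
    """Flatten dataset records into CSV text for spreadsheet import."""
    fields = ["url", "title", "statusCode", "depth",
              "changefreq", "priority", "lastmod", "contentType"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# With apify-client, records can be fetched from a finished run like:
#   records = client.dataset(run["defaultDatasetId"]).list_items().items
```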
Tips and Best Practices
Start with a small limit first. Run with maxPages: 20 to verify the crawler is finding the right pages before committing to a full crawl.
Enable image sitemaps for e-commerce and photography sites. If you sell products with images or run a portfolio site, enabling includeImages can boost image search visibility.
Set changefreq based on your publishing cadence. A news site updates daily; a brochure site might be monthly. Accurate hints help search engines prioritize.
Submit your sitemap to Search Console. After downloading sitemap.xml, go to Google Search Console → Sitemaps and paste your sitemap URL, or upload the file.
For very large sites, check the run timeout. The default 600-second timeout is usually sufficient because the actor processes pages in parallel (10 concurrent requests), but if a run with thousands of pages hits the limit, raise the timeout in the run options.
The start URL always gets priority 1.0. All other pages use the priority value you set. You do not need to manually set the homepage priority.
Robots.txt is respected by default. The underlying CheerioCrawler respects robots.txt directives, so the actor will not crawl pages blocked by robots.txt.
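The kind of robots.txt check a polite crawler performs before queueing a URL can be demonstrated with Python's standard library (this illustrates the concept; it is not CheerioCrawler's internal code):

```python
from urllib.robotparser import RobotFileParser

# Parse a minimal robots.txt that blocks the /admin/ section.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /admin/",
])

print(rules.can_fetch("*", "https://example.com/about"))   # allowed -> True
print(rules.can_fetch("*", "https://example.com/admin/"))  # blocked -> False
```

A crawler that respects these rules simply drops any discovered URL for which `can_fetch` returns False.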
Integrations
Zapier / Make
Use the Apify integration to trigger sitemap generation on a schedule (weekly, monthly) and automatically download or email the resulting sitemap.xml file.
Google Search Console API
Combine this actor with the Search Console API to automatically submit your freshly generated sitemap after every crawl.
Apify Scheduler
Create a scheduled run in Apify to regenerate your sitemap automatically — for example, every Monday morning.
API Usage
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/sitemap-generator').call({
    startUrl: 'https://example.com',
    maxPages: 100,
    includeImages: false,
    changefreq: 'weekly',
    priority: 0.5,
});

console.log('Run finished:', run.status);
console.log('Dataset ID:', run.defaultDatasetId);

// Download the sitemap
const store = client.keyValueStore(run.defaultKeyValueStoreId);
const sitemap = await store.getRecord('sitemap.xml');
console.log(sitemap.value);
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient('YOUR_API_TOKEN')

run = client.actor('automation-lab/sitemap-generator').call(run_input={
    'startUrl': 'https://example.com',
    'maxPages': 100,
    'includeImages': False,
    'changefreq': 'weekly',
    'priority': 0.5,
})

print(f"Run finished: {run['status']}")
print(f"Dataset ID: {run['defaultDatasetId']}")

# Download the sitemap
store = client.key_value_store(run['defaultKeyValueStoreId'])
record = store.get_record('sitemap.xml')
print(record['value'])
```
cURL
```shell
# Start a run
curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~sitemap-generator/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startUrl": "https://example.com", "maxPages": 100, "includeImages": false, "changefreq": "weekly", "priority": 0.5}'

# Get run status (replace RUN_ID)
curl "https://api.apify.com/v2/actor-runs/RUN_ID?token=YOUR_API_TOKEN"

# Download sitemap.xml (replace STORE_ID)
curl "https://api.apify.com/v2/key-value-stores/STORE_ID/records/sitemap.xml?token=YOUR_API_TOKEN"
```
Use with Claude AI (MCP)
This actor is available as a tool in Claude AI through the Model Context Protocol (MCP). Add it to Claude Desktop, Cursor, Windsurf, or any MCP-compatible client.
Setup for Claude Code
```shell
claude mcp add --transport http apify "https://mcp.apify.com"
```
Setup for Claude Desktop, Cursor, or VS Code
Add this to your MCP config file:
```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}
```
Example prompts
- "Generate an XML sitemap for example.com crawling up to 200 pages and give me the download link."
- "Create a sitemap for my e-commerce site at shop.example.com with image URLs included for Google Image Search."
- "Crawl https://docs.example.com and generate a sitemap with weekly change frequency for all pages found."
Learn more in the Apify MCP documentation.
Legality
Web crawling for sitemap generation is generally considered legitimate use and is commonly practiced by SEO tools, search engines, and webmasters. The actor:
- Respects `robots.txt` directives by default
- Crawls only the site you specify (stays on-domain)
- Uses standard HTTP requests without bypassing login walls or paywalls
- Does not store any personal data
Always ensure you have permission to crawl a website you do not own. Some websites' terms of service restrict automated crawling.
FAQ
Q: Why doesn't the actor find all pages on my site?
A: The actor discovers pages by following links. If your site uses JavaScript-rendered navigation (single-page apps), links may not appear in the HTML. Consider using a JavaScript-enabled crawling approach for SPAs.
Q: Can I crawl multiple websites in one run?
A: No, each run is scoped to one starting URL and hostname. Run the actor multiple times for multiple sites.
Q: The sitemap only has a few URLs — my site has many more.
A: Increase maxPages and run again. The actor stops when it reaches the page limit, not when the site is fully crawled.
Q: Does it follow redirects?
A: Yes. The actor follows HTTP redirects automatically and records the final URL in the sitemap.
Q: Can I set different priority values per page?
A: Not currently — the actor applies a uniform priority to all pages except the start URL (always 1.0). Priority per-page is a planned future feature.
Q: Is the XML output valid?
A: Yes. The output follows the Sitemaps 0.9 protocol and includes the correct XML namespace declarations.
Related Tools
- Broken Link Checker — Find 404 errors and broken links across your site
- Canonical URL Checker — Audit canonical tags and detect duplicate content issues
- Robots & Sitemap Analyzer — Analyze your robots.txt and existing sitemap configuration
- Accessibility Checker — Check pages for WCAG accessibility compliance