Sitemap Generator
Pricing
from $0.01 / 1,000 results
Developer

The Howlers
Sitemap Generator - XML Sitemaps with Image, Video & Last Modified Detection
Generate XML sitemaps by crawling any website. Discover all pages, images, and videos with configurable crawl depth, URL filters, and multiple output formats. Automatically detects last modified dates and respects robots.txt. Essential for SEO audits, site migrations, and Google Search Console submissions.
Features
- Full Site Crawling - Automatically discover and index all pages
- Image Sitemap - Include images for Google Images indexing
- Video Sitemap - Include videos for Google Video indexing
- Last Modified Detection - Automatically detect lastmod from HTTP headers and meta tags
- Multiple Output Formats - XML, XML Index (for large sites), JSON, or plain text
- URL Filtering - Include or exclude URLs using regex patterns
- Configurable Depth - Control how deep the crawler follows links
- Robots.txt Compliance - Optionally respect robots.txt rules
- Change Frequency - Set default change frequency for sitemap entries
- Priority Settings - Configure default URL priority (0.0-1.0)
- Webhook Integration - Get notified when generation completes
- Demo Mode - Test with sample data before going live
Who Should Use This Actor?
SEO Agencies
Generate sitemaps for client websites as part of technical SEO audits. Submit to Google Search Console to improve crawl efficiency and indexing.
Web Developers
Create sitemaps during site launches and migrations. Get a complete URL inventory with metadata before redesigning or moving sites.
Content Teams
Build content inventories from existing websites. Know every page, image, and video on your site for content planning and optimization.
E-Commerce Teams
Generate product sitemaps with images for better Google Shopping and Google Images visibility. Include video sitemaps for product videos.
Technical SEO Consultants
Audit site structure and compare crawled URLs against existing sitemaps. URLs that appear in an existing sitemap but were not reached by the crawl are likely orphaned pages with no internal links pointing to them.
Site Migration Teams
Create complete URL inventories before migrations. Map all current URLs with metadata for redirect planning.
Quick Start
Demo Mode (Free Test)
{"demoMode": true}
Basic Sitemap Generation
{"startUrl": "https://example.com","maxPages": 500,"outputFormat": "xml","demoMode": false}
Image + Video Sitemap
{"startUrl": "https://example.com","maxPages": 1000,"includeImages": true,"includeVideos": true,"includeLastmod": true,"outputFormat": "xml","demoMode": false}
Filtered Sitemap (Blog Only)
{"startUrl": "https://example.com","maxPages": 500,"urlPatterns": ["/blog/"],"excludePatterns": ["/tag/", "/author/", "/page/"],"outputFormat": "xml","demoMode": false}
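The inputs above can also be sent programmatically. The sketch below uses the `apify-client` Python package; the actor ID is a placeholder you must replace, the `APIFY_TOKEN` environment variable is an assumption, and reading results from the run's default dataset is assumed rather than confirmed by this README.

```python
import os


def build_run_input(start_url, max_pages=500, output_format="xml", **extra):
    """Assemble an input object using the parameter names from this README."""
    allowed_formats = {"xml", "xml-index", "json", "txt"}
    if output_format not in allowed_formats:
        raise ValueError(f"outputFormat must be one of {sorted(allowed_formats)}")
    run_input = {
        "startUrl": start_url,
        "maxPages": max_pages,
        "outputFormat": output_format,
        "demoMode": False,  # demoMode defaults to true, so switch it off explicitly
    }
    run_input.update(extra)  # e.g. includeImages=True, urlPatterns=["/blog/"]
    return run_input


def run_sitemap_actor(actor_id, run_input):
    """Start a run, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(actor_id).call(run_input=run_input)
    # Assumption: the actor stores its result in the default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

Typical use would be `run_sitemap_actor("<username>/<actor-name>", build_run_input("https://example.com", urlPatterns=["/blog/"]))`, with the actor ID copied from the actor's page.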
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| startUrl | string | - | Website URL to crawl (required unless demoMode) |
| maxPages | number | 1000 | Maximum pages to crawl |
| maxDepth | number | 10 | Maximum link depth from start URL |
| includeImages | boolean | true | Include images in sitemap |
| includeVideos | boolean | false | Include videos in sitemap |
| includeLastmod | boolean | true | Detect last modified dates |
| respectRobotsTxt | boolean | true | Follow robots.txt rules |
| urlPatterns | array | - | Only include URLs matching these regex patterns |
| excludePatterns | array | - | Exclude URLs matching these regex patterns |
| outputFormat | string | "xml" | Format: xml, xml-index, json, txt |
| changefreq | string | "weekly" | Default change frequency |
| priority | number | 0.5 | Default priority 0.0-1.0 |
| demoMode | boolean | true | Return sample data for testing |
| webhookUrl | string | - | Webhook URL for completion notification |
Output Format
{"url": "https://example.com","pagesFound": 245,"imagesFound": 892,"videosFound": 12,"crawlTime": 45230,"format": "xml","sitemap": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>...","pages": [{"url": "https://example.com/","lastmod": "2026-01-15","changefreq": "weekly","priority": 1.0,"images": ["https://example.com/logo.png"],"title": "Home Page","depth": 0},{"url": "https://example.com/products","lastmod": "2026-01-20","changefreq": "daily","priority": 0.8,"images": ["https://example.com/products/widget.jpg"],"title": "Products","depth": 1}]}
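If you need to rebuild or post-process the XML yourself, the `pages` array carries enough information to do so. The following is a minimal sketch, not the actor's own serializer: it assumes each page has the fields shown in the sample output and uses the standard sitemap and Google image-sitemap namespaces.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def pages_to_sitemap_xml(pages):
    """Serialize a list of page dicts (as in the output's "pages") to sitemap XML."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page["url"]
        # Optional fields are written only when present, mirroring omitted lastmod.
        for field in ("lastmod", "changefreq", "priority"):
            if page.get(field) is not None:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{field}").text = str(page[field])
        for image_url in page.get("images", []):
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    xml_bytes = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")
```

Writing the returned string to `sitemap.xml` with UTF-8 encoding gives a file in the same shape as the `sitemap` field.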
Pricing (Pay-Per-Event)
| Event | Description | Price |
|---|---|---|
| page_crawled | Per page crawled | $0.001 |
Example costs:
- 100 pages: 100 x $0.001 = $0.10
- 500 pages: 500 x $0.001 = $0.50
- 1,000 pages: 1,000 x $0.001 = $1.00
- 5,000 pages: 5,000 x $0.001 = $5.00
- Demo mode: $0.00
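The same arithmetic can be scripted when budgeting several crawls. This sketch assumes billing is strictly linear in pages crawled, with no minimum charge or platform fees, which this README does not confirm.

```python
from decimal import Decimal

PRICE_PER_PAGE = Decimal("0.001")  # page_crawled price from the pricing table


def estimate_cost(pages_crawled, demo_mode=False):
    """Return the estimated charge in USD for a single run."""
    if demo_mode:
        return Decimal("0")  # demo runs are listed at $0.00
    return PRICE_PER_PAGE * pages_crawled


for pages in (100, 500, 1000, 5000):
    print(f"{pages:>5} pages: ${estimate_cost(pages):.2f}")
```

`Decimal` is used instead of `float` so that per-page prices add up exactly.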
Common Scenarios
Scenario 1: SEO Audit Sitemap
{"startUrl": "https://client-site.com","maxPages": 2000,"includeImages": true,"includeLastmod": true,"outputFormat": "xml","demoMode": false}
Generate a complete sitemap for submission to Google Search Console during an SEO audit.
Scenario 2: Pre-Migration URL Inventory
{"startUrl": "https://old-site.com","maxPages": 5000,"maxDepth": 15,"outputFormat": "json","demoMode": false}
Create a full URL inventory in JSON format for redirect mapping during site migration.
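For redirect planning, the JSON inventory can be turned into a starter redirect map. The sketch below assumes the JSON output carries the `pages` array shown under Output Format; `new_origin` and the CSV column names are hypothetical, and the one-to-one path mapping is only a starting point to edit by hand.

```python
import csv
import io
from urllib.parse import urlsplit


def redirect_map_csv(pages, new_origin):
    """Build a CSV that maps each crawled URL to the same path on new_origin."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["old_url", "new_url", "title", "depth"])
    for page in pages:
        parts = urlsplit(page["url"])
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        new_url = new_origin.rstrip("/") + path
        writer.writerow([page["url"], new_url, page.get("title", ""), page.get("depth", "")])
    return buffer.getvalue()
```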
Scenario 3: Section-Specific Sitemap
{"startUrl": "https://example.com","maxPages": 1000,"urlPatterns": ["/products/"],"excludePatterns": ["/products/archived/"],"includeImages": true,"outputFormat": "xml","demoMode": false}
Generate a sitemap for only your product pages, excluding archived products.
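To preview which URLs a pattern set would keep before paying for a crawl, you can test the patterns locally. This sketch assumes patterns are unanchored regex searches and that an exclude match overrides an include match; the actor's exact matching rules are not documented here.

```python
import re


def url_is_included(url, url_patterns=None, exclude_patterns=None):
    """Return True if url passes the include patterns and hits no exclude pattern."""
    # With no include patterns, every URL is a candidate.
    if url_patterns and not any(re.search(p, url) for p in url_patterns):
        return False
    if exclude_patterns and any(re.search(p, url) for p in exclude_patterns):
        return False
    return True


candidates = [
    "https://example.com/products/widget",
    "https://example.com/products/archived/old-widget",
    "https://example.com/blog/launch",
]
kept = [u for u in candidates if url_is_included(u, ["/products/"], ["/products/archived/"])]
print(kept)
```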
Webhook & Automation Integration
Zapier / Make.com / n8n
- Create a webhook trigger in your automation platform
- Copy the webhook URL to webhookUrl
- Route results to storage, submission tools, or alerts
Popular automations:
- Sitemap XML -> Google Cloud Storage (auto-submit to GSC)
- Page list -> Google Sheets (content inventory)
- Crawl results -> Airtable (site structure database)
- Completion -> Slack alert (sitemap ready notification)
Apify Scheduled Runs
Schedule monthly sitemap regeneration to keep your sitemap current with site changes.
FAQ
Q: Does this create a file I can upload to Google Search Console?
A: Yes. The xml output format produces a standard XML sitemap file that can be submitted directly to Google Search Console.
Q: What is XML Index format?
A: For sites with more than 50,000 URLs, the xml-index format creates a sitemap index file that references multiple sitemap files. This follows the sitemap protocol, which limits a single sitemap file to 50,000 URLs.
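As an illustration of the index layout, the sketch below splits a URL list into chunks of at most 50,000 and builds the index that points at them. The `sitemap-N.xml` file naming and the `base_url` parameter are hypothetical; the actor may name its files differently.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split urls into consecutive lists of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(base_url, chunk_count):
    """Return sitemap index XML referencing sitemap-1.xml .. sitemap-N.xml."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for n in range(1, chunk_count + 1):
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        loc.text = f"{base_url.rstrip('/')}/sitemap-{n}.xml"  # hypothetical naming
    return ET.tostring(index, encoding="utf-8", xml_declaration=True).decode("utf-8")
```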
Q: Does it respect robots.txt?
A: By default, yes (respectRobotsTxt: true). Set to false to crawl all URLs regardless of robots.txt directives.
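To check in advance which paths a compliant crawler would skip, Python's standard `urllib.robotparser` can evaluate a robots.txt file locally. This is an independent sketch; the user agent the actor identifies as is not stated in this README, so `*` is assumed.

```python
from urllib.robotparser import RobotFileParser


def make_robots_checker(robots_txt, user_agent="*"):
    """Return a function that reports whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


sample_robots = """User-agent: *
Disallow: /admin/
"""
allowed = make_robots_checker(sample_robots)
print(allowed("https://example.com/blog/"))        # True
print(allowed("https://example.com/admin/users"))  # False
```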
Q: How is lastmod detected?
A: The crawler checks HTTP Last-Modified headers and HTML meta tags for modification dates. If neither is available, lastmod is omitted for that URL.
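The fallback order described above can be sketched as follows. Which meta tags the actor reads is not specified, so `meta_modified` stands for a hypothetical, already-extracted ISO 8601 value; the function names are illustrative only.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def detect_lastmod(headers, meta_modified=None):
    """Return a YYYY-MM-DD lastmod, or None when no usable date is found."""
    normalized = {name.lower(): value for name, value in headers.items()}
    header_value = normalized.get("last-modified")
    if header_value:
        try:
            parsed = parsedate_to_datetime(header_value)
            return parsed.astimezone(timezone.utc).date().isoformat()
        except (TypeError, ValueError):
            pass  # malformed header: fall through to the meta value
    if meta_modified:
        try:
            parsed = datetime.fromisoformat(meta_modified.replace("Z", "+00:00"))
            return parsed.date().isoformat()
        except ValueError:
            pass
    return None  # lastmod is omitted for this URL
```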
Q: Can I generate sitemaps for multiple sites?
A: Run the actor once per site. Each run generates a sitemap for a single domain starting from startUrl.
Common Problems & Solutions
"Crawl taking too long"
- Reduce maxPages for faster completion
- Reduce maxDepth to limit crawl scope
- Large sites with thousands of pages naturally take longer
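To see how the two limits interact, here is a small breadth-first traversal over an in-memory link graph. It is a sketch under the assumption that the crawler visits pages breadth-first and counts the start URL as depth 0; the actor's real traversal order is not documented here, and `link_graph` is a hypothetical stand-in for fetched pages.

```python
from collections import deque


def crawl_order(link_graph, start_url, max_pages=1000, max_depth=10):
    """Return (url, depth) pairs in visit order, honoring both limits."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # do not follow links from pages at the depth limit
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


site = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"]}
print(crawl_order(site, "/", max_depth=1))
```

Lowering `max_depth` prunes deep branches, while lowering `max_pages` simply stops the crawl early, so the first is usually the better lever for trimming scope.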
"Missing pages in sitemap"
- Increase maxPages and maxDepth limits
- Check that the missing pages aren't caught by excludePatterns
- Orphaned pages (no internal links) won't be discovered by crawling - add them manually
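One way to find candidates for manual addition is to diff an existing sitemap against the crawl result. The sketch below assumes you already have the old sitemap's XML and the `pages` array from a run; URLs are compared as exact strings, so trailing-slash or scheme differences will show up as mismatches.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract the set of <loc> URLs from a standard XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def find_uncrawled(existing_sitemap_xml, crawled_pages):
    """Return URLs listed in the existing sitemap that the crawl never reached."""
    crawled = {page["url"] for page in crawled_pages}
    return sorted(sitemap_urls(existing_sitemap_xml) - crawled)
```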
"Images not included"
- Set includeImages: true explicitly
- Some images loaded via JavaScript may not be detected
- Background images in CSS are not included
"Demo data showing"
- Set demoMode: false - no API keys required
📞 Support
- Actor Arsenal: Full Actor Catalog
- Developer: John Rippy
Built by John Rippy | Actor Arsenal