Sitemap Generator Actor
A powerful Apify actor that generates XML sitemaps for websites. Perfect for SEO optimization and website indexing.
Features
- Automatic URL Discovery: Use the built-in web crawler to automatically discover all URLs on your website
- Manual URL Input: Provide a list of URLs directly for sitemap generation
- Standard Compliance: Generates XML sitemaps compliant with sitemaps.org protocol
- Configurable Metadata: Set change frequency and priority for each URL
- Flexible Output: Output as JSON or save as sitemap.xml file
- Domain Validation: Automatically filters URLs to stay within the specified domain
How It Works
The actor can operate in two modes:
Mode 1: Web Crawler (Automatic URL Discovery)
When `useWebCrawler` is enabled, the actor uses Apify's Web Crawler to discover all links on your website up to a specified depth. This is ideal for discovering all pages automatically.
Mode 2: Direct URL Input
Provide a list of URLs directly, and the actor will include them in the sitemap. This is useful when you already have a known list of URLs.
Input Configuration
Required Parameters
- `base_url` (string): The root URL of your website (e.g., `https://example.com`)
Optional Parameters
- `urls` (array): Array of specific URLs to include in the sitemap
  - Default: `[]`
  - Example: `["https://example.com/page1", "https://example.com/page2"]`
- `useWebCrawler` (boolean): Enable automatic URL discovery
  - Default: `false`
  - When `true`, the actor crawls your website to find all links
- `maxCrawlDepth` (integer): Maximum depth for the web crawler
  - Default: `2`
  - Valid range: 0-10
  - Only applies when `useWebCrawler` is `true`
- `maxPages` (integer): Maximum number of pages to include in the sitemap
  - Default: `1000`
  - Valid range: 1-50000
- `changeFrequency` (string): Default change frequency for all URLs
  - Default: `weekly`
  - Valid options: `always`, `hourly`, `daily`, `weekly`, `monthly`, `yearly`, `never`
- `priority` (number): Default priority for all URLs
  - Default: `0.8`
  - Valid range: 0.0-1.0
  - Indicates importance relative to other URLs on your site
- `saveToStorage` (boolean): Save the sitemap as a file to Apify storage
  - Default: `false`
  - When `true`, output includes `/tmp/sitemap.xml`
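For orientation, the sketch below shows how these parameters might be read and defaulted inside an Apify Python actor. It is an illustration only; the variable names and structure are assumptions, not the actor's actual source:

```python
from apify import Actor


async def main() -> None:
    async with Actor:
        actor_input = await Actor.get_input() or {}

        # base_url is required; fail fast with a clear message if it is missing.
        base_url = actor_input.get("base_url")
        if not base_url:
            raise ValueError("Missing required input parameter: base_url")

        # Optional parameters with the documented defaults.
        urls = actor_input.get("urls", [])
        use_web_crawler = actor_input.get("useWebCrawler", False)
        max_crawl_depth = actor_input.get("maxCrawlDepth", 2)
        max_pages = actor_input.get("maxPages", 1000)
        change_frequency = actor_input.get("changeFrequency", "weekly")
        priority = actor_input.get("priority", 0.8)
        save_to_storage = actor_input.get("saveToStorage", False)
```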
Example Inputs
Example 1: Simple Web Crawl
{"base_url": "https://example.com","useWebCrawler": true,"maxCrawlDepth": 2,"maxPages": 500}
Example 2: Direct URL List
{"base_url": "https://example.com","urls": ["https://example.com/","https://example.com/about","https://example.com/services","https://example.com/contact"],"changeFrequency": "monthly","priority": 0.9,"saveToStorage": true}
Example 3: Custom Configuration
{"base_url": "https://blog.example.com","useWebCrawler": true,"maxCrawlDepth": 3,"maxPages": 2000,"changeFrequency": "daily","priority": 0.7,"saveToStorage": true}
Output
The actor outputs a dataset item with the following structure:
{"sitemap": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">...</urlset>","total_urls": 150,"base_url": "https://example.com","generated_at": "2024-01-15T10:30:45.123456"}
Sample Sitemap XML
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
```
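For context on how a document like this can be produced with `lxml` (one of the listed dependencies), here is a hedged sketch; the actor's real implementation may build the XML differently:

```python
from datetime import date

from lxml import etree

# Illustrative sitemap construction with lxml; not the actor's exact code.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, changefreq="weekly", priority=0.8) -> str:
    urlset = etree.Element("urlset", nsmap={None: SITEMAP_NS})
    lastmod = date.today().isoformat()
    for url in urls:
        entry = etree.SubElement(urlset, "url")
        etree.SubElement(entry, "loc").text = url
        etree.SubElement(entry, "lastmod").text = lastmod
        etree.SubElement(entry, "changefreq").text = changefreq
        etree.SubElement(entry, "priority").text = str(priority)
    return etree.tostring(
        urlset, pretty_print=True, xml_declaration=True, encoding="UTF-8"
    ).decode("utf-8")


print(build_sitemap(["https://example.com/", "https://example.com/about"]))
```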
Use Cases
- SEO Optimization: Submit sitemaps to Google Search Console and Bing Webmaster Tools
- Website Indexing: Help search engines discover all pages on your website
- Site Structure Analysis: Understand your website's URL structure
- Link Validation: Identify all crawlable pages before migration
- Multi-language Sites: Generate sitemaps for international websites
Technical Details
- Language: Python 3.11+
- Runtime: Apify Actor
- Key Dependencies:
  - `apify`: Apify SDK for actor development
  - `apify-client`: Client for the Apify API
  - `beautifulsoup4`: HTML parsing for link extraction
  - `requests`: HTTP requests
  - `lxml`: XML processing
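To make the role of these dependencies concrete, the sketch below shows one way a single crawl step could extract same-domain links with `requests` and `beautifulsoup4`. The function name and details are illustrative assumptions, not the actor's actual implementation:

```python
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup


def extract_links(page_url: str, base_url: str) -> set[str]:
    """Fetch one page and return same-domain links found on it (illustrative)."""
    response = requests.get(page_url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "lxml")

    base_domain = urlparse(base_url).netloc
    links: set[str] = set()
    for anchor in soup.find_all("a", href=True):
        absolute = urljoin(page_url, anchor["href"])
        parsed = urlparse(absolute)
        # Keep only same-domain http(s) links, matching the domain validation feature.
        if parsed.scheme in ("http", "https") and parsed.netloc == base_domain:
            links.add(absolute.split("#")[0])
    return links
```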
Error Handling
The actor handles various error scenarios:
- Invalid URLs are automatically filtered
- Cross-domain URLs are excluded
- Malformed URLs are skipped with logging
- Missing base_url parameter raises a clear error message
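As an illustration of the filtering described above, invalid and cross-domain URLs can be screened out and logged with the standard library. This is an assumed helper, not the actor's actual function:

```python
import logging
from urllib.parse import urlparse

logger = logging.getLogger(__name__)


def filter_urls(candidates: list[str], base_url: str) -> list[str]:
    """Drop malformed and cross-domain URLs, logging each skip (illustrative)."""
    base_domain = urlparse(base_url).netloc
    valid: list[str] = []
    for url in candidates:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            logger.warning("Skipping malformed URL: %s", url)
            continue
        if parsed.netloc != base_domain:
            logger.info("Excluding cross-domain URL: %s", url)
            continue
        valid.append(url)
    return valid
```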
Performance Considerations
- Large Sites: For sites with 10,000+ pages, consider increasing `maxPages` and using pagination
- Crawl Depth: Each depth level increases crawl time exponentially (use 2-3 for most sites)
- API Limits: Apify actor runs are subject to platform resource limits
Troubleshooting
No URLs Found
- Verify the `base_url` is correct and accessible
- Check that `useWebCrawler` is enabled if expecting automatic discovery
- Ensure the website doesn't block crawlers with robots.txt (a quick check is sketched below)
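If you want to confirm robots.txt rules before a run, Python's standard library can do a quick standalone check. The URLs and the `*` user agent below are placeholders, not values the actor uses:

```python
from urllib.robotparser import RobotFileParser

# Standalone robots.txt check against placeholder URLs.
robots = RobotFileParser("https://example.com/robots.txt")
robots.read()
print(robots.can_fetch("*", "https://example.com/some-page"))  # True if crawling is allowed
```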
Too Few URLs
- Increase `maxCrawlDepth` to discover deeper pages
- Verify pages are linked and not isolated
- Check for JavaScript-rendered content (may need different crawler)
Sitemap File Not Created
- Ensure `saveToStorage` is set to `true`
- Check actor logs for file write errors
- Verify sufficient storage quota available
License
This actor is provided under the MIT License. Feel free to modify and distribute as needed.
Support
For issues, questions, or feature requests, please contact the development team or open an issue on the repository.