Sitemap Generator
Maintained by Community · Developer: Salman Bareesh · Pricing: $2.00 / 1,000 results

Sitemap Generator Actor

A powerful Apify actor that generates XML sitemaps for websites. Perfect for SEO optimization and website indexing.

Features

  • Automatic URL Discovery: Use the built-in web crawler to automatically discover all URLs on your website
  • Manual URL Input: Provide a list of URLs directly for sitemap generation
  • Standard Compliance: Generates XML sitemaps compliant with the sitemaps.org protocol
  • Configurable Metadata: Set change frequency and priority for each URL
  • Flexible Output: Output as JSON or save as sitemap.xml file
  • Domain Validation: Automatically filters URLs to stay within the specified domain

How It Works

The actor can operate in two modes:

Mode 1: Web Crawler (Automatic URL Discovery)

When useWebCrawler is enabled, the actor uses Apify's Web Crawler to follow links on your website up to the configured maxCrawlDepth. This is ideal when you want every reachable page included without listing URLs by hand.

Mode 2: Direct URL Input

Provide a list of URLs directly, and the actor will include them in the sitemap. This is useful when you already have a known list of URLs.
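
The crawling mode above amounts to a breadth-first traversal that stays on the base domain. Here is a stdlib-only sketch of that idea (the real actor uses Apify's crawler with beautifulsoup4 and requests; the `fetch` callback and function names here are illustrative):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute hrefs from <a> tags, resolved against the page URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.page_url, value))


def same_domain(url, base_url):
    return urlparse(url).netloc == urlparse(base_url).netloc


def crawl(base_url, fetch, max_depth=2, max_pages=1000):
    """BFS up to max_depth levels; fetch(url) -> HTML string is injected
    so the traversal logic stays independent of the HTTP client."""
    seen = {base_url}
    frontier = [base_url]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            parser = LinkExtractor(url)
            parser.feed(fetch(url))
            for link in parser.links:
                if link in seen or not same_domain(link, base_url):
                    continue  # duplicates and cross-domain links are dropped
                if len(seen) >= max_pages:
                    return sorted(seen)
                seen.add(link)
                next_frontier.append(link)
        frontier = next_frontier
    return sorted(seen)
```

Injecting `fetch` also makes the traversal easy to test against an in-memory map of pages instead of live HTTP.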

Input Configuration

Required Parameters

  • base_url (string): The root URL of your website (e.g., https://example.com)

Optional Parameters

  • urls (array): Array of specific URLs to include in the sitemap

    • Default: []
    • Example: ["https://example.com/page1", "https://example.com/page2"]
  • useWebCrawler (boolean): Enable automatic URL discovery

    • Default: false
    • When true, the actor crawls your website to find all links
  • maxCrawlDepth (integer): Maximum depth for the web crawler

    • Default: 2
    • Valid range: 0-10
    • Only applies when useWebCrawler is true
  • maxPages (integer): Maximum number of pages to include in the sitemap

    • Default: 1000
    • Valid range: 1-50000
  • changeFrequency (string): Default change frequency for all URLs

    • Default: weekly
    • Valid options: always, hourly, daily, weekly, monthly, yearly, never
  • priority (number): Default priority for all URLs

    • Default: 0.8
    • Valid range: 0.0-1.0
    • Indicates importance relative to other URLs on your site
  • saveToStorage (boolean): Save the sitemap as a file to Apify storage

    • Default: false
    • When true, output includes /tmp/sitemap.xml
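
Taken together, the defaults and valid ranges above imply input validation along these lines (a sketch built only from the documented values; the actor's actual validation code may differ):

```python
# Documented defaults for the optional parameters.
DEFAULTS = {
    "urls": [],
    "useWebCrawler": False,
    "maxCrawlDepth": 2,
    "maxPages": 1000,
    "changeFrequency": "weekly",
    "priority": 0.8,
    "saveToStorage": False,
}
FREQUENCIES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def normalize_input(actor_input):
    """Merge documented defaults into the input and enforce documented ranges."""
    if not actor_input.get("base_url"):
        raise ValueError("base_url is required")
    merged = {**DEFAULTS, **actor_input}
    if not 0 <= merged["maxCrawlDepth"] <= 10:
        raise ValueError("maxCrawlDepth must be in the range 0-10")
    if not 1 <= merged["maxPages"] <= 50000:
        raise ValueError("maxPages must be in the range 1-50000")
    if merged["changeFrequency"] not in FREQUENCIES:
        raise ValueError("changeFrequency must be one of: " + ", ".join(sorted(FREQUENCIES)))
    if not 0.0 <= merged["priority"] <= 1.0:
        raise ValueError("priority must be in the range 0.0-1.0")
    return merged
```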

Example Inputs

Example 1: Simple Web Crawl

{
  "base_url": "https://example.com",
  "useWebCrawler": true,
  "maxCrawlDepth": 2,
  "maxPages": 500
}

Example 2: Direct URL List

{
  "base_url": "https://example.com",
  "urls": [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/services",
    "https://example.com/contact"
  ],
  "changeFrequency": "monthly",
  "priority": 0.9,
  "saveToStorage": true
}

Example 3: Custom Configuration

{
  "base_url": "https://blog.example.com",
  "useWebCrawler": true,
  "maxCrawlDepth": 3,
  "maxPages": 2000,
  "changeFrequency": "daily",
  "priority": 0.7,
  "saveToStorage": true
}

Output

The actor outputs a dataset item with the following structure:

{
  "sitemap": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">...</urlset>",
  "total_urls": 150,
  "base_url": "https://example.com",
  "generated_at": "2024-01-15T10:30:45.123456"
}
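
Since the sitemap field is an XML string, downstream code usually parses it back into individual URLs. A sketch of that, assuming only the fields shown above:

```python
import xml.etree.ElementTree as ET

# The sitemaps.org namespace used by the generated <urlset>.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(item):
    """Return the <loc> values from a dataset item's 'sitemap' string."""
    # Encode to bytes first: ET.fromstring rejects str input that carries
    # an encoding declaration like <?xml ... encoding="UTF-8"?>.
    root = ET.fromstring(item["sitemap"].encode("utf-8"))
    return [el.text for el in root.findall("sm:url/sm:loc", NS)]
```

A quick consistency check is that `len(extract_locs(item))` should equal the item's `total_urls`.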

Sample Sitemap XML

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
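
XML of this shape can be produced with the standard library alone. A minimal sketch (the actor lists lxml as a dependency, but xml.etree.ElementTree suffices for this structure):

```python
import xml.etree.ElementTree as ET
from datetime import date


def build_sitemap(urls, changefreq="weekly", priority=0.8):
    """Build a sitemaps.org-compliant <urlset> document as a string."""
    urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
    today = date.today().isoformat()
    for loc in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = today
        ET.SubElement(url, "changefreq").text = changefreq
        ET.SubElement(url, "priority").text = str(priority)
    # Prepend the declaration explicitly to match the sample above.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```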

Use Cases

  • SEO Optimization: Submit sitemaps to Google Search Console and Bing Webmaster Tools
  • Website Indexing: Help search engines discover all pages on your website
  • Site Structure Analysis: Understand your website's URL structure
  • Link Validation: Identify all crawlable pages before migration
  • Multi-language Sites: Generate sitemaps for international websites

Technical Details

  • Language: Python 3.11+
  • Runtime: Apify Actor
  • Key Dependencies:
    • apify: Apify SDK for actor development
    • apify-client: Client for Apify API
    • beautifulsoup4: HTML parsing for link extraction
    • requests: HTTP requests
    • lxml: XML processing

Error Handling

The actor handles various error scenarios:

  • Invalid URLs are automatically filtered
  • Cross-domain URLs are excluded
  • Malformed URLs are skipped with logging
  • Missing base_url parameter raises a clear error message
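
The first three rules amount to a URL filter. A sketch of that filtering logic, assuming the actor compares hostnames (the function name is illustrative):

```python
from urllib.parse import urlparse


def filter_urls(urls, base_url):
    """Keep only well-formed http(s) URLs on the same domain as base_url."""
    base_netloc = urlparse(base_url).netloc
    kept = []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # malformed or non-web URL: skip
        if parsed.netloc != base_netloc:
            continue  # cross-domain URL: exclude
        kept.append(url)
    return kept
```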

Performance Considerations

  • Large Sites: For sites with 10,000+ pages, increase maxPages (up to the 50,000 limit) and consider splitting very large sites across multiple runs
  • Crawl Depth: Each depth level increases crawl time exponentially (use 2-3 for most sites)
  • API Limits: Apify actor runs are subject to platform resource limits

Troubleshooting

No URLs Found

  • Verify the base_url is correct and accessible
  • Check that useWebCrawler is enabled if expecting automatic discovery
  • Ensure the website doesn't block crawlers with robots.txt
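
The robots.txt check can be done locally with the standard library before running the actor. A sketch using urllib.robotparser (here the rules are passed in as text so the check is network-free):

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules permit crawling url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(user_agent, url)
```

In practice you would download `<base_url>/robots.txt` first (for example with requests) and feed its body to this check.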

Too Few URLs

  • Increase maxCrawlDepth to discover deeper pages
  • Verify pages are linked and not isolated
  • Check for JavaScript-rendered content: links inserted by client-side JavaScript are not followed and may require a browser-based crawler

Sitemap File Not Created

  • Ensure saveToStorage is set to true
  • Check actor logs for file write errors
  • Verify sufficient storage quota available

License

This actor is provided under the MIT License. Feel free to modify and distribute as needed.

Support

For issues, questions, or feature requests, please contact the development team or open an issue on the repository.