Sitemap Generator

Generate XML sitemaps by crawling any website. Discover all pages, images, and videos with configurable crawl depth, URL filters, and output formats. Built by John Rippy (https://www.linkedin.com/in/johnrippy/ | https://johnrippy.link/).

Features

  • Full Site Crawling: Automatically discovers and indexes all pages on your website
  • Image Sitemap: Include images in your sitemap for Google Images indexing
  • Video Sitemap: Include videos for Google Video indexing
  • Multiple Output Formats: XML, XML Index (for large sites), JSON, or plain text
  • URL Filtering: Include or exclude URLs using regex patterns
  • Configurable Depth: Control how deep the crawler goes
  • Last Modified Detection: Automatically detects lastmod from headers/meta tags
  • Webhook Integration: Get notified when sitemap generation completes
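The XML Index format exists because the sitemaps.org protocol caps a single sitemap file at 50,000 URLs, so larger sites need several files plus an index that lists them. The sketch below shows that splitting idea only; how this actor actually splits and names its files is not documented here, and the `sitemap-N.xml` naming and the `chunk` and `build_index` helpers are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit in the sitemaps.org protocol


def chunk(urls, size=MAX_URLS):
    """Split a URL list into consecutive groups of at most `size` entries."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, n_files):
    """Build a sitemap index that points at n_files child sitemaps.

    The sitemap-N.xml file names are an assumption made for illustration.
    """
    root = ET.Element("sitemapindex", xmlns=NS)
    for i in range(1, n_files + 1):
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap-{i}.xml"
    return ET.tostring(root, encoding="unicode")


urls = [f"https://example.com/page/{i}" for i in range(120_000)]
parts = chunk(urls)
index_xml = build_index("https://example.com", len(parts))
print(len(parts), "sitemap files needed")
```

For sites under the limit, a single XML sitemap is enough and the index adds nothing.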

Quick Start

{
  "startUrl": "https://example.com"
}
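If you assemble the input in code, it can help to check it against the constraints listed under Input Parameters before starting a run. The following is a minimal sketch of such a check; `build_input` is a hypothetical helper, not part of the actor, and it covers only a few of the parameters.

```python
import json

ALLOWED_FORMATS = {"xml", "xml-index", "json", "txt"}


def build_input(start_url=None, *, demo_mode=False, output_format="xml",
                max_pages=1000, max_depth=10, priority=0.5):
    """Assemble an input dict, rejecting values the parameter table rules out."""
    if not demo_mode and not start_url:
        raise ValueError("startUrl is required unless demoMode is true")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 0.0 <= priority <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    if max_pages < 1 or max_depth < 0:
        raise ValueError("maxPages must be >= 1 and maxDepth must be >= 0")

    run_input = {
        "outputFormat": output_format,
        "maxPages": max_pages,
        "maxDepth": max_depth,
        "priority": priority,
    }
    if start_url:
        run_input["startUrl"] = start_url
    if demo_mode:
        run_input["demoMode"] = True
    return run_input


print(json.dumps(build_input("https://example.com", max_pages=200), indent=2))
```

The printed JSON can be pasted into the actor's input editor as-is.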

Demo Mode

Set demoMode: true to test with sample data (no charges). When you're ready for real results, set demoMode: false or omit it.

{
"demoMode": true,
...
}

Input Parameters

Parameter | Type | Description
startUrl | string | The website URL to crawl (required unless demoMode)
maxPages | integer | Maximum pages to crawl (default: 1000)
maxDepth | integer | Maximum link depth from start URL (default: 10)
includeImages | boolean | Include images in sitemap (default: true)
includeVideos | boolean | Include videos in sitemap (default: false)
includeLastmod | boolean | Detect last modified dates (default: true)
respectRobotsTxt | boolean | Follow robots.txt rules (default: true)
urlPatterns | array | Only include URLs matching these regex patterns
excludePatterns | array | Exclude URLs matching these regex patterns
outputFormat | string | xml, xml-index, json, or txt (default: xml)
changefreq | string | Default change frequency (default: weekly)
priority | number | Default priority 0.0-1.0 (default: 0.5)
webhookUrl | string | URL to receive completion notification
demoMode | boolean | Return sample data for testing
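To preview how urlPatterns and excludePatterns might interact before spending a crawl on them, you can test candidate regexes locally. This sketch assumes that exclusion takes precedence over inclusion, that an empty urlPatterns list allows every URL, and that patterns match anywhere in the URL; the actor's real matching rules may differ, and `url_allowed` is a hypothetical helper.

```python
import re


def url_allowed(url, url_patterns=(), exclude_patterns=()):
    """Return True if the URL passes the include/exclude filters.

    Assumed semantics: any exclude match rejects the URL; otherwise the URL
    must match at least one include pattern, unless no include patterns are set.
    """
    if any(re.search(p, url) for p in exclude_patterns):
        return False
    if not url_patterns:
        return True
    return any(re.search(p, url) for p in url_patterns)


urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/tag/python",
    "https://example.com/admin/login",
]
kept = [u for u in urls if url_allowed(u, [r"/blog/"], [r"/tag/", r"/admin/"])]
print(kept)  # ['https://example.com/blog/post-1']
```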

Output Format

The actor outputs a single result object containing:

{
  "url": "https://example.com",
  "pagesFound": 245,
  "imagesFound": 892,
  "videosFound": 12,
  "crawlTime": 45230,
  "format": "xml",
  "sitemap": "<?xml version=\"1.0\"...>",
  "pages": [
    {
      "url": "https://example.com/",
      "lastmod": "2024-01-15",
      "changefreq": "weekly",
      "priority": 1.0,
      "images": ["https://example.com/logo.png"],
      "title": "Home Page",
      "depth": 0
    }
  ]
}
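Once you have the result object, a typical next step is to write the sitemap field to a file and scan the pages array for gaps. The sketch below assumes that sitemap is always a string, whatever the format; the `save_sitemap` and `pages_missing_lastmod` helpers and the sample data are hypothetical, and the sample sitemap value is a placeholder rather than real actor output.

```python
import tempfile
from pathlib import Path

# File extension per outputFormat value (xml-index is still XML).
EXTENSIONS = {"xml": "xml", "xml-index": "xml", "json": "json", "txt": "txt"}


def save_sitemap(result, directory):
    """Write result["sitemap"] to sitemap.<ext> inside directory."""
    ext = EXTENSIONS.get(result["format"], "txt")
    path = Path(directory) / f"sitemap.{ext}"
    path.write_text(result["sitemap"], encoding="utf-8")
    return path


def pages_missing_lastmod(result):
    """List page URLs for which no lastmod value was detected."""
    return [p["url"] for p in result["pages"] if not p.get("lastmod")]


sample = {
    "url": "https://example.com",
    "format": "xml",
    "sitemap": "<urlset></urlset>",  # placeholder, not real actor output
    "pages": [
        {"url": "https://example.com/", "lastmod": "2024-01-15", "depth": 0},
        {"url": "https://example.com/about", "depth": 1},
    ],
}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_sitemap(sample, tmp)
    print("wrote", saved.name)

print(pages_missing_lastmod(sample))  # ['https://example.com/about']
```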

Pricing

This actor uses the pay-per-event pricing model:

  • Base cost: $0.10 per run
  • Per page: $0.001 per page crawled

Example: Crawling 1,000 pages costs approximately $1.10
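The example follows directly from the two rates: $0.10 + 1,000 × $0.001 = $1.10. A small sketch for estimating other crawl sizes, assuming those two rates are the only charges (`estimate_cost` is a hypothetical helper):

```python
from decimal import Decimal

BASE_COST = Decimal("0.10")       # per run
COST_PER_PAGE = Decimal("0.001")  # per page crawled


def estimate_cost(pages):
    """Estimate the cost of one run in USD for a given number of crawled pages."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return BASE_COST + COST_PER_PAGE * pages


for pages in (100, 1000, 5000):
    print(f"{pages} pages: ${estimate_cost(pages):.2f}")
```

Because maxPages caps the crawl, the estimate for maxPages is also an upper bound on the per-page portion of a run.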

Use Cases

  • SEO Audits: Generate sitemaps to submit to Google Search Console
  • Site Migration: Create a complete URL inventory before redesigning
  • Content Inventory: Get a full list of all pages and their metadata
  • Image SEO: Generate image sitemaps for better Google Images visibility
  • Monitoring: Track how your site structure changes over time

Common Problems & Solutions

"Invalid API key" error

Cause: Your API key is wrong, expired, or doesn't have the right permissions.
Fix: Double-check your API key. Make sure you copied it exactly, without extra spaces.

"Rate limit exceeded" error

Cause: You've hit the API's rate limits.
Fix: Wait a few minutes, then try again. Consider reducing the number of concurrent requests.

Empty or incomplete results

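Cause: The target site may block automated crawlers, the pages may not exist, or the crawl settings may exclude the pages you expect.
Fix:

  • Check that startUrl is correct and reachable
  • Raise maxPages or maxDepth, or loosen urlPatterns and excludePatterns
  • Check whether the site's robots.txt disallows the missing pages (respectRobotsTxt defaults to true)
  • Some sites may block automated access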

Demo data showing instead of real results

Cause: demoMode is still set to true.
Fix: Set demoMode: false (or omit it) and provide a startUrl.


Built by John Rippy | Actor Arsenal