Sitemap Generator

Pricing: from $0.01 / 1,000 results

Generate XML sitemaps by crawling any website. Discover pages, images, and videos with configurable crawl depth, URL filters, and multiple output formats.

Key features: Full Site Crawling, Image Sitemap, Video Sitemap, Multiple Output Formats, URL Filtering, Configurable Depth, Last Modified, Webhook Integration

Rating: 0.0 (0 ratings)

Developer: The Howlers (Maintained by Community)

Actor stats: 0 bookmarked, 7 total users, 0 monthly active users, last modified 8 days ago

Sitemap Generator - XML Sitemaps with Image, Video & Last Modified Detection

Generate XML sitemaps by crawling any website. Discover the pages, images, and videos reachable from your start URL, with configurable crawl depth, URL filters, and multiple output formats. The crawler detects last modified dates automatically and respects robots.txt by default. Essential for SEO audits, site migrations, and Google Search Console submissions.

Features

  • Full Site Crawling - Automatically discover all pages reachable through internal links, up to the maxPages and maxDepth limits
  • Image Sitemap - Include images for Google Images indexing
  • Video Sitemap - Include videos for Google Video indexing
  • Last Modified Detection - Automatically detect lastmod from HTTP headers and meta tags
  • Multiple Output Formats - XML, XML Index (for large sites), JSON, or plain text
  • URL Filtering - Include or exclude URLs using regex patterns
  • Configurable Depth - Control how deep the crawler follows links
  • Robots.txt Compliance - Respect robots.txt rules (enabled by default, can be turned off)
  • Change Frequency - Set default change frequency for sitemap entries
  • Priority Settings - Configure default URL priority (0.0-1.0)
  • Webhook Integration - Get notified when generation completes
  • Demo Mode - Test with sample data before going live
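The sketch below shows the standard format behind the image sitemap feature. It is a minimal Python example of the XML that the sitemaps.org protocol and Google's image extension define. It illustrates that format only and is not the actor's own code.

`build_sitemap` is a hypothetical helper. Its page dicts reuse the field names from the Output Format example (url, lastmod, changefreq, priority, images). Video entries are left out because Google's video extension requires several more fields per video, such as a thumbnail, title, and description.

```python
import xml.etree.ElementTree as ET

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMG_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize the sitemap namespace as the default and the image namespace as "image:".
# Note: register_namespace changes ElementTree's global prefix map for this process.
ET.register_namespace("", SM_NS)
ET.register_namespace("image", IMG_NS)


def build_sitemap(pages):
    """Return sitemap XML for a list of page dicts; optional keys that are missing are skipped."""
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    for page in pages:
        url_el = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url_el, f"{{{SM_NS}}}loc").text = page["url"]
        # Protocol order: loc, lastmod, changefreq, priority, then extension elements.
        for key in ("lastmod", "changefreq", "priority"):
            if page.get(key) is not None:
                ET.SubElement(url_el, f"{{{SM_NS}}}{key}").text = str(page[key])
        for image_url in page.get("images", []):
            image_el = ET.SubElement(url_el, f"{{{IMG_NS}}}image")
            ET.SubElement(image_el, f"{{{IMG_NS}}}loc").text = image_url
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")
```

For example, `build_sitemap([{"url": "https://example.com/", "images": ["https://example.com/logo.png"]}])` produces one `<url>` entry with a nested `<image:image>` element.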

Who Should Use This Actor?

SEO Agencies

Generate sitemaps for client websites as part of technical SEO audits. Submit to Google Search Console to improve crawl efficiency and indexing.

Web Developers

Create sitemaps during site launches and migrations. Get a complete URL inventory with metadata before redesigning or moving sites.

Content Teams

Build content inventories from existing websites. Know every page, image, and video on your site for content planning and optimization.

E-Commerce Teams

Generate product sitemaps with images for better Google Shopping and Google Images visibility. Include video sitemaps for product videos.

Technical SEO Consultants

Audit site structure and compare crawled URLs against existing sitemaps. URLs that appear in an existing sitemap but not in the crawl are candidates for orphaned pages, meaning pages that no internal link points to.
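The comparison described above can be sketched in a few lines of Python. It assumes you have the text of the site's current (non-index) XML sitemap and a list of crawled URLs, for example the url fields from this actor's JSON output. The function names are hypothetical. URLs are compared as exact strings, so normalize trailing slashes first if the site mixes them.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_data):
    """Return the set of <loc> URLs in a standard XML sitemap (str or bytes)."""
    root = ET.fromstring(xml_data)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def orphan_candidates(existing_sitemap_xml, crawled_urls):
    """URLs listed in the existing sitemap that the crawl never reached, sorted."""
    return sorted(sitemap_urls(existing_sitemap_xml) - set(crawled_urls))
```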

Site Migration Teams

Create complete URL inventories before migrations. Map all current URLs with metadata for redirect planning.

Quick Start

Demo Mode (Free Test)

{
  "demoMode": true
}

Basic Sitemap Generation

{
  "startUrl": "https://example.com",
  "maxPages": 500,
  "outputFormat": "xml",
  "demoMode": false
}

Image + Video Sitemap

{
  "startUrl": "https://example.com",
  "maxPages": 1000,
  "includeImages": true,
  "includeVideos": true,
  "includeLastmod": true,
  "outputFormat": "xml",
  "demoMode": false
}

Filtered Sitemap (Blog Only)

{
  "startUrl": "https://example.com",
  "maxPages": 500,
  "urlPatterns": ["/blog/"],
  "excludePatterns": ["/tag/", "/author/", "/page/"],
  "outputFormat": "xml",
  "demoMode": false
}
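The JSON inputs above can also be sent from code. Below is a minimal sketch using only Python's standard library and Apify's run-sync-get-dataset-items REST endpoint.

The actor ID and API token are placeholders you must replace. The sketch assumes the run finishes within the synchronous endpoint's time limit; for longer crawls, start an asynchronous run instead. `build_run_request` and `run_actor` are hypothetical helper names.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Build a POST to Apify's run-sync-get-dataset-items endpoint.

    actor_id is "username/actor-name"; the API path uses "~" instead of "/".
    """
    path_id = actor_id.replace("/", "~")
    return urllib.request.Request(
        f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id, token, run_input, timeout=300):
    """Run the actor, wait for it to finish, and return its dataset items."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Example call with placeholders: `items = run_actor("<username>/<actor-name>", "<APIFY_TOKEN>", {"startUrl": "https://example.com", "maxPages": 500, "outputFormat": "xml", "demoMode": False})`.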

Input Parameters

Parameter | Type | Default | Description
--- | --- | --- | ---
startUrl | string | - | Website URL to crawl (required unless demoMode)
maxPages | number | 1000 | Maximum pages to crawl
maxDepth | number | 10 | Maximum link depth from start URL
includeImages | boolean | true | Include images in sitemap
includeVideos | boolean | false | Include videos in sitemap
includeLastmod | boolean | true | Detect last modified dates
respectRobotsTxt | boolean | true | Follow robots.txt rules
urlPatterns | array | - | Only include URLs matching these regex patterns
excludePatterns | array | - | Exclude URLs matching these regex patterns
outputFormat | string | "xml" | Format: xml, xml-index, json, txt
changefreq | string | "weekly" | Default change frequency
priority | number | 0.5 | Default priority (0.0-1.0)
demoMode | boolean | true | Return sample data for testing
webhookUrl | string | - | Webhook URL for completion notification
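urlPatterns and excludePatterns take regular expressions. The Python sketch below shows one plausible way to combine them. The semantics are an assumption, not confirmed by this page: a URL is kept when urlPatterns is empty or any of its patterns matches, and no excludePatterns pattern matches. `url_allowed` is a hypothetical name.

```python
import re


def url_allowed(url, url_patterns=None, exclude_patterns=None):
    """Return True if the URL should appear in the sitemap (assumed semantics).

    re.search matches anywhere in the URL, so "/blog/" needs no wildcards.
    """
    if url_patterns and not any(re.search(p, url) for p in url_patterns):
        return False
    if exclude_patterns and any(re.search(p, url) for p in exclude_patterns):
        return False
    return True
```

With the Filtered Sitemap (Blog Only) input, `https://example.com/blog/hello-world` is kept. `https://example.com/blog/tag/seo` and `https://example.com/about` are dropped. Because the patterns are regexes, escape literal dots (for example `\.pdf$`).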

Output Format

{
  "url": "https://example.com",
  "pagesFound": 245,
  "imagesFound": 892,
  "videosFound": 12,
  "crawlTime": 45230,
  "format": "xml",
  "sitemap": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>...",
  "pages": [
    {
      "url": "https://example.com/",
      "lastmod": "2026-01-15",
      "changefreq": "weekly",
      "priority": 1.0,
      "images": ["https://example.com/logo.png"],
      "title": "Home Page",
      "depth": 0
    },
    {
      "url": "https://example.com/products",
      "lastmod": "2026-01-20",
      "changefreq": "daily",
      "priority": 0.8,
      "images": ["https://example.com/products/widget.jpg"],
      "title": "Products",
      "depth": 1
    }
  ]
}
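A result record shaped like the example above can be post-processed directly. The sketch assumes you already hold one dataset item as a Python dict, for example loaded with json.load. `write_sitemap` and `pages_by_depth` are hypothetical helpers: one saves the sitemap string to a file, and the other counts pages per link depth as a quick check on site structure.

```python
from collections import Counter
from pathlib import Path


def write_sitemap(record, path="sitemap.xml"):
    """Save the record's "sitemap" string, e.g. for upload to the site root."""
    out = Path(path)
    out.write_text(record["sitemap"], encoding="utf-8")
    return out


def pages_by_depth(record):
    """Return {depth: page_count} from the record's "pages" array, sorted by depth."""
    counts = Counter(page["depth"] for page in record.get("pages", []))
    return dict(sorted(counts.items()))
```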

Pricing (Pay-Per-Event)

Event | Description | Price
--- | --- | ---
page_crawled | Per page crawled | $0.001

Example costs:

  • 100 pages: 100 x $0.001 = $0.10
  • 500 pages: 500 x $0.001 = $0.50
  • 1,000 pages: 1,000 x $0.001 = $1.00
  • 5,000 pages: 5,000 x $0.001 = $5.00
  • Demo mode: $0.00
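The example costs above follow a single multiplication. Here it is as a small Python sketch, assuming page_crawled is the only billed event (as the pricing table lists). Decimal is used so the dollar amounts are exact.

```python
from decimal import Decimal

# page_crawled price from the pricing table; assumes no other billed events.
PRICE_PER_PAGE = Decimal("0.001")


def estimate_cost(pages_crawled, demo_mode=False):
    """Estimated charge in USD for one run."""
    if demo_mode:
        return Decimal("0")
    return PRICE_PER_PAGE * pages_crawled
```

Because maxPages caps the crawl, `estimate_cost(maxPages)` is an upper bound under the same assumption. For example, `maxPages: 2000` gives $2.00.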

Common Scenarios

Scenario 1: SEO Audit Sitemap

{
  "startUrl": "https://client-site.com",
  "maxPages": 2000,
  "includeImages": true,
  "includeLastmod": true,
  "outputFormat": "xml",
  "demoMode": false
}

Generate a complete sitemap for submission to Google Search Console during an SEO audit.

Scenario 2: Pre-Migration URL Inventory

{
  "startUrl": "https://old-site.com",
  "maxPages": 5000,
  "maxDepth": 15,
  "outputFormat": "json",
  "demoMode": false
}

Create a full URL inventory in JSON format for redirect mapping during site migration.
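A JSON inventory like this can become a redirect-planning sheet. The Python sketch below assumes the pages array from the Output Format example. It writes one CSV row per old URL and leaves the new_url column blank for the migration team to fill in. `redirect_map_csv` and its column names are hypothetical choices.

```python
import csv
import io
from urllib.parse import urlsplit


def redirect_map_csv(pages):
    """Return CSV text with columns old_url, old_path, title, new_url (blank)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["old_url", "old_path", "title", "new_url"])
    for page in sorted(pages, key=lambda p: p["url"]):
        path = urlsplit(page["url"]).path or "/"
        writer.writerow([page["url"], path, page.get("title", ""), ""])
    return buffer.getvalue()
```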

Scenario 3: Section-Specific Sitemap

{
  "startUrl": "https://example.com",
  "maxPages": 1000,
  "urlPatterns": ["/products/"],
  "excludePatterns": ["/products/archived/"],
  "includeImages": true,
  "outputFormat": "xml",
  "demoMode": false
}

Generate a sitemap for only your product pages, excluding archived products.

Webhook & Automation Integration

Zapier / Make.com / n8n

  1. Create a webhook trigger in your automation platform
  2. Paste the webhook URL into the webhookUrl input field
  3. Route results to storage, submission tools, or alerts

Popular automations:

  • Sitemap XML -> Google Cloud Storage (store generated sitemaps for submission to GSC)
  • Page list -> Google Sheets (content inventory)
  • Crawl results -> Airtable (site structure database)
  • Completion -> Slack alert (sitemap ready notification)
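If you would rather receive the notification on your own server than in an automation platform, a minimal standard-library receiver is sketched below.

It assumes the webhook arrives as an HTTP POST with a JSON body. The payload's fields are not documented on this page, so the handler only logs the top-level keys it receives. `parse_webhook_body`, `WebhookHandler`, and `serve` are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(body):
    """Decode a JSON request body into a dict; return {} for empty or invalid input."""
    try:
        data = json.loads(body.decode("utf-8") or "{}")
    except (UnicodeDecodeError, json.JSONDecodeError):
        return {}
    return data if isinstance(data, dict) else {}


class WebhookHandler(BaseHTTPRequestHandler):
    """Accept POSTed notifications and log which top-level keys arrived."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        print("Webhook received with keys:", sorted(payload))
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()


def serve(port=8000):
    """Block and handle webhook POSTs on the given port until interrupted."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Call `serve(8000)` and set webhookUrl to the server's public address. Check a real delivery before relying on specific payload keys.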

Apify Scheduled Runs

Schedule monthly sitemap regeneration to keep your sitemap current with site changes.

FAQ

Q: Does this create a file I can upload to Google Search Console?

A: Yes. The xml output format produces a standard XML sitemap file that can be submitted directly to Google Search Console.

Q: What is XML Index format?

A: For sites with more than 50,000 URLs, the xml-index format creates a sitemap index file that references multiple sitemap files. This follows the sitemaps.org protocol, supported by Google, which limits each sitemap file to 50,000 URLs or 50 MB uncompressed.
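The splitting that an index requires can be sketched in Python as follows. The URL list is cut into chunks of at most 50,000, and an index file lists one child sitemap per chunk.

The child file naming (sitemap-1.xml, sitemap-2.xml, ...) is a hypothetical choice for this example, not necessarily what the actor produces. The 50 MB size limit is not checked.

```python
from xml.sax.saxutils import escape

# Per-file URL limit from the sitemap protocol; the 50 MB cap is not checked here.
MAX_URLS_PER_SITEMAP = 50_000


def plan_sitemap_index(urls, base_url, size=MAX_URLS_PER_SITEMAP):
    """Split URLs into protocol-sized chunks and build a matching sitemap index."""
    chunks = [urls[i:i + size] for i in range(0, len(urls), size)]
    root = base_url.rstrip("/")
    locs = [f"{root}/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
    entries = "".join(f"<sitemap><loc>{escape(loc)}</loc></sitemap>" for loc in locs)
    index_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        f"{entries}</sitemapindex>"
    )
    return chunks, index_xml
```

For 120,001 URLs this yields three child sitemaps of 50,000, 50,000, and 20,001 URLs.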

Q: Does it respect robots.txt?

A: By default, yes (respectRobotsTxt: true). Set to false to crawl all URLs regardless of robots.txt directives.

Q: How is lastmod detected?

A: The crawler checks HTTP Last-Modified headers and HTML meta tags for modification dates. If neither is available, lastmod is omitted for that URL.
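The Python sketch below illustrates this kind of lastmod detection. Two details are assumptions, not the actor's documented behavior: the header is checked before the meta tags, and the meta names in META_KEYS are the ones recognized. Dates are normalized to the YYYY-MM-DD form used in the output example. `detect_lastmod` is a hypothetical name.

```python
from datetime import date
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

# Hypothetical set of meta tags to check; the actor's actual list is not documented.
META_KEYS = {"article:modified_time", "og:updated_time", "last-modified"}


class _MetaDateParser(HTMLParser):
    """Remember the content of the first matching <meta> date tag."""

    def __init__(self):
        super().__init__()
        self.value = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.value is not None:
            return
        a = dict(attrs)
        key = (a.get("property") or a.get("name") or a.get("http-equiv") or "").lower()
        if key in META_KEYS and a.get("content"):
            self.value = a["content"]


def detect_lastmod(headers, html_text):
    """Return a YYYY-MM-DD string, or None when no usable date is found.

    `headers` is a plain dict here, so the "Last-Modified" lookup is case-sensitive.
    """
    header = headers.get("Last-Modified")
    if header:
        try:
            return parsedate_to_datetime(header).date().isoformat()
        except (TypeError, ValueError):
            pass  # malformed header: fall back to meta tags
    parser = _MetaDateParser()
    parser.feed(html_text)
    if parser.value:
        try:
            return date.fromisoformat(parser.value[:10]).isoformat()
        except ValueError:
            return None
    return None
```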

Q: Can I generate sitemaps for multiple sites?

A: Run the actor once per site. Each run generates a sitemap for a single domain starting from startUrl.

Common Problems & Solutions

"Crawl taking too long"

  • Reduce maxPages for faster completion
  • Reduce maxDepth to limit crawl scope
  • Large sites with thousands of pages naturally take longer

"Missing pages in sitemap"

  • Increase maxPages and maxDepth limits
  • Check that excluded pages aren't caught by excludePatterns
  • Orphaned pages (no internal links) won't be discovered by crawling - add them manually

"Images not included"

  • Make sure includeImages is true (it is the default, so check that your input does not set it to false)
  • Some images loaded via JavaScript may not be detected
  • Background images in CSS are not included

"Demo data showing"

  • demoMode defaults to true. Set demoMode: false to crawl the real site (no API keys required)

Built by John Rippy | Actor Arsenal