Sitemap Analyzer API | sitemap.xml SEO Audit
Pricing
from $4.00 / 1,000 results
Analyze sitemap.xml files for structure, freshness, broken URLs, and crawl-ready SEO insights at scale.
Developer
太郎 山田
Store Quickstart
- Start with `store-input.example.json` to analyze one public sitemap with a small URL cap.
- If that matches your SEO workflow, switch to `store-input.templates.json` and pick one of:
  - `Quickstart (Dataset)` for a fast structural audit
  - `Large Site Audit` for deeper coverage and status checks
  - `Webhook Alert` for change-driven monitoring
Key Features
- 🗺️ Sitemap index support — Handles nested sitemap structures
- 📊 Structure analysis — Top directories, depth distribution, file extensions
- 📅 Update pattern detection — lastmod distribution, changefreq analysis
- 🔗 Dead link checker — Optional HEAD request sampling to find broken URLs
- 🏗️ Architecture insights — Understand site structure from sitemap alone
- 📋 Bulk processing — Analyze multiple sitemaps per run
Use Cases
| Who | Why |
|---|---|
| SEO agencies | Technical SEO audits — sitemap completeness and structure |
| Content strategists | Identify content update patterns and stale pages |
| Web developers | Verify sitemap structure before search engine submission |
| Competitive analysts | Map competitor site architecture from public sitemaps |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| sitemapUrls | array | prefilled | URLs of sitemap.xml files to analyze. Auto-discovers /sitemap.xml if you provide just a domain. |
| maxUrls | integer | 5000 | Maximum number of URLs to process from each sitemap. |
| checkStatus | boolean | false | Send HEAD requests to check if URLs return 200. Slower but finds dead links. |
| delivery | string | "dataset" | How to deliver results. 'dataset' saves to Apify Dataset (recommended), 'webhook' sends to a URL. |
| webhookUrl | string | — | Webhook URL to send results to (only used when delivery is 'webhook'). Works with Slack, Discord, or any HTTP endpoint. |
| concurrency | integer | 3 | Maximum number of parallel requests. Higher = faster but may trigger rate limits. |
| dryRun | boolean | false | If true, runs without saving results or sending webhooks. Useful for testing. |
Input Example
```json
{
  "sitemapUrls": ["https://apify.com/sitemap.xml"],
  "maxUrls": 5000,
  "checkStatus": false,
  "concurrency": 3
}
```
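The fields in the input table can also be assembled in code. The sketch below uses a hypothetical helper, `build_input`, which is not part of the actor or of any Apify library. It applies the documented defaults. The rule that `webhookUrl` must be set when `delivery` is `"webhook"` is an assumption drawn from the table description, not confirmed actor behavior.

```python
def build_input(sitemap_urls, max_urls=5000, check_status=False,
                delivery="dataset", webhook_url=None, concurrency=3,
                dry_run=False):
    """Assemble an actor input dict from the documented fields.

    Hypothetical helper: the defaults mirror the input table; the
    validation rules are assumptions, not the actor's own checks.
    """
    if not sitemap_urls:
        raise ValueError("sitemapUrls must contain at least one URL or domain")
    if delivery not in ("dataset", "webhook"):
        raise ValueError("delivery must be 'dataset' or 'webhook'")
    if delivery == "webhook" and not webhook_url:
        # Assumption: a webhook run is pointless without a target URL.
        raise ValueError("webhookUrl is required when delivery is 'webhook'")
    if max_urls < 1 or concurrency < 1:
        raise ValueError("maxUrls and concurrency must be positive integers")

    payload = {
        "sitemapUrls": list(sitemap_urls),
        "maxUrls": max_urls,
        "checkStatus": check_status,
        "delivery": delivery,
        "concurrency": concurrency,
        "dryRun": dry_run,
    }
    if delivery == "webhook":
        payload["webhookUrl"] = webhook_url
    return payload


# A small structural audit with dead-link checks enabled.
audit_input = build_input(["https://apify.com/sitemap.xml"],
                          max_urls=100, check_status=True)
```

The resulting dict can be passed as `run_input` to the Python client call shown under API Usage.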
Output
| Field | Type | Description |
|---|---|---|
| meta | object | |
| results | array | |
| results[].sitemapUrl | string (url) | |
| results[].finalUrl | string (url) | |
| results[].status | string | |
| results[].analysis | object | |
| results[].error | null | |
| results[].checkedAt | timestamp | |
Output Example
```json
{
  "sitemapUrl": "https://apify.com/sitemap.xml",
  "status": "ok",
  "analysis": {
    "type": "urlset",
    "totalUrls": 1247,
    "structure": {
      "topDirectories": [
        { "path": "/store", "count": 890, "percentage": 71 },
        { "path": "/blog", "count": 156, "percentage": 13 }
      ],
      "depthDistribution": { "1": 45, "2": 890, "3": 312 }
    },
    "updateFrequency": {
      "lastModRange": { "oldest": "2023-01-15", "newest": "2026-02-20" },
      "urlsWithLastmod": 1100
    }
  }
}
```
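One way to turn a result like this into a quick audit summary is sketched below. The field names come from the output example. The helper `summarize_analysis` and the "lastmod coverage" metric are illustrative assumptions, not something the actor computes.

```python
def summarize_analysis(item):
    """Condense one result item into a few headline numbers.

    Hypothetical helper; tolerates missing keys so that error rows
    (which may have no analysis) do not raise.
    """
    analysis = item.get("analysis") or {}
    total = analysis.get("totalUrls", 0)
    with_lastmod = analysis.get("updateFrequency", {}).get("urlsWithLastmod", 0)
    # Share of URLs that carry a <lastmod> value, as a percentage.
    coverage = round(100 * with_lastmod / total, 1) if total else 0.0
    directories = analysis.get("structure", {}).get("topDirectories", [])
    largest = max(directories, key=lambda d: d["count"], default=None)
    return {
        "totalUrls": total,
        "lastmodCoveragePct": coverage,
        "topDirectory": largest["path"] if largest else None,
    }


# Trimmed copy of the output example above.
sample_item = {
    "sitemapUrl": "https://apify.com/sitemap.xml",
    "status": "ok",
    "analysis": {
        "type": "urlset",
        "totalUrls": 1247,
        "structure": {
            "topDirectories": [
                {"path": "/store", "count": 890, "percentage": 71},
                {"path": "/blog", "count": 156, "percentage": 13},
            ],
        },
        "updateFrequency": {"urlsWithLastmod": 1100},
    },
}

summary = summarize_analysis(sample_item)
print(summary)
# {'totalUrls': 1247, 'lastmodCoveragePct': 88.2, 'topDirectory': '/store'}
```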
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~sitemap-analyzer/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "sitemapUrls": ["https://apify.com/sitemap.xml"], "maxUrls": 5000, "checkStatus": false, "concurrency": 3 }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("taroyamada/sitemap-analyzer").call(run_input={
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "maxUrls": 5000,
    "checkStatus": False,  # Python boolean, not JSON `false`
    "concurrency": 3,
})

# Read the results from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/sitemap-analyzer').call({
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "maxUrls": 5000,
    "checkStatus": false,
    "concurrency": 3
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Tips & Limitations
- Keep concurrency ≤ 5 when auditing production sites to avoid WAF rate-limit triggers.
- Use webhook delivery for recurring cron runs — push only deltas to downstream systems.
- Enable `dryRun` for cheap validation before committing to a paid cron schedule.
- Results are dataset-first; use the Apify API `run-sync-get-dataset-items` endpoint for instant JSON in CI pipelines.
- Run a tiny URL count first, review the sample, then scale up. Pay-per-event means you only pay for what you use.
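For CI use, the "tiny run first" tip could look roughly like the sketch below. It builds the same `run-sync-get-dataset-items` request as the cURL example using only the standard library. The helper name `build_sync_request` and the cap of 50 URLs are assumptions chosen for illustration. The request is constructed but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the cURL example in this README.
API_BASE = "https://api.apify.com/v2/acts/taroyamada~sitemap-analyzer"


def build_sync_request(token, actor_input):
    """Build (but do not send) a POST request for a synchronous run.

    Hypothetical helper; send it with urllib.request.urlopen(request)
    and parse the JSON response body.
    """
    url = f"{API_BASE}/run-sync-get-dataset-items?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")


# Smoke test: small URL cap and conservative concurrency.
smoke_request = build_sync_request("YOUR_API_TOKEN", {
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "maxUrls": 50,
    "concurrency": 3,
})
```

Once the sample output looks right, the same helper can be reused with a larger `maxUrls` value.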
FAQ
Is there a rate limit?
Built-in concurrency throttling keeps requests polite. Most public sites handle 1–10 parallel requests without issues; for production sites, keep `concurrency` at 5 or below (see Tips & Limitations).
What happens when the input URL is unreachable?
The actor records an error row with the failure reason — successful URLs keep processing.
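Downstream code may want to separate those error rows from successful ones. A minimal sketch, assuming a failed row carries a non-null `error` value as the output table suggests (the error message below is made up for illustration):

```python
def split_results(items):
    """Partition result rows into (ok, failed) by the `error` field.

    Hypothetical helper; assumes successful rows have error set to
    null/None or omit the field entirely.
    """
    ok, failed = [], []
    for item in items:
        (failed if item.get("error") else ok).append(item)
    return ok, failed


rows = [
    {"sitemapUrl": "https://apify.com/sitemap.xml", "status": "ok", "error": None},
    # Illustrative error row; the real failure text will differ.
    {"sitemapUrl": "https://example.invalid/sitemap.xml", "error": "unreachable"},
]
ok_rows, failed_rows = split_results(rows)
```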
Can I schedule recurring runs?
Yes — use Apify Schedules to run this actor on a cron (hourly, daily, weekly). Combine with webhook delivery for change alerts.
Does this actor respect robots.txt?
Yes — requests use a standard User-Agent and honor site rate limits. For aggressive audits, set a higher concurrency only on your own properties.
Can I integrate with Google Sheets or Airtable?
Use webhook delivery with a Zapier/Make/n8n catcher, or call the Apify REST API from Apps Script / Airtable automations.
Related Actors
URL/Link Tools cluster — explore related Apify tools:
- 🔗 URL Health Checker — Bulk-check HTTP status codes, redirects, SSL validity, and response times for thousands of URLs.
- 🔗 Broken Link Checker — Crawl websites to find broken links, 404 errors, and dead URLs.
- 🔗 URL Unshortener — Expand bit.
- 🏷️ Meta Tag Analyzer — Analyze meta tags, Open Graph, Twitter Cards, JSON-LD, and hreflang for any URL.
- 📚 Wayback Machine Checker — Check if URLs are archived on the Wayback Machine and find closest snapshots by date.
- Schema.org Validator API | JSON-LD + Microdata — Validate JSON-LD and Microdata across multiple pages, score markup quality, and flag missing or malformed Schema.
- Site Governance Monitor | Robots, Sitemap & Schema — Recurring robots.
- RDAP Domain Monitor API | Ownership + Expiry — Monitor domain registration data via RDAP and track expiry, registrar, nameserver, and ownership changes in structured rows.
- Domain Security Audit API | SSL Expiry, DMARC, Domain Expiry — Summary-first portfolio monitor for SSL expiry, DMARC/SPF/DKIM, domain expiry/ownership, and security headers with remediation-ready outputs.
Cost
Pay Per Event:
- `actor-start`: $0.01 (flat fee per run)
- `dataset-item`: $0.003 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
No subscription required — you only pay for what you use.
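The same arithmetic can be scripted to budget recurring runs. This sketch uses the two event prices listed above. The function name `estimate_cost` and the 30-run schedule are illustrative assumptions.

```python
from decimal import Decimal

# Event prices from the Cost section.
ACTOR_START_FEE = Decimal("0.01")    # flat fee per run
DATASET_ITEM_FEE = Decimal("0.003")  # per output item


def estimate_cost(items_per_run, runs=1):
    """Estimate total cost in USD for `runs` runs of `items_per_run` items each."""
    per_run = ACTOR_START_FEE + DATASET_ITEM_FEE * items_per_run
    return (per_run * runs).quantize(Decimal("0.01"))


print(estimate_cost(1000))          # 3.01, matching the example above
print(estimate_cost(200, runs=30))  # 18.30 for 30 runs of 200 items each
```

`Decimal` is used instead of floats so the cents come out exact.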