Sitemap Analyzer API | sitemap.xml SEO Audit

Pricing

from $4.00 / 1,000 results


Analyze sitemap.xml files for structure, freshness, broken URLs, and crawl-ready SEO insights at scale.

Rating: 0.0 (0)

Developer: 太郎 山田 (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 3
  • Monthly active users: 1
  • Last modified: 21 hours ago

Store Quickstart

  • Start with store-input.example.json to analyze one public sitemap with a small URL cap.
  • If that matches your SEO workflow, switch to store-input.templates.json and pick one of:
      • Quickstart (Dataset) for a fast structural audit
      • Large Site Audit for deeper coverage and status checks
      • Webhook Alert for change-driven monitoring

Key Features

  • 🗺️ Sitemap index support — Handles nested sitemap structures
  • 📊 Structure analysis — Top directories, depth distribution, file extensions
  • 📅 Update pattern detection — lastmod distribution, changefreq analysis
  • 🔗 Dead link checker — Optional HEAD request sampling to find broken URLs
  • 🏗️ Architecture insights — Understand site structure from sitemap alone
  • 📋 Bulk processing — Analyze multiple sitemaps per run
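
To illustrate what sitemap index support involves, here is a minimal sketch that tells a `sitemapindex` document apart from a plain `urlset` and collects the `loc` and `lastmod` values. It uses only the Python standard library and the namespace from the sitemaps.org protocol. The function name `parse_sitemap` and the sample XML are hypothetical, and this is not the actor's actual implementation.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def parse_sitemap(xml_text):
    """Return (kind, entries); kind is 'urlset' or 'sitemapindex'.

    Hypothetical helper for illustration, not the actor's own code.
    """
    root = ET.fromstring(xml_text)
    if root.tag == SITEMAP_NS + "sitemapindex":
        kind, child = "sitemapindex", "sitemap"
    elif root.tag == SITEMAP_NS + "urlset":
        kind, child = "urlset", "url"
    else:
        raise ValueError(f"unexpected root element: {root.tag}")

    entries = []
    for node in root.findall(SITEMAP_NS + child):
        loc = node.findtext(SITEMAP_NS + "loc")
        if loc is None:
            continue  # skip malformed entries without a <loc>
        entries.append({
            "loc": loc.strip(),
            "lastmod": node.findtext(SITEMAP_NS + "lastmod"),
        })
    return kind, entries


# Made-up sample index that points at two child sitemaps.
INDEX_XML = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-blog.xml</loc></sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-store.xml</loc>
    <lastmod>2026-02-20</lastmod>
  </sitemap>
</sitemapindex>"""

kind, entries = parse_sitemap(INDEX_XML)
```

When `kind` is `"sitemapindex"`, each entry's `loc` is itself a sitemap, so a nested structure can be handled by fetching and parsing each one in turn.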

Use Cases

Who | Why
SEO agencies | Technical SEO audits: sitemap completeness and structure
Content strategists | Identify content update patterns and stale pages
Web developers | Verify sitemap structure before search engine submission
Competitive analysts | Map competitor site architecture from public sitemaps

Input

Field | Type | Default | Description
sitemapUrls | array | prefilled | URLs of sitemap.xml files to analyze. Auto-discovers /sitemap.xml if you provide just a domain.
maxUrls | integer | 5000 | Maximum number of URLs to process from each sitemap.
checkStatus | boolean | false | Send HEAD requests to check if URLs return 200. Slower but finds dead links.
delivery | string | "dataset" | How to deliver results. "dataset" saves to an Apify Dataset (recommended); "webhook" sends to a URL.
webhookUrl | string | (none) | Webhook URL to send results to (only used when delivery is "webhook"). Works with Slack, Discord, or any HTTP endpoint.
concurrency | integer | 3 | Maximum number of parallel requests. Higher is faster but may trigger rate limits.
dryRun | boolean | false | If true, runs without saving results or sending webhooks. Useful for testing.
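
The checkStatus option is described as sending HEAD requests to find URLs that do not return 200. The sketch below shows one way such a sampled check could work with the standard library. The names `head_status` and `find_dead_links`, the sample size of 50, and the injectable `fetch` parameter are all assumptions for illustration, not the actor's real internals.

```python
import random
import urllib.error
import urllib.request


def head_status(url, timeout=10.0):
    """Return the HTTP status of a HEAD request, or None on network failure."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx / 5xx responses arrive as exceptions
    except OSError:
        return None  # DNS failure, refused connection, timeout, ...


def find_dead_links(urls, sample_size=50, fetch=head_status, seed=None):
    """Check a random sample of urls and return those not answering 200.

    Hypothetical helper; sample_size and seed are illustrative parameters.
    """
    rng = random.Random(seed)
    sample = urls if len(urls) <= sample_size else rng.sample(urls, sample_size)
    dead = []
    for url in sample:
        status = fetch(url)
        if status != 200:
            dead.append({"url": url, "status": status})
    return dead


# Offline usage example with a fake fetcher instead of real requests.
fake_statuses = {"https://example.com/ok": 200, "https://example.com/gone": 404}
dead = find_dead_links(list(fake_statuses), fetch=lambda u: fake_statuses[u])
```

Passing `fetch` as a parameter keeps the sampling logic testable without network access. Leaving it at the default issues real HEAD requests.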

Input Example

{
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "maxUrls": 5000,
    "checkStatus": false,
    "concurrency": 3
}

Output

Field | Type
meta | object
results | array
results[].sitemapUrl | string (url)
results[].finalUrl | string (url)
results[].status | string
results[].analysis | object
results[].error | null
results[].checkedAt | timestamp

Output Example

{
    "sitemapUrl": "https://apify.com/sitemap.xml",
    "status": "ok",
    "analysis": {
        "type": "urlset",
        "totalUrls": 1247,
        "structure": {
            "topDirectories": [
                { "path": "/store", "count": 890, "percentage": 71 },
                { "path": "/blog", "count": 156, "percentage": 13 }
            ],
            "depthDistribution": { "1": 45, "2": 890, "3": 312 }
        },
        "updateFrequency": {
            "lastModRange": { "oldest": "2023-01-15", "newest": "2026-02-20" },
            "urlsWithLastmod": 1100
        }
    }
}
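
The `topDirectories` and `depthDistribution` fields can be understood as simple counts over URL paths. The following sketch derives the same shape of data from a list of URLs. It assumes that depth means the number of non-empty path segments and that percentages are rounded to whole numbers. Both are guesses consistent with the example above, and `analyze_structure` is a hypothetical name.

```python
from collections import Counter
from urllib.parse import urlparse


def analyze_structure(urls, top_n=5):
    """Count first-level directories and path depths for a list of URLs.

    Hypothetical helper; the actor's real definitions may differ.
    """
    dirs = Counter()
    depths = Counter()
    for url in urls:
        segments = [s for s in urlparse(url).path.split("/") if s]
        depths[len(segments)] += 1
        top_dir = ("/" + segments[0]) if segments else "/"
        dirs[top_dir] += 1

    total = len(urls)
    top = [
        {"path": path, "count": count, "percentage": round(100 * count / total)}
        for path, count in dirs.most_common(top_n)
    ]
    return {
        "topDirectories": top,
        # JSON object keys are strings, so depths are stringified.
        "depthDistribution": {str(d): n for d, n in sorted(depths.items())},
    }


sample_urls = [
    "https://example.com/store/a",
    "https://example.com/store/b",
    "https://example.com/blog/x/y",
    "https://example.com/",
]
structure = analyze_structure(sample_urls)
```

For the four sample URLs, `/store` accounts for two of them, so it leads `topDirectories` with a count of 2 and a percentage of 50.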

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~sitemap-analyzer/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "sitemapUrls": ["https://apify.com/sitemap.xml"], "maxUrls": 5000, "checkStatus": false, "concurrency": 3 }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("taroyamada/sitemap-analyzer").call(run_input={
    "sitemapUrls": ["https://apify.com/sitemap.xml"],
    "maxUrls": 5000,
    "checkStatus": False,  # Python boolean, not JSON's lowercase false
    "concurrency": 3,
})

# Print every item from the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/sitemap-analyzer').call({
    sitemapUrls: ['https://apify.com/sitemap.xml'],
    maxUrls: 5000,
    checkStatus: false,
    concurrency: 3,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

  • Keep concurrency ≤ 5 when auditing production sites to avoid WAF rate-limit triggers.
  • Use webhook delivery for recurring cron runs — push only deltas to downstream systems.
  • Enable dryRun for cheap validation before committing to a paid cron schedule.
  • Results are dataset-first; call the Apify API's run-sync-get-dataset-items endpoint to get JSON back in a single request from CI pipelines.
  • Run a tiny URL count first, review the sample, then scale up — pay-per-event means you only pay for what you use.

FAQ

Is there a rate limit?

Built-in concurrency throttling keeps requests polite. Most public sites tolerate 1–10 parallel requests without issues; the default concurrency is 3.

What happens when the input URL is unreachable?

The actor records an error row with the failure reason — successful URLs keep processing.

Can I schedule recurring runs?

Yes — use Apify Schedules to run this actor on a cron (hourly, daily, weekly). Combine with webhook delivery for change alerts.

Does this actor respect robots.txt?

Yes — requests use a standard User-Agent and honor site rate limits. For aggressive audits, set a higher concurrency only on your own properties.

Can I integrate with Google Sheets or Airtable?

Use webhook delivery with a Zapier/Make/n8n catcher, or call the Apify REST API from Apps Script / Airtable automations.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
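
The same arithmetic can be wrapped in a small estimator for budgeting scheduled runs. This is a sketch based only on the two event prices listed above. The function name `estimate_cost` and the `runs` parameter are illustrative, and actual billing may include other factors not described here.

```python
from decimal import Decimal

# Event prices as listed in the Cost section.
ACTOR_START_FEE = Decimal("0.01")   # flat fee per run
PER_ITEM_FEE = Decimal("0.003")     # per output item


def estimate_cost(items, runs=1):
    """Estimate the cost in USD for `items` output items across `runs` runs."""
    return ACTOR_START_FEE * runs + PER_ITEM_FEE * items


single_run = estimate_cost(1000)             # one run, 1,000 items
monthly = estimate_cost(30 * 200, runs=30)   # 30 daily runs of 200 items each
```

`Decimal` is used instead of floats so that sums of small prices stay exact. Here `single_run` equals 3.01, matching the example above, and `monthly` equals 18.30.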

No subscription required — you only pay for what you use.