Sitemap Structure Analyzer
Pricing
Pay per usage
Analyze any website's sitemap in seconds using sitemap.xml data. Get URL counts by type (product, blog, docs), content freshness, URL patterns, and SEO anomalies — no page fetching required.
Developer: One Scales
Maintained by Community
Sitemap Structure Analyzer tells you what a website is made of without fetching a single page. Point it at any domain and get back a full breakdown: how many product, blog, docs, and utility URLs the site has; which URL templates drive it; how fresh the content is; and where the anomalies are.
Works on sites ranging from 50 URLs to hundreds of thousands. Runs in seconds. Because it fetches only sitemap files and never individual pages, it largely avoids the rate limits, IP blocks, and legal questions that come with page scraping. The analysis is purely structural and based on public sitemap data.
Pairs naturally with the Sitemap URL Extractor (get the raw URLs) and the Bulk AI Markdown Maker (scrape only the pages you actually need).
Use cases include:
- SEO audits — surface stale content, utility URLs that shouldn't be indexed, and low-lastmod coverage across entire sites
- Competitive research — understand a competitor's content shape before investing in a content strategy
- AI / RAG pipeline building — identify exactly what URL types and sections to include before scraping
- Agency prospecting — bulk-check potential client sites for content mix and content freshness
- Content strategy benchmarking — compare your site's product/blog/docs ratio against competitors
- Technical SEO QA — detect account, cart, and search filter URLs appearing in sitemaps where they shouldn't
Features
- Content type classification: every URL is labeled as one of `product`, `blog`, `documentation`, `profile`, `category`, `page`, `media`, `other` (utility/auth/search), or `unclassified`
- Site archetype detection: labels the site as `ecommerce`, `content`, `documentation`, `community`, `marketing`, or `general` based on its dominant content type
- Docs-context awareness: sites on `docs.`, `developer.`, `api.`, `support.`, `help.`, `learn.`, `wiki.`, `kb.`, or `knowledgebase.` subdomains, or with a meaningful share of `/docs/`-style paths, get a smarter classification pass that recognizes modern documentation platforms (Mintlify, Docusaurus, Nextra, GitBook)
- URL pattern detection: groups URLs into templates (e.g. `/products/{slug}`) with counts, dominant classification, and example URLs. Patterns with only one URL are suppressed
- Freshness analysis: lastmod coverage, newest/oldest URLs, content velocity (30/90/365-day windows), stale URL counts (1+/2+/3+ years), posting cadence by section
- Anomaly detection: flags utility URLs in sitemaps, stale content concentration, low lastmod coverage, and silent zero-result sitemaps
- Sitemap index support: automatically fetches and recurses through all child sitemaps in a sitemap index, with cycle protection
- Proxy support: residential proxy by default, with an automatic no-proxy fallback for small sites whose bot protection blocks proxied requests
- Budget capping: caps the number of domains processed to stay within your configured budget
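To make the URL pattern detection feature concrete, here is a minimal Python sketch. The actor's real grouping heuristic is not documented here, so this assumes a deliberately naive rule: the last path segment becomes `{slug}` and everything before it stays literal. The function names `to_template` and `detect_patterns` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def to_template(url: str) -> str:
    """Assumed rule: replace the last path segment with {slug}."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:
        # Root and single-segment paths are kept literal in this sketch.
        return "/" + "/".join(segments)
    return "/" + "/".join(segments[:-1]) + "/{slug}"


def detect_patterns(urls, min_count=2, max_examples=2):
    """Group URLs by template and drop templates seen fewer than min_count times."""
    groups = defaultdict(list)
    for url in urls:
        groups[to_template(url)].append(urlparse(url).path)
    patterns = [
        {"pattern": template, "count": len(paths), "examples": paths[:max_examples]}
        for template, paths in groups.items()
        if len(paths) >= min_count
    ]
    # Largest groups first, mirroring a "top N patterns" style report.
    return sorted(patterns, key=lambda p: p["count"], reverse=True)
```

With `min_count=2`, a one-off path such as `/about` is suppressed, which matches the behavior described for single-URL patterns.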
How to Use
Input
| Field | Type | Required | Description |
|---|---|---|---|
| `domains` | String list | Yes | Domains to analyze. Accepts `example.com`, `https://example.com`, `www.example.com`, or a direct sitemap URL like `https://example.com/sitemap.xml`. |
| `maxUrls` | Integer | No | Cap on URL processing per domain. `0` = no cap. Useful for very large sites. |
| `proxyConfiguration` | Object | No | Proxy settings. Residential proxy recommended and set by default. For sites with Cloudflare/WAF bot protection, pinning the proxy to a specific country (e.g. US) often improves reliability. |
Example input:
    {
      "domains": ["onescales.com", "shopify.com"],
      "maxUrls": 0,
      "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
      }
    }
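The `domains` field accepts several input shapes. As a rough illustration of how such entries could be told apart, the Python sketch below separates a bare domain from a direct sitemap URL. It is not the actor's code: the helper name `parse_target` and the "path ends in `.xml`" rule are assumptions made for this example.

```python
from urllib.parse import urlparse


def parse_target(entry: str) -> dict:
    """Split a `domains` entry into a host and an optional direct sitemap URL."""
    entry = entry.strip()
    # Add a scheme so urlparse places bare domains in netloc rather than path.
    parsed = urlparse(entry if "://" in entry else "https://" + entry)
    # Assumption: a path ending in .xml (or .xml.gz) means a direct sitemap URL.
    is_sitemap = parsed.path.lower().endswith((".xml", ".xml.gz"))
    return {
        "host": parsed.netloc.lower(),
        "sitemap_url": parsed.geturl() if is_sitemap else None,
    }
```

When `sitemap_url` is `None`, a caller would go on to sitemap discovery; otherwise it could analyze that one sitemap directly.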
Output
One row per domain. Every row includes:
| Field | Description |
|---|---|
| `domain` | Analyzed domain |
| `sitemapUrl` | Sitemap URL that was used |
| `sitemapType` | `index` (sitemap index) or `urlset` (single sitemap) |
| `childSitemaps` | Child sitemap URLs (only present when `sitemapType` is `index`) |
| `analyzedAt` | ISO timestamp |
| `error` | Error message if analysis failed |
| `summary` | Total URLs, classified/unclassified counts, classification coverage, site archetype |
| `byType` | URL counts and percentages per content type |
| `bySection` | URL counts per top-level path prefix |
| `urlPatterns` | Detected URL templates with count, classification, and examples (top 20, minimum 2 URLs each) |
| `freshness` | lastmod coverage, content velocity, stale URL breakdown, posting cadence by section |
| `anomalies` | Detected anomalies with severity, count, and description |
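The `freshness` fields can be derived from `lastmod` dates alone. The Python sketch below shows one plausible way to compute coverage, the 30/90/365-day velocity windows, and the stale buckets. It is an illustration under assumptions, not the actor's implementation: a year is treated as 365 days, and the function name `freshness` is hypothetical.

```python
from datetime import date, timedelta


def freshness(lastmods, today):
    """lastmods holds one entry per URL: a date, or None when lastmod is missing."""
    dated = [d for d in lastmods if d is not None]

    def modified_within(days):
        cutoff = today - timedelta(days=days)
        return sum(1 for d in dated if d >= cutoff)

    def older_than(years):
        # Assumption: one year is counted as 365 days.
        cutoff = today - timedelta(days=365 * years)
        return sum(1 for d in dated if d < cutoff)

    return {
        "lastmodCoverage": round(len(dated) / len(lastmods), 2) if lastmods else 0,
        "newestUrlLastmod": max(dated).isoformat() if dated else None,
        "oldestUrlLastmod": min(dated).isoformat() if dated else None,
        "contentVelocity": {
            "urlsModifiedLast30Days": modified_within(30),
            "urlsModifiedLast90Days": modified_within(90),
            "urlsModifiedLast365Days": modified_within(365),
        },
        "staleUrls": {
            "olderThan1Year": older_than(1),
            "olderThan2Years": older_than(2),
            "olderThan3Years": older_than(3),
        },
    }
```

Note that the stale buckets are cumulative: a URL last modified four years ago counts toward all three.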
Example output row (Shopify ecommerce site):
    {
      "domain": "onescales.com",
      "sitemapUrl": "https://www.onescales.com/sitemap.xml",
      "sitemapType": "index",
      "analyzedAt": "2026-05-19T10:23:11.940Z",
      "summary": {
        "totalUrls": 464,
        "classified": 464,
        "unclassified": 0,
        "classificationCoverage": 1,
        "siteArchetype": "ecommerce"
      },
      "byType": {
        "product": { "count": 239, "percentage": 51.5 },
        "blog": { "count": 96, "percentage": 20.7 },
        "category": { "count": 63, "percentage": 13.6 },
        "page": { "count": 66, "percentage": 14.2 },
        "documentation": { "count": 0, "percentage": 0 },
        "profile": { "count": 0, "percentage": 0 },
        "media": { "count": 0, "percentage": 0 },
        "other": { "count": 0, "percentage": 0 },
        "unclassified": { "count": 0, "percentage": 0 }
      },
      "bySection": {
        "/products/": 239,
        "/blog/": 98,
        "/pages/": 64,
        "/collections/": 61
      },
      "urlPatterns": [
        {
          "pattern": "/products/{slug}",
          "count": 239,
          "classification": "product",
          "examples": ["/products/table", "/products/the-perfect-day"]
        },
        {
          "pattern": "/blog/{slug}",
          "count": 98,
          "classification": "blog",
          "examples": ["/blog/resources/are-we-here", "/blog/resources/bank-look"]
        }
      ],
      "freshness": {
        "lastmodCoverage": 1,
        "newestUrlLastmod": "2026-05-19",
        "oldestUrlLastmod": "2019-02-12",
        "contentVelocity": {
          "urlsModifiedLast30Days": 301,
          "urlsModifiedLast90Days": 302,
          "urlsModifiedLast365Days": 309
        },
        "staleUrls": {
          "olderThan1Year": 153,
          "olderThan2Years": 115,
          "olderThan3Years": 102
        },
        "postingCadenceBySection": {
          "/products/": "approx 19.9 updates per month over last 12 months",
          "/collections/": "approx 5.1 updates per month over last 12 months",
          "/blog/": "approx 0.3 updates per month over last 12 months"
        }
      },
      "anomalies": [
        {
          "type": "stale_content_concentration",
          "severity": "low",
          "count": 102,
          "description": "102 URLs have not been modified in over 3 years. Consider auditing for relevance."
        }
      ]
    }
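Once rows like the one above are exported, a few lines of Python are enough to turn each row into a comparison-friendly summary. The sketch below relies only on field names shown in the example output; the helper name `summarize` and the derived keys `dominantType`, `dominantShare`, and `staleShare` are hypothetical and are not part of the actor's output.

```python
def summarize(row: dict) -> dict:
    """Reduce one output row to its dominant content type and 3+ year stale share."""
    total = row["summary"]["totalUrls"]
    # Pick the content type with the highest URL count.
    top_type, top = max(row["byType"].items(), key=lambda kv: kv[1]["count"])
    stale = row["freshness"]["staleUrls"]["olderThan3Years"]
    return {
        "domain": row["domain"],
        "dominantType": top_type,
        "dominantShare": top["percentage"],
        "staleShare": round(100 * stale / total, 1) if total else 0.0,
    }
```

Applied to the example row, this reports `product` as the dominant type at 51.5% and a stale share of about 22% (102 of 464 URLs).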
Anomaly Types
| Type | Triggered When | Severity |
|---|---|---|
| `other_urls_in_sitemap` | Utility URLs (auth, cart, search filters) appear in the sitemap and typically shouldn't be indexed | medium (≤20 URLs) or high (>20) |
| `stale_content_concentration` | More than 50 URLs haven't been modified in over 3 years | low (≤500) or high (>500) |
| `low_lastmod_coverage` | Fewer than 30% of URLs have a lastmod date (on sites with 50+ URLs) | low |
| `sitemap_returned_no_entries` | Sitemap was fetched but no URLs were extracted (proxy blocking, parse failure, or all child sitemaps empty) | high |
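The thresholds in the table translate directly into a small rule function. The Python sketch below encodes them as written. The function name `detect_anomalies` and its parameters are hypothetical, and it assumes that `other_urls_in_sitemap` fires on any utility URL at all, which the table implies but does not state outright.

```python
def detect_anomalies(total_urls, utility_urls, stale_3y, lastmod_coverage):
    """Return (type, severity) pairs following the thresholds in the table."""
    if total_urls == 0:
        # Sitemap was fetched but produced no URLs.
        return [("sitemap_returned_no_entries", "high")]
    found = []
    if utility_urls > 0:  # assumption: any utility URL triggers the flag
        severity = "medium" if utility_urls <= 20 else "high"
        found.append(("other_urls_in_sitemap", severity))
    if stale_3y > 50:
        severity = "low" if stale_3y <= 500 else "high"
        found.append(("stale_content_concentration", severity))
    if total_urls >= 50 and lastmod_coverage < 0.30:
        found.append(("low_lastmod_coverage", "low"))
    return found
```

For the example row shown earlier (464 URLs, no utility URLs, 102 URLs older than 3 years, full lastmod coverage), this yields a single low-severity `stale_content_concentration` entry, matching the example output.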
Tips
- Sitemap index sites: for large sites with a sitemap index, the actor automatically fetches and aggregates all child sitemaps
- Large sites: use `maxUrls` to cap processing and control costs on sites with 100,000+ URLs
- No sitemap found: the actor checks `robots.txt` first, then falls back to `/sitemap.xml` and `/sitemap_index.xml`. If none work, the row will contain an error message
- Sites with bot protection: small WordPress sites and Cloudflare-fronted docs sites sometimes block residential proxies. The actor automatically retries without the proxy when an XML fetch returns non-XML. If a domain still fails, try setting the proxy to a country-specific residential group (e.g. `RESIDENTIAL` pinned to `US`)
- Direct sitemap URLs: you can pass a full sitemap URL like `https://example.com/sitemaps/posts.xml` to skip discovery and analyze only that sitemap
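The discovery order described in the tips (`robots.txt` first, then `/sitemap.xml` and `/sitemap_index.xml`) can be sketched as a function that builds an ordered list of candidate URLs to try. This is an illustration only: the name `sitemap_candidates` is hypothetical, the `https://` scheme is assumed, and the actual fetching and retry logic is left out.

```python
from typing import List, Optional


def sitemap_candidates(domain: str, robots_txt: Optional[str]) -> List[str]:
    """Return sitemap URLs to try, in order: robots.txt entries, then fallbacks."""
    candidates = []
    for line in (robots_txt or "").splitlines():
        # Split at the first colon only, so the URL's own "https:" stays intact.
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            candidates.append(value.strip())
    base = f"https://{domain}"  # assumption: the site is served over https
    for fallback in (f"{base}/sitemap.xml", f"{base}/sitemap_index.xml"):
        if fallback not in candidates:
            candidates.append(fallback)
    return candidates
```

A caller would fetch each candidate in turn and stop at the first one that returns valid sitemap XML; if none do, the row would carry an error message as described above.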
Support
For bugs, feature requests, or questions — reach us at https://docs.google.com/forms/d/e/1FAIpQLSfsKyzZ3nRED7mML47I4LAfNh_mBwkuFMp1FgYYJ4AkDRgaRw/viewform?usp=dialog