Sitemap Sniffer

Pricing: from $0.90 / 1,000 discovered sitemap items

Find sitemap files from website roots, domains, robots.txt, and direct sitemap URLs. Export sitemap metadata, URL counts, nested index depth, and optional URL inventory rows.

Rating: 0.0 (0 ratings)

Developer: Maxime Dupré

Maintained by Community

Actor stats:

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 3 days ago

🗺️ Sitemap sniffer for SEO audits

Sitemap Sniffer finds public sitemap files for websites, domains, robots.txt files, direct sitemap URLs, and sitemap indexes. Use it for a quick SEO sitemap audit, to find sitemaps across many sites in one run, or to extract sitemap URLs before a crawl.

Start with a public website such as apify.com, a bare domain such as example.com, or a known sitemap such as https://example.com/sitemap.xml. The Actor checks public sitemap sources, follows sitemap indexes when enabled, and saves clean output rows you can export from Apify or use through the API.

🔎 What this Actor does

  • Reads public robots.txt files and follows Sitemap: directives.
  • Checks common sitemap paths for website roots and bare domains.
  • Accepts direct sitemap, sitemap index, and robots.txt URLs.
  • Parses XML sitemap indexes, XML URL sets, plain-text sitemaps, and gzipped sitemap responses.
  • Follows nested sitemap indexes within your depth and output limits.
  • Saves one sitemap row per discovered sitemap file.
  • Optionally emits URL inventory rows from sitemap contents.
  • Adds one target summary row per submitted target, including no-sitemap outcomes.
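The discovery step can be pictured with a small Python sketch that builds candidate sitemap URLs for one target. It assumes bare domains default to https, and it uses a hypothetical COMMON_SITEMAP_PATHS list and a hypothetical "common_path" source label, because the Actor's real fallback paths and source names (other than "robots.txt") are not documented. Fetching the candidates is left out.

```python
from urllib.parse import urlsplit

# Hypothetical fallback paths: the Actor's actual list of common paths is not documented.
COMMON_SITEMAP_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/sitemap.txt"]


def normalize_origin(target: str) -> str:
    """Turn a website root or bare domain into an origin such as https://example.com."""
    if "://" not in target:
        target = "https://" + target  # assumption: bare domains are tried over https
    parts = urlsplit(target)
    return f"{parts.scheme}://{parts.netloc}"


def sitemaps_from_robots(robots_txt: str) -> list:
    """Collect the values of Sitemap: directives (the field name is case-insensitive)."""
    found = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found


def candidate_sitemaps(target: str, robots_txt: str = "") -> dict:
    """Map each candidate sitemap URL to every way it was discovered, robots.txt first."""
    origin = normalize_origin(target)
    sources = {}
    for url in sitemaps_from_robots(robots_txt):
        sources.setdefault(url, []).append("robots.txt")
    for path in COMMON_SITEMAP_PATHS:
        sources.setdefault(origin + path, []).append("common_path")  # hypothetical label
    return sources
```

Keeping a list of sources per URL mirrors the idea behind discoverySources: a sitemap listed in robots.txt and also found at a common path is recorded once, with both sources.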

This Actor is focused on public sitemap discovery. It does not crawl arbitrary internal links, scrape page content, check broken links, submit sitemaps to search engines, or validate whether URLs are indexed.

📦 Data you get

Each run can return three output types.

Sitemap rows describe discovered sitemap files:

  • sitemap URL, canonical URL, parent sitemap URL, and index depth
  • target website, normalized origin, and domain host
  • sitemap type, HTTP status, content type, byte count, and compression flag
  • URL count, child sitemap count, first lastmod, and discovery source
  • all discovery sources when the same sitemap is found more than one way
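As an illustration of how the type, compression, and count fields could be derived from a single sitemap response, here is a minimal Python sketch using only the standard library. The "urlset" and "text" type labels are assumptions (only "sitemap_index" appears in the output example), and byteCount is taken as the raw response size, which may not match how the Actor measures it.

```python
import gzip
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # sitemaps.org XML namespace


def describe_sitemap(body: bytes) -> dict:
    """Summarize one sitemap response into fields resembling a sitemap row."""
    is_compressed = body[:2] == b"\x1f\x8b"  # gzip magic number
    data = gzip.decompress(body) if is_compressed else body
    row = {"byteCount": len(body), "isCompressed": is_compressed,
           "urlCount": 0, "childSitemapCount": 0, "lastmod": None}
    data = data.strip()
    if not data.startswith(b"<"):
        # Plain-text sitemap: one absolute URL per non-empty line.
        urls = [line.strip() for line in data.decode("utf-8").splitlines() if line.strip()]
        row.update(type="text", urlCount=len(urls))  # "text" is a hypothetical label
        return row
    # xml.etree is fine for a sketch; for untrusted input prefer a hardened parser such as defusedxml.
    root = ET.fromstring(data)
    if root.tag == SM + "sitemapindex":
        row["type"] = "sitemap_index"
        entries = root.findall(SM + "sitemap")
        row["childSitemapCount"] = len(entries)
    else:
        row["type"] = "urlset"  # hypothetical label for a URL set
        entries = root.findall(SM + "url")
        row["urlCount"] = len(entries)
    # First non-empty <lastmod> in document order, or None when the file has none.
    lastmods = (e.findtext(SM + "lastmod") for e in entries)
    row["lastmod"] = next((m.strip() for m in lastmods if m and m.strip()), None)
    return row
```

For a sitemap index this yields urlCount 0 and a non-zero childSitemapCount, matching the shape of the output example further down.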

URL inventory rows are optional. When enabled, they include each URL found inside parsed sitemaps, the source sitemap URL, lastmod, changefreq, priority, and hreflang alternates when the sitemap provides them.
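A minimal Python sketch of turning one parsed URL set into URL inventory rows. Field names follow the description above; the hreflang shape (a language-to-URL mapping built from xhtml:link alternates) is a hypothetical choice, and values are kept as raw strings so nothing missing is guessed.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
XHTML = "{http://www.w3.org/1999/xhtml}"  # namespace used for hreflang alternates


def url_rows(xml_bytes: bytes, sitemap_url: str) -> list:
    """Emit one row per <url> entry; absent optional fields stay None."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for entry in root.findall(SM + "url"):
        loc = (entry.findtext(SM + "loc") or "").strip()
        if not loc:
            continue  # an entry without <loc> has no URL to report
        # Hypothetical shape: {"fr": "https://example.com/fr/"}.
        hreflang = {
            link.get("hreflang"): link.get("href")
            for link in entry.findall(XHTML + "link")
            if link.get("rel") == "alternate" and link.get("hreflang")
        }
        rows.append({
            "recordType": "url",
            "url": loc,
            "sitemapUrl": sitemap_url,
            "lastmod": entry.findtext(SM + "lastmod"),
            "changefreq": entry.findtext(SM + "changefreq"),
            "priority": entry.findtext(SM + "priority"),  # raw string, not converted
            "hreflang": hreflang or None,
        })
    return rows
```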

Target summary rows make batch runs easier to filter. They report whether each target was completed, skipped, or produced no public sitemap files.

🚀 How to run it

  1. Add one or more website or sitemap targets.
  2. Keep sitemap index following enabled for normal SEO audits.
  3. Leave URL inventory rows off for a fast sitemap-file audit.
  4. Turn on URL inventory rows when you want the URLs listed inside the sitemaps.
  5. Set sitemap and URL row limits to control output size and cost.
  6. Run the Actor and open the dataset overview.

You don't need to provide cookies, login credentials, a source API key, or proxy settings. The target must expose its sitemap files publicly over HTTP or HTTPS.

⚙️ Input example

{
  "targets": [
    "https://apify.com",
    "example.com",
    "https://example.com/sitemap.xml"
  ],
  "followSitemapIndexes": true,
  "maxIndexDepth": 1,
  "parseSitemapDetails": true,
  "emitUrlRows": false,
  "maxSitemapRows": 10,
  "maxUrlRows": 10000
}

Website or sitemap targets (targets) is the only required input. You can mix website roots, bare domains, robots.txt URLs, sitemap URLs, and sitemap index URLs in the same list.

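Use Follow sitemap indexes (followSitemapIndexes) and Maximum sitemap index depth (maxIndexDepth) to control how far nested sitemap indexes are expanded. Use Parse sitemap details (parseSitemapDetails) when you want counts, type, size, compression, and URL metadata. Turn on Emit URL inventory rows (emitUrlRows) only when you want the individual URLs from the sitemaps in the dataset. Use maxSitemapRows and maxUrlRows to cap output size and cost.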
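To see how followSitemapIndexes, maxIndexDepth, and maxSitemapRows could interact, here is a breadth-first sketch over an in-memory index tree. It assumes the submitted sitemap sits at depth 0 (as in the output example), that an index is expanded only while its depth is below maxIndexDepth, and that the row cap applies to the whole walk; the Actor's exact rules may differ. The children mapping is a hypothetical stand-in for fetching and parsing.

```python
from collections import deque


def expand_sitemaps(roots, children, follow_indexes=True, max_index_depth=1, max_sitemap_rows=10):
    """Breadth-first walk over sitemap indexes.

    `children` maps a sitemap URL to the child sitemap URLs it lists
    (missing or empty for URL sets).
    """
    rows, seen = [], set()
    queue = deque((url, None, 0) for url in roots)  # (url, parent, depth)
    while queue and len(rows) < max_sitemap_rows:
        url, parent, depth = queue.popleft()
        if url in seen:
            continue  # duplicates and index cycles are recorded once
        seen.add(url)
        rows.append({"url": url, "parentSitemapUrl": parent, "depth": depth})
        if follow_indexes and depth < max_index_depth:
            for child in children.get(url, []):
                queue.append((child, url, depth + 1))
    return rows
```

With maxIndexDepth set to 1, a root index and its direct children are reported, but grandchild sitemaps are not; raising the depth or the row cap grows the output accordingly.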

🧾 Output example

{
  "recordType": "sitemap",
  "target": "https://apify.com",
  "targetIndex": 0,
  "normalizedOrigin": "https://apify.com",
  "domainHost": "apify.com",
  "url": "https://apify.com/sitemap.xml",
  "canonicalUrl": "https://apify.com/sitemap.xml",
  "type": "sitemap_index",
  "httpStatus": 200,
  "contentType": "application/xml",
  "byteCount": 1240,
  "urlCount": 0,
  "childSitemapCount": 8,
  "isCompressed": false,
  "lastmod": "2026-06-01",
  "discoveredVia": "robots.txt",
  "discoverySources": ["robots.txt"],
  "parentSitemapUrl": null,
  "depth": 0,
  "scrapedAt": "2026-06-15T12:00:00.000Z"
}

When URL inventory is enabled, URL rows use recordType: "url" and include url, sitemapUrl, lastmod, changefreq, priority, and hreflang when available.

💳 Pricing

Sitemap Sniffer uses pay-per-event pricing. One charged event is one discovered sitemap item, URL inventory item, or target summary saved by the run.

Keep URL inventory rows off when you only need sitemap-file metadata. Turn them on when you need a larger URL export for crawl planning, migrations, RAG source lists, or SEO checks.
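The cost arithmetic can be sketched in a few lines of Python. It assumes every saved row is billed at the listed starting rate of $0.90 per 1,000 events; the actual rate for your plan may differ, and the batch sizes below are hypothetical.

```python
def estimate_cost_usd(sitemap_rows: int, url_rows: int, target_summaries: int,
                      price_per_1000: float = 0.90) -> float:
    """Estimate a run's event cost, treating every saved row as one charged event."""
    events = sitemap_rows + url_rows + target_summaries
    return events * price_per_1000 / 1000


# Hypothetical batch: 3 targets and 24 sitemap rows in total.
metadata_only = estimate_cost_usd(sitemap_rows=24, url_rows=0, target_summaries=3)
with_inventory = estimate_cost_usd(sitemap_rows=24, url_rows=10_000, target_summaries=3)
print(f"metadata only: ${metadata_only:.2f}")        # 27 events -> $0.02
print(f"with URL inventory: ${with_inventory:.2f}")  # 10,027 events -> $9.02
```

With maxUrlRows at 10000, URL inventory rows alone can add at most 10,000 events (about $9.00 at this rate), which is why the URL row limit is the main cost control.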

⚠️ Limits and caveats

  • Sitemap files must be publicly reachable.
  • Some websites do not publish sitemap files, or publish them only for selected sections.
  • Very large sitemap indexes can create many child sitemap or URL rows, so use the row limits for predictable output.
  • Sitemap metadata is only as complete as the source file. Missing lastmod, changefreq, priority, or hreflang values are not guessed.
  • This Actor reports public sitemap assets. It does not prove that search engines have indexed the URLs.

❓ FAQ

🔐 Do I need login credentials or an API key?

No. This Actor reads public sitemap assets. You do not need to provide cookies, login credentials, a source API key, or proxy settings.

🧭 Can it crawl my whole website?

No. Use this Actor to discover sitemap files and, optionally, the URLs listed inside those sitemap files. For rendered page crawling and link maps, use Website URL Crawler.

🧩 Can I submit more than one website?

Yes. Add multiple targets to the same run. The output keeps target and targetIndex fields so you can filter each website separately.
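Once the dataset is exported as JSON and loaded (for example with json.load), a short Python sketch can count rows per website. It assumes each row carries target and targetIndex, as described above; any row missing them is grouped under (None, None). Counting by recordType keeps the sketch neutral about the exact label used for target summary rows.

```python
from collections import Counter, defaultdict


def rows_per_target(items):
    """Count dataset rows per (targetIndex, target), broken down by recordType."""
    counts = defaultdict(Counter)
    for item in items:
        key = (item.get("targetIndex"), item.get("target"))
        counts[key][item.get("recordType")] += 1
    return dict(counts)


items = [
    {"recordType": "sitemap", "target": "https://apify.com", "targetIndex": 0},
    {"recordType": "sitemap", "target": "https://apify.com", "targetIndex": 0},
    {"recordType": "sitemap", "target": "example.com", "targetIndex": 1},
]
for (index, target), counter in rows_per_target(items).items():
    print(index, target, dict(counter))
```

Sorting or filtering on targetIndex rather than target keeps duplicates apart if the same website is submitted twice in one batch.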

📄 Why did I get a target summary but no sitemap rows?

That usually means the target did not expose a public sitemap through robots.txt, common sitemap paths, or the direct URL you submitted. The run still completes so you can audit batches without one empty target failing the whole job.

📝 Changelog

  • 0.1: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

Made with ❤️ by Maxime Dupré