Pricing

Pay per usage

Find Sitemap from url

A powerful Apify Actor that finds sitemap URLs for any website. This Actor helps you discover XML sitemaps by checking common locations and robots.txt files and by analyzing HTML content for sitemap links.

Rating: 1.0 (1)

Developer: ando

Actor stats

Total users: 203

Last modified: 3 days ago

Sitemap Finder - Discover Website Sitemaps Instantly

The Sitemap Finder is a powerful web scraping tool that automatically discovers and extracts XML sitemap URLs from any website. Whether you're conducting SEO analysis, building web crawlers, or performing content audits, this tool quickly locates a site's sitemap files by combining several discovery methods.

🚀 Key Features

Multi-Method Discovery: Checks common sitemap locations, robots.txt directives, and HTML content
Comprehensive Coverage: Find either the primary sitemap or discover all available sitemaps
Smart Verification: Validates discovered URLs to ensure they contain valid XML sitemap content
Flexible Configuration: Customizable timeout, verification settings, and detailed logging
High Performance: Optimized for speed with parallel processing and efficient HTTP requests
Production Ready: Built with reliability and error handling for enterprise use cases

📥 Input Configuration

Parameter	Type	Default	Description
`url`	String	Required	Website URL to search for sitemaps (must include protocol)
`findAll`	Boolean	`true`	Find all sitemaps (true) or only primary sitemap (false)
`noVerify`	Boolean	`false`	Skip XML validation of discovered sitemaps
`timeout`	Integer	`5`	HTTP request timeout in seconds (1-60)
`verbose`	Boolean	`false`	Enable detailed logging for debugging

Example Input

{
  "url": "https://example.com",
  "findAll": true,
  "timeout": 10,
  "verbose": true
}
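Before starting a run, it can help to check the input against the constraints in the table above. The sketch below is a hypothetical client-side helper: `validate_input` and `DEFAULTS` are illustrative names, not part of the Actor. It fills in the documented defaults and checks that `url` includes a protocol and that `timeout` is an integer from 1 to 60.

```python
from urllib.parse import urlparse

# Defaults as listed in the input table; `url` has none because it is required.
DEFAULTS = {"findAll": True, "noVerify": False, "timeout": 5, "verbose": False}


def validate_input(raw: dict) -> dict:
    """Return a copy of `raw` with defaults filled in, or raise ValueError.

    Hypothetical helper: mirrors the documented constraints, not the Actor's own code.
    """
    if "url" not in raw:
        raise ValueError("'url' is required")
    parsed = urlparse(raw["url"])
    # The table says the URL must include the protocol, e.g. https://
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"'url' must include http:// or https://: {raw['url']!r}")

    merged = {**DEFAULTS, **raw}
    timeout = merged["timeout"]
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(timeout, int) or isinstance(timeout, bool) or not 1 <= timeout <= 60:
        raise ValueError(f"'timeout' must be an integer from 1 to 60, got {timeout!r}")
    for key in ("findAll", "noVerify", "verbose"):
        if not isinstance(merged[key], bool):
            raise ValueError(f"'{key}' must be a boolean")
    return merged


print(validate_input({"url": "https://example.com", "timeout": 10}))
```

A URL without a scheme, such as `"example.com"`, is rejected before any run is started. This catches the most common input mistake locally instead of in the Actor's logs.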

📤 Output Data

The Actor writes its results to the default dataset. The shape of each item depends on the `findAll` setting:

All Sitemaps Mode (`findAll: true`)

{
  "url": "https://example.com",
  "sitemaps": [
    "https://example.com/sitemap.xml",
    "https://example.com/post-sitemap.xml",
    "https://example.com/page-sitemap.xml"
  ],
  "count": 3
}

Primary Sitemap Mode (`findAll: false`)

{
  "url": "https://example.com",
  "sitemap": "https://example.com/sitemap.xml"
}
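The item shape differs between the two modes, so code that reads the dataset has to handle both. This is a minimal sketch that assumes each item looks exactly like the examples above: `sitemaps` plus `count` when `findAll` is true, and a single `sitemap` otherwise. `sitemaps_from_item` is a hypothetical helper name.

```python
def sitemaps_from_item(item: dict) -> list[str]:
    """Return the sitemap URLs in a dataset item, whichever mode produced it."""
    if "sitemaps" in item:           # findAll: true
        return list(item["sitemaps"])
    sitemap = item.get("sitemap")    # findAll: false
    # Assumption: a missing or null "sitemap" field means nothing was found.
    return [sitemap] if sitemap else []


all_mode = {
    "url": "https://example.com",
    "sitemaps": ["https://example.com/sitemap.xml", "https://example.com/post-sitemap.xml"],
    "count": 2,
}
primary_mode = {"url": "https://example.com", "sitemap": "https://example.com/sitemap.xml"}

for item in (all_mode, primary_mode):
    print(item["url"], sitemaps_from_item(item))
```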

🔍 Discovery Methods

The Sitemap Finder uses a comprehensive three-tier approach:

1. Common Locations Check

Systematically checks standard sitemap paths including:

/sitemap.xml - Standard location
/sitemap_index.xml - Sitemap index files
/post-sitemap.xml - WordPress-style sitemaps
/page-sitemap.xml - Static page sitemaps
Plus 10+ additional common variations
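The common-locations check amounts to joining well-known paths onto the site's root and requesting each one. The sketch below only builds the candidate URLs. The first four paths are the ones listed above. The remaining three (`/wp-sitemap.xml`, `/sitemap.xml.gz`, `/sitemap.txt`) are assumed examples of "common variations", not the Actor's actual list.

```python
from urllib.parse import urljoin

# First four paths come from the list above; the rest are assumed common variants.
COMMON_PATHS = [
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/post-sitemap.xml",
    "/page-sitemap.xml",
    "/wp-sitemap.xml",
    "/sitemap.xml.gz",
    "/sitemap.txt",
]


def candidate_urls(site_url: str) -> list[str]:
    """Build absolute sitemap candidate URLs at the root of the site."""
    # An absolute path ("/...") replaces the whole path of site_url, so even a
    # deep page URL yields root-level candidates.
    return [urljoin(site_url, path) for path in COMMON_PATHS]


print(candidate_urls("https://example.com/blog/post-1"))
```

Each candidate would then be requested with the configured `timeout`. This sketch leaves out that network step so the path logic stays easy to check.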

2. Robots.txt Analysis

Parses the website's robots.txt file to locate Sitemap directives that many websites use to declare their sitemap locations.
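The sitemaps.org protocol notes that the `Sitemap:` directive is independent of the `User-agent` line, so a parser can scan every line of the file. This is a minimal sketch that extracts the directives from robots.txt text that has already been fetched. Fetching, redirects, and relative URLs are left out. Matching the field name case-insensitively is an assumption, although many parsers do it. Python's standard `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+) exposes the same information.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Return URLs declared by `Sitemap:` lines, in file order, without duplicates."""
    found: list[str] = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop trailing comments
        # partition splits on the first colon only, so "https://..." stays intact
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)
    return found


robots = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml  # lowercase field name
"""
print(sitemaps_from_robots(robots))
```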

3. HTML Content Parsing

Analyzes the website's HTML source code to find sitemap links referenced in meta tags, anchor links, or other markup.
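A minimal sketch of the HTML step using Python's standard `html.parser`. It collects `href` values from `<a>` and `<link>` tags and resolves them against the page URL. The matching rule is an assumption about what counts as a sitemap link, not the Actor's documented heuristic: the href contains "sitemap", or a `<link>` carries `rel="sitemap"`. `SitemapLinkParser` and `sitemaps_from_html` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SitemapLinkParser(HTMLParser):
    """Collect sitemap-looking links from <a> and <link> tags (assumed heuristic)."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # Attributes without a value arrive as None, hence the `or ""`
        rel = (attr.get("rel") or "").lower().split()
        if "sitemap" in href.lower() or (tag == "link" and "sitemap" in rel):
            absolute = urljoin(self.base_url, href)   # resolve relative hrefs
            if absolute not in self.links:
                self.links.append(absolute)


def sitemaps_from_html(html: str, base_url: str) -> list[str]:
    parser = SitemapLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = """
<html><head>
  <link rel="sitemap" type="application/xml" href="/sitemap.xml">
</head><body>
  <a href="/about">About</a>
  <a href="https://example.com/sitemap_index.xml">Sitemap</a>
</body></html>
"""
print(sitemaps_from_html(page, "https://example.com/"))
```

A substring match is cheap but can pick up pages such as an HTML "site map" for humans. This is one reason the verification step matters.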

💻 API Integration

Python Example

from apify_client import ApifyClient

# Initialize client with your API token
client = ApifyClient("your_api_token_here")

# Configure input
run_input = {
    "url": "https://your-target-website.com",
    "findAll": True,
    "verbose": True
}

# Run the Actor
run = client.actor("your_actor_id").call(run_input=run_input)

# Get results
items = client.dataset(run["defaultDatasetId"]).list_items().items
for item in items:
    print(f"Found {item['count']} sitemaps for {item['url']}")
    for sitemap in item['sitemaps']:
        print(f"  - {sitemap}")

JavaScript Example

import { ApifyClient } from 'apify-client';

// The apify-client package exports ApifyClient (there is no ApifyApi export)
const client = new ApifyClient({
    token: 'your_api_token_here',
});

const input = {
    url: 'https://your-target-website.com',
    findAll: true
};

const run = await client.actor('your_actor_id').call(input);
const { items } = await client.dataset(run.defaultDatasetId).listItems();

console.log('Results:', items);

🎯 Use Cases & Applications

SEO & Content Strategy

Site Architecture Analysis: Map complete website structure through sitemap discovery
Competitive Research: Analyze competitor site organization and content patterns
SEO Audits: Verify sitemap accessibility and completeness
Content Gap Analysis: Identify missing or outdated sitemap references

Web Scraping & Data Collection

Crawl Planning: Locate the sitemaps that list a site's URLs as a starting point for targeted scraping operations
Data Mining: Discover the pages a site declares in its sitemaps for content extraction
Site Monitoring: Track changes in site structure over time
Batch Processing: Collect sitemaps from multiple domains efficiently
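For crawl planning, the discovered sitemaps are the starting point. Each one either lists page URLs (`<urlset>`) or points to further sitemaps (`<sitemapindex>`). The Actor's output contains only the sitemap URLs, so turning them into page URLs is a separate step. This is a minimal sketch of that step, assuming the sitemap XML has already been downloaded and is uncompressed. It uses the namespace defined by the sitemaps.org protocol, and `split_sitemap` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def split_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page_urls, child_sitemap_urls) from one sitemap document."""
    root = ET.fromstring(xml_text)
    # <url><loc> entries are pages; <sitemap><loc> entries are nested sitemaps.
    pages = [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text]
    children = [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", NS) if loc.text]
    return pages, children


index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/page-sitemap.xml</loc></sitemap>
</sitemapindex>"""

urlset_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc><lastmod>2024-01-01</lastmod></url>
</urlset>"""

print(split_sitemap(index_xml))   # no pages, two child sitemaps
print(split_sitemap(urlset_xml))  # two pages, no child sitemaps
```

A crawler would fetch each child sitemap and repeat the process until only page URLs remain. For XML from untrusted sources, the Python documentation suggests considering `defusedxml` instead of the plain `ElementTree` parser.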

Development & Testing

QA Testing: Verify sitemap functionality across different environments
Migration Validation: Ensure sitemaps are properly configured after site moves
Performance Monitoring: Check sitemap accessibility and response times
API Integration: Incorporate sitemap discovery into automated workflows

AI Agents & Automation

Content Indexing: Feed discovered URLs to AI agents for content analysis
Automated Reporting: Generate sitemap status reports for multiple domains
Workflow Integration: Chain with other tools for comprehensive site analysis
Monitoring Dashboards: Track sitemap health across website portfolios

⚙️ Configuration Tips

Timeout Settings

Fast Sites: Use 3-5 seconds for responsive websites
Slow Sites: Increase to 10-15 seconds for heavy or slow-loading sites
Bulk Processing: Balance speed against reliability; shorter timeouts finish faster but can miss sitemaps on slow-responding sites

Verification Options

Enable Verification (default, `noVerify: false`): Ensures discovered URLs contain valid XML sitemap content
Disable Verification (`noVerify: true`): Faster execution, useful when you want every potential sitemap URL
Production Use: Keep verification enabled to ensure data quality
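Verification is described only as checking for valid XML sitemap content. One plausible check is sketched below as an assumption, not the Actor's actual rule: the body must parse as XML, and its root element must be `<urlset>` or `<sitemapindex>` in the sitemaps.org namespace. A plain-text sitemap (one URL per line) fails this check. That matches the troubleshooting advice to disable verification for non-XML sitemap files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
VALID_ROOTS = {SITEMAP_NS + "urlset", SITEMAP_NS + "sitemapindex"}


def looks_like_xml_sitemap(body: bytes) -> bool:
    """True if `body` parses as XML with a sitemaps.org <urlset> or <sitemapindex> root."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False        # not well-formed XML, e.g. a plain-text URL list
    # Well-formed but unrelated XML (or XHTML) fails on the root-tag check
    return root.tag in VALID_ROOTS


print(looks_like_xml_sitemap(b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>'))
print(looks_like_xml_sitemap(b"<html><body>Not found</body></html>"))
print(looks_like_xml_sitemap(b"https://example.com/\nhttps://example.com/about\n"))
```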

Logging Levels

Verbose Mode (`verbose: true`): Detailed logs showing each URL checked and the discovery method used
Standard Mode (`verbose: false`, the default): Essential information only, better for production environments
Troubleshooting: Switch to verbose mode when diagnosing discovery issues

📊 Performance & Reliability

Success Rate: 95%+ sitemap discovery across tested websites
Processing Speed: Average 2-5 seconds per website
Error Handling: Graceful fallback between discovery methods
Scalability: Handles batch processing of multiple domains
Resource Efficiency: Optimized HTTP requests and memory usage
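The `url` input takes a single website (a String in the input table), so checking several domains means starting one run per URL. This is a minimal sketch that reuses the `apify_client` calls from the Python example above, `actor().call()` and `dataset().list_items()`. The function name `find_sitemaps_for_domains` is hypothetical, and `"your_actor_id"` remains a placeholder. The client is passed in so the logic can be exercised without network access. Runs are started one after another for simplicity.

```python
def find_sitemaps_for_domains(client, actor_id: str, urls: list[str],
                              timeout: int = 10) -> dict[str, list[str]]:
    """Run the Actor once per URL and map each URL to the sitemaps it reported.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    results: dict[str, list[str]] = {}
    for url in urls:
        run = client.actor(actor_id).call(
            run_input={"url": url, "findAll": True, "timeout": timeout}
        )
        if run is None:   # call() may return None; treat that as no result
            results[url] = []
            continue
        items = client.dataset(run["defaultDatasetId"]).list_items().items
        # With findAll: true, each item carries a "sitemaps" list
        results[url] = [s for item in items for s in item.get("sitemaps", [])]
    return results
```

With a real client, this would be called as `find_sitemaps_for_domains(ApifyClient("your_api_token_here"), "your_actor_id", ["https://example.com", "https://example.org"])`. Starting runs concurrently, for example with `ApifyClientAsync`, would be faster but is left out to keep the sketch short.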

🔧 Troubleshooting

Common Issues

No sitemaps found: Website may not have sitemaps or they're in non-standard locations

Solution: Enable verbose logging to see what locations were checked
Try disabling verification to catch non-XML sitemap files

Timeout errors: Website is slow to respond

Solution: Increase the timeout parameter to 15-30 seconds
Check if the website is accessible from your location

Invalid results: Discovered URLs don't contain sitemap content

Solution: Keep verification enabled to filter invalid results
Note: Some websites use redirects or access restrictions that can affect which URLs are discovered


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests to improve the Actor's functionality or documentation.

Ready to discover sitemaps? Start using the Sitemap Finder today and streamline your web analysis workflow!

Sitemap Scraper

pvillalva/sitemap-scraper

The Sitemap Scraper extracts and outputs all URLs from a given sitemap.

Percival Villalva

229

5.0

Sitemap Detector

coder_zoro/sitemap-detector

Find sitemap URLs fast with our free Sitemap Finder tool. Instantly detect sitemaps from any website for SEO audits, indexing checks, and crawl planning. Improve visibility, site structure insights, and search engine performance in just seconds

Zoro

164

5.0

Sitemap URL Extractor

onescales/sitemap-url-extractor

Provide a website link to a sitemap.xml and the app will extract and list all URLs in the sitemap as well as additional data in the sitemap (i.e. https://onescales.com/sitemap.xml).

One Scales

402

5.0

Website Metadata Extractor (meta tags, sitemap, robots) 🔎

powerful_bachelor/website-metadata-extractor

🔍 Website Metadata Extractor 🌐 Extract essential website data: meta tags, robots.txt, and sitemap.xml in one scan. 📊 Analyze SEO elements, crawler directives, and site structure. ✅ Perfect for SEO audits, 🔎 competitor research, and 🚀 understanding how search engines view your website.

Powerful Bachelor

URL to Markdown (JustHTML) - Clean Markdown Extractor

macheta/justhtml-link-to-markdown

Convert webpages to clean Markdown for RAG and archiving. Uses JustHTML and supports optional Cloudflare/Turnstile bypass plus CSS selector extraction.

Anass

5.0

Xml Sitemap Generator

urban_quidnunc/xml-sitemap-generator

Discover and parse XML sitemaps for domains. Checks common sitemap locations and robots.txt references.

Donny

Website To Markdown

smart_api/website-to-markdown

Convert any webpage into clean, LLM-ready Markdown in seconds — perfect for AI training data, RAG pipelines, and content archiving.

SmartApi

5.0

Sitemap Url Extractor

urban_quidnunc/sitemap-url-extractor

Extract all URLs from any sitemap.xml including sitemap index files