Find Sitemap from url

Pricing

Pay per usage

An Apify Actor that finds sitemap URLs for any website. It discovers XML sitemaps by checking common locations, reading robots.txt, and scanning HTML content for sitemap links.

Rating: 1.0 (1)

Developer: ando (Maintained by Community)

Actor stats: 6 bookmarked, 203 total users, 15 monthly active users, last modified 3 days ago

Sitemap Finder - Discover Website Sitemaps Instantly

Runs on Apify License: MIT

The Sitemap Finder is a web scraping tool that automatically discovers and extracts XML sitemap URLs from any website. Whether you're conducting SEO analysis, building web crawlers, or performing content audits, it locates sitemap files by combining several discovery methods: common sitemap paths, robots.txt directives, and links in the page HTML.

🚀 Key Features

  • Multi-Method Discovery: Checks common sitemap locations, robots.txt directives, and HTML content
  • Comprehensive Coverage: Find either the primary sitemap or discover all available sitemaps
  • Smart Verification: Validates discovered URLs to ensure they contain valid XML sitemap content
  • Flexible Configuration: Customizable timeout, verification settings, and detailed logging
  • High Performance: Optimized for speed with parallel processing and efficient HTTP requests
  • Production Ready: Built with reliability and error handling for enterprise use cases

📥 Input Configuration

Parameter   Type      Default    Description
---------   -------   --------   -----------
url         String    Required   Website URL to search for sitemaps (must include protocol)
findAll     Boolean   true       Find all sitemaps (true) or only primary sitemap (false)
noVerify    Boolean   false      Skip XML validation of discovered sitemaps
timeout     Integer   5          HTTP request timeout in seconds (1-60)
verbose     Boolean   false      Enable detailed logging for debugging

Example Input

{
    "url": "https://example.com",
    "findAll": true,
    "timeout": 10,
    "verbose": true
}
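
Before starting a run, it can help to check the input locally against the parameter table. The following is a minimal sketch, not the Actor's own validation: the function name `validate_input` is hypothetical, and treating "must include protocol" as "http or https" is an assumption.

```python
from urllib.parse import urlparse


def validate_input(run_input: dict) -> dict:
    """Return a copy of run_input with the documented defaults applied.

    Raises ValueError if a documented constraint is violated.
    """
    url = run_input.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("'url' is required")

    # Assumption: "must include protocol" means an http or https URL.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("'url' must include a protocol, e.g. https://example.com")

    # Defaults taken from the parameter table.
    result = {"findAll": True, "noVerify": False, "timeout": 5, "verbose": False}
    result.update(run_input)

    timeout = result["timeout"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 1 <= timeout <= 60:
        raise ValueError("'timeout' must be an integer between 1 and 60")
    return result
```

Calling `validate_input({"url": "https://example.com", "timeout": 10})` returns the input with `findAll`, `noVerify`, and `verbose` filled in from the defaults, while a URL without a protocol raises `ValueError`.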

📤 Output Data

The Actor outputs structured data to the default dataset with different formats based on configuration:

All Sitemaps Mode (findAll: true)

{
    "url": "https://example.com",
    "sitemaps": [
        "https://example.com/sitemap.xml",
        "https://example.com/post-sitemap.xml",
        "https://example.com/page-sitemap.xml"
    ],
    "count": 3
}

Primary Sitemap Mode (findAll: false)

{
    "url": "https://example.com",
    "sitemap": "https://example.com/sitemap.xml"
}
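
Because the two modes return different shapes, downstream code may want one helper that accepts either. This is a small sketch with a hypothetical function name, `sitemaps_from_item`. What the Actor emits when nothing is found is not documented here, so the empty-list fallback is an assumption.

```python
def sitemaps_from_item(item: dict) -> list[str]:
    """Return the sitemap URLs from a dataset item in either output mode."""
    if "sitemaps" in item:
        # All Sitemaps Mode (findAll: true): a list under "sitemaps".
        return list(item["sitemaps"])
    # Primary Sitemap Mode (findAll: false): a single URL under "sitemap".
    # Assumption: a missing or empty value means no sitemap was found.
    sitemap = item.get("sitemap")
    return [sitemap] if sitemap else []
```

With this helper, the same loop can process results regardless of how `findAll` was set for the run.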

🔍 Discovery Methods

The Sitemap Finder uses a comprehensive three-tier approach:

1. Common Locations Check

Systematically checks standard sitemap paths including:

  • /sitemap.xml - Standard location
  • /sitemap_index.xml - Sitemap index files
  • /post-sitemap.xml - WordPress-style sitemaps
  • /page-sitemap.xml - Static page sitemaps
  • Plus 10+ additional common variations
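
As an illustration of the common-locations check, the sketch below builds candidate URLs for a site from the four paths named above. It is not the Actor's source code: the names `COMMON_PATHS` and `candidate_urls` are hypothetical, and the additional variations the Actor checks are not listed here, so they are left out.

```python
from urllib.parse import urljoin

# Only the four paths named in the list above; the Actor checks more.
COMMON_PATHS = [
    "/sitemap.xml",
    "/sitemap_index.xml",
    "/post-sitemap.xml",
    "/page-sitemap.xml",
]


def candidate_urls(site_url: str) -> list[str]:
    """Resolve each common path against the site's root."""
    # A leading "/" makes urljoin replace any existing path on site_url.
    return [urljoin(site_url, path) for path in COMMON_PATHS]
```

Each candidate would then be requested and, unless verification is disabled, checked for valid sitemap content.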

2. Robots.txt Analysis

Parses the website's robots.txt file to locate Sitemap directives that many websites use to declare their sitemap locations.
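
A minimal sketch of this step, assuming the robots.txt body has already been downloaded as text. The function name `sitemaps_from_robots` is hypothetical and the Actor's real parser may differ.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect the URLs declared in "Sitemap:" lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # so the "https://" in the value is left intact.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # Field names are matched case-insensitively.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            found.append(value.strip())
    return found
```

Python's standard library offers a similar capability through `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+), which may be preferable when the rest of the file also needs to be interpreted.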

3. HTML Content Parsing

Analyzes the website's HTML source code to find sitemap links referenced in meta tags, anchor links, or other markup.
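
The HTML step can be sketched with the standard library's `html.parser`. This is an illustration under stated assumptions, not the Actor's implementation: it only inspects the `href` attribute of `<a>` and `<link>` tags and treats any value containing "sitemap" as a candidate. The names `SitemapLinkParser` and `sitemaps_from_html` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SitemapLinkParser(HTMLParser):
    """Collect hrefs from <a> and <link> tags that mention "sitemap"."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        href = dict(attrs).get("href")
        if href and "sitemap" in href.lower():
            # Resolve relative links against the page URL and de-duplicate.
            url = urljoin(self.base_url, href)
            if url not in self.links:
                self.links.append(url)


def sitemaps_from_html(html: str, base_url: str) -> list[str]:
    parser = SitemapLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

A substring match like this can return HTML sitemap pages as well as XML files, which is one reason to keep verification enabled.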

💻 API Integration

Python Example

from apify_client import ApifyClient

# Initialize the client with your API token
client = ApifyClient("your_api_token_here")

# Configure input
run_input = {
    "url": "https://your-target-website.com",
    "findAll": True,
    "verbose": True,
}

# Run the Actor and wait for it to finish
run = client.actor("your_actor_id").call(run_input=run_input)

# Get results from the run's default dataset
items = client.dataset(run["defaultDatasetId"]).list_items().items
for item in items:
    print(f"Found {item['count']} sitemaps for {item['url']}")
    for sitemap in item["sitemaps"]:
        print(f"  - {sitemap}")

JavaScript Example

import { ApifyClient } from 'apify-client';

// Initialize the client with your API token
const client = new ApifyClient({
    token: 'your_api_token_here',
});

// Configure input
const input = {
    url: 'https://your-target-website.com',
    findAll: true,
};

// Run the Actor and wait for it to finish
const run = await client.actor('your_actor_id').call(input);

// Get results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log('Results:', items);

🎯 Use Cases & Applications

SEO & Content Strategy

  • Site Architecture Analysis: Map complete website structure through sitemap discovery
  • Competitive Research: Analyze competitor site organization and content patterns
  • SEO Audits: Verify sitemap accessibility and completeness
  • Content Gap Analysis: Identify missing or outdated sitemap references

Web Scraping & Data Collection

  • Crawl Planning: Get comprehensive URL lists for targeted scraping operations
  • Data Mining: Discover all indexable pages for content extraction
  • Site Monitoring: Track changes in site structure over time
  • Batch Processing: Collect sitemaps from multiple domains efficiently

Development & Testing

  • QA Testing: Verify sitemap functionality across different environments
  • Migration Validation: Ensure sitemaps are properly configured after site moves
  • Performance Monitoring: Check sitemap accessibility and response times
  • API Integration: Incorporate sitemap discovery into automated workflows

AI Agents & Automation

  • Content Indexing: Feed discovered URLs to AI agents for content analysis
  • Automated Reporting: Generate sitemap status reports for multiple domains
  • Workflow Integration: Chain with other tools for comprehensive site analysis
  • Monitoring Dashboards: Track sitemap health across website portfolios

⚙️ Configuration Tips

Timeout Settings

  • Fast Sites: Use 3-5 seconds for responsive websites
  • Slow Sites: Increase to 10-15 seconds for heavy or slow-loading sites
  • Bulk Processing: Balance speed vs reliability based on your use case

Verification Options

  • Enable Verification: Ensures discovered URLs contain valid XML sitemap content
  • Disable Verification: Faster execution, useful when you want all potential sitemap URLs
  • Production Use: Keep verification enabled to ensure data quality
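
To make "valid XML sitemap content" concrete, here is one way such a check could look. It is a sketch, not the Actor's verification logic: it assumes a response body counts as a sitemap when it parses as XML and its root element is `urlset` or `sitemapindex`, the two root elements defined by the sitemaps.org protocol. The function name `looks_like_sitemap` is hypothetical.

```python
import xml.etree.ElementTree as ET


def looks_like_sitemap(body: bytes) -> bool:
    """Return True if body parses as XML with a sitemap root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # ElementTree writes namespaced tags as "{namespace}name";
    # keep only the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in ("urlset", "sitemapindex")
```

A check like this rejects HTML error pages served with a 200 status, which a plain status-code test would let through.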

Logging Levels

  • Verbose Mode: Detailed logs showing each URL checked and discovery method
  • Standard Mode: Essential information only, better for production environments
  • Debug Mode: Enable verbose logging when troubleshooting discovery issues

📊 Performance & Reliability

  • Success Rate: 95%+ sitemap discovery across tested websites
  • Processing Speed: Average 2-5 seconds per website
  • Error Handling: Graceful fallback between discovery methods
  • Scalability: Handles batch processing of multiple domains
  • Resource Efficiency: Optimized HTTP requests and memory usage

🔧 Troubleshooting

Common Issues

No sitemaps found: Website may not have sitemaps or they're in non-standard locations

  • Solution: Enable verbose logging (verbose: true) to see which locations were checked
  • Try disabling verification (noVerify: true) to catch non-XML sitemap files

Timeout errors: Website is slow to respond

  • Solution: Increase the timeout parameter to 15-30 seconds
  • Check if the website is accessible from your location

Invalid results: Discovered URLs don't contain sitemap content

  • Solution: Keep verification enabled to filter invalid results
  • Some websites have redirects or access restrictions

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit issues or pull requests to improve the Actor's functionality or documentation.


Ready to discover sitemaps? Start using the Sitemap Finder today and streamline your web analysis workflow!