Find Sitemap from url
Pricing: Pay per usage
A powerful Apify Actor that finds sitemap URLs for any website. It discovers XML sitemaps by checking common locations, parsing robots.txt files, and analyzing HTML content for sitemap links.
Rating: 1.0 (1 rating)
Developer: ando
Actor stats: 6 bookmarked, 203 total users, 15 monthly active users, last modified 3 days ago
Sitemap Finder - Discover Website Sitemaps Instantly
The Sitemap Finder is a web scraping tool that automatically discovers and extracts XML sitemap URLs from any website. Whether you're conducting SEO analysis, building web crawlers, or performing content audits, it quickly locates sitemap files by combining several discovery methods.
Key Features
- Multi-Method Discovery: Checks common sitemap locations, robots.txt directives, and HTML content
- Comprehensive Coverage: Find either the primary sitemap or discover all available sitemaps
- Smart Verification: Validates discovered URLs to ensure they contain valid XML sitemap content
- Flexible Configuration: Customizable timeout, verification settings, and detailed logging
- High Performance: Optimized for speed with parallel processing and efficient HTTP requests
- Production Ready: Built with error handling for reliable use in production and enterprise workflows
Input Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | String | Required | Website URL to search for sitemaps (must include protocol) |
| findAll | Boolean | true | Find all sitemaps (true) or only the primary sitemap (false) |
| noVerify | Boolean | false | Skip XML validation of discovered sitemaps |
| timeout | Integer | 5 | HTTP request timeout in seconds (1-60) |
| verbose | Boolean | false | Enable detailed logging for debugging |
Example Input
```json
{
  "url": "https://example.com",
  "findAll": true,
  "timeout": 10,
  "verbose": true
}
```
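If you build inputs programmatically, it can help to apply the table's defaults and limits before starting a run. The sketch below is a hypothetical client-side helper (`normalize_input` is not part of the Actor); it assumes "must include protocol" means an `http://` or `https://` URL and enforces the documented 1-60 second timeout range.

```python
from urllib.parse import urlparse

# Defaults copied from the input table; the Actor applies its own defaults too.
DEFAULTS = {"findAll": True, "noVerify": False, "timeout": 5, "verbose": False}


def normalize_input(raw: dict) -> dict:
    """Hypothetical helper: merge defaults and validate fields before a run."""
    merged = {**DEFAULTS, **raw}
    url = merged.get("url")
    if not isinstance(url, str):
        raise ValueError("url is required")
    parsed = urlparse(url)
    # Assumption: only http and https count as a valid protocol.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must include http:// or https://: {url!r}")
    timeout = merged["timeout"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int) or not 1 <= timeout <= 60:
        raise ValueError("timeout must be an integer between 1 and 60")
    return merged


print(normalize_input({"url": "https://example.com", "findAll": True, "timeout": 10, "verbose": True}))
```

A bare domain such as `example.com` raises `ValueError` here, which surfaces the missing protocol before any Actor compute is spent.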
Output Data
The Actor writes structured results to the default dataset. The shape of each item depends on the findAll setting:
All Sitemaps Mode (findAll: true)
```json
{
  "url": "https://example.com",
  "sitemaps": [
    "https://example.com/sitemap.xml",
    "https://example.com/post-sitemap.xml",
    "https://example.com/page-sitemap.xml"
  ],
  "count": 3
}
```
Primary Sitemap Mode (findAll: false)
```json
{
  "url": "https://example.com",
  "sitemap": "https://example.com/sitemap.xml"
}
```
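Because the two modes use different keys (`sitemaps` versus `sitemap`), downstream code often wants a single list either way. This is a minimal sketch of such a normalizer; `sitemap_urls` is a hypothetical name, and returning an empty list when neither key is present is an assumption, since the page does not document the output when no sitemap is found.

```python
def sitemap_urls(item: dict) -> list[str]:
    """Return sitemap URLs from one dataset item in either output mode."""
    if "sitemaps" in item:  # findAll: true
        return list(item["sitemaps"])
    if item.get("sitemap"):  # findAll: false
        return [item["sitemap"]]
    return []  # assumption: nothing found, or an undocumented shape


all_mode = {
    "url": "https://example.com",
    "sitemaps": ["https://example.com/sitemap.xml", "https://example.com/post-sitemap.xml"],
    "count": 2,
}
primary_mode = {"url": "https://example.com", "sitemap": "https://example.com/sitemap.xml"}

print(sitemap_urls(all_mode))
print(sitemap_urls(primary_mode))
```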
Discovery Methods
The Sitemap Finder combines three discovery methods:
1. Common Locations Check
Systematically checks standard sitemap paths including:
- /sitemap.xml: Standard location
- /sitemap_index.xml: Sitemap index files
- /post-sitemap.xml: WordPress-style sitemaps
- /page-sitemap.xml: Static page sitemaps
- Plus 10+ additional common variations
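To illustrate the common-locations step, here is a sketch that turns a site URL into candidate sitemap URLs. It lists only the four paths named above; the Actor's additional variations are not published here, and `candidate_urls` is a hypothetical helper rather than the Actor's code.

```python
from urllib.parse import urljoin

# Only the four documented paths; the Actor reportedly checks 10+ more.
COMMON_PATHS = ["/sitemap.xml", "/sitemap_index.xml", "/post-sitemap.xml", "/page-sitemap.xml"]


def candidate_urls(site_url: str) -> list[str]:
    # A path starting with "/" resolves against the site root, so
    # "https://example.com/blog/" still yields "https://example.com/sitemap.xml".
    return [urljoin(site_url, path) for path in COMMON_PATHS]


for url in candidate_urls("https://example.com/blog/"):
    print(url)
```

Each candidate would then be requested with the configured timeout and, unless `noVerify` is set, checked for sitemap content.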
2. Robots.txt Analysis
Parses the website's robots.txt file to locate Sitemap directives that many websites use to declare their sitemap locations.
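A robots.txt file declares sitemaps with lines such as `Sitemap: https://example.com/sitemap.xml`. The sketch below extracts those URLs from already-downloaded robots.txt text; matching the directive name case-insensitively and stripping `#` comments are lenient choices made for this example, not a description of the Actor's parser.

```python
def sitemaps_from_robots(robots_txt: str) -> list[str]:
    """Collect URLs from 'Sitemap:' lines in robots.txt content, in order."""
    found: list[str] = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # so the "https:" inside the URL is preserved.
        line = line.split("#", 1)[0].strip()
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "sitemap":
            url = value.strip()
            if url and url not in found:
                found.append(url)
    return found


robots = """User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
sitemap: https://example.com/news-sitemap.xml  # news
Sitemap: https://example.com/sitemap.xml
"""
print(sitemaps_from_robots(robots))
```

Python's standard library offers a similar shortcut: `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+) returns the declared sitemap URLs after `parse()` or `read()`.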
3. HTML Content Parsing
Analyzes the website's HTML source code to find sitemap links referenced in meta tags, anchor links, or other markup.
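A simple form of HTML parsing is to scan `<a>` and `<link>` tags for `href` values that mention "sitemap". The sketch below does this with the standard library's `HTMLParser`; the keyword heuristic and the `SitemapLinkParser` / `sitemaps_from_html` names are assumptions for illustration, and it does not cover meta tags or other markup the Actor may inspect.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SitemapLinkParser(HTMLParser):
    """Collect absolute href values on <a> and <link> tags that mention 'sitemap'."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag not in ("a", "link"):
            return
        href = dict(attrs).get("href")
        if href and "sitemap" in href.lower():
            absolute = urljoin(self.base_url, href)  # resolve relative links
            if absolute not in self.links:
                self.links.append(absolute)


def sitemaps_from_html(html: str, base_url: str) -> list[str]:
    parser = SitemapLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


page = (
    '<html><head><link rel="sitemap" type="application/xml" href="/sitemap.xml"></head>'
    '<body><a href="https://example.com/page-sitemap.xml">Pages</a>'
    '<a href="/about">About</a></body></html>'
)
print(sitemaps_from_html(page, "https://example.com/"))
```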
API Integration
Python Example
```python
from apify_client import ApifyClient

# Initialize client with your API token
client = ApifyClient("your_api_token_here")

# Configure input
run_input = {
    "url": "https://your-target-website.com",
    "findAll": True,
    "verbose": True,
}

# Run the Actor and wait for it to finish
run = client.actor("your_actor_id").call(run_input=run_input)

# Get results from the run's default dataset
items = client.dataset(run["defaultDatasetId"]).list_items().items
for item in items:
    print(f"Found {item['count']} sitemaps for {item['url']}")
    for sitemap in item["sitemaps"]:
        print(f" - {sitemap}")
```
JavaScript Example
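```javascript
// Top-level await requires an ES module (for example, a .mjs file)
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({
    token: 'your_api_token_here',
});

const input = {
    url: 'https://your-target-website.com',
    findAll: true,
};

// Run the Actor and wait for it to finish
const run = await client.actor('your_actor_id').call(input);

// Fetch results from the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log('Results:', items);
```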
Use Cases & Applications
SEO & Content Strategy
- Site Architecture Analysis: Map complete website structure through sitemap discovery
- Competitive Research: Analyze competitor site organization and content patterns
- SEO Audits: Verify sitemap accessibility and completeness
- Content Gap Analysis: Identify missing or outdated sitemap references
Web Scraping & Data Collection
- Crawl Planning: Get comprehensive URL lists for targeted scraping operations
- Data Mining: Discover all indexable pages for content extraction
- Site Monitoring: Track changes in site structure over time
- Batch Processing: Collect sitemaps from multiple domains efficiently
Development & Testing
- QA Testing: Verify sitemap functionality across different environments
- Migration Validation: Ensure sitemaps are properly configured after site moves
- Performance Monitoring: Check sitemap accessibility and response times
- API Integration: Incorporate sitemap discovery into automated workflows
AI Agents & Automation
- Content Indexing: Feed discovered URLs to AI agents for content analysis
- Automated Reporting: Generate sitemap status reports for multiple domains
- Workflow Integration: Chain with other tools for comprehensive site analysis
- Monitoring Dashboards: Track sitemap health across website portfolios
Configuration Tips
Timeout Settings
- Fast Sites: Use 3-5 seconds for responsive websites
- Slow Sites: Increase to 10-15 seconds for heavy or slow-loading sites
- Bulk Processing: Balance speed vs reliability based on your use case
Verification Options
- Enable Verification: Ensures discovered URLs contain valid XML sitemap content
- Disable Verification: Faster execution, useful when you want all potential sitemap URLs
- Production Use: Keep verification enabled to ensure data quality
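Verification is described only as checking for "valid XML sitemap content". One reasonable interpretation, sketched below, is to accept a response only if it parses as XML and its root element is `urlset` or `sitemapindex` in the sitemaps.org namespace. This is an assumption about what "valid" means, not the Actor's actual check; note that it rejects gzipped (`.xml.gz`) and plain-text sitemaps, and that for untrusted input the Python documentation recommends a hardened parser such as `defusedxml`.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# ElementTree reports namespaced tags as "{namespace}localname".
VALID_ROOTS = {f"{{{SITEMAP_NS}}}urlset", f"{{{SITEMAP_NS}}}sitemapindex"}


def looks_like_sitemap(body: str) -> bool:
    """Return True if body parses as XML with a sitemap-protocol root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False  # not well-formed XML (e.g. an HTML error page)
    return root.tag in VALID_ROOTS


urlset = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url></urlset>"
)
print(looks_like_sitemap(urlset))
print(looks_like_sitemap("<html><body>Not found</body></html>"))
```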
Logging Levels
- Verbose Mode (verbose: true): Detailed logs showing each URL checked and the discovery method used; enable it when troubleshooting discovery issues
- Standard Mode (verbose: false): Essential information only, better suited to production environments
Performance & Reliability
- Success Rate: 95%+ sitemap discovery across tested websites
- Processing Speed: Average 2-5 seconds per website
- Error Handling: Graceful fallback between discovery methods
- Scalability: Handles batch processing of multiple domains
- Resource Efficiency: Optimized HTTP requests and memory usage
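"Graceful fallback between discovery methods" can be pictured as running each method in turn, skipping any that fail, and merging results without duplicates. The sketch below shows only that control flow; the three stub methods are hypothetical stand-ins (one simulates a timeout) and are not the Actor's real implementation. The `find_all` flag mirrors the Actor's `findAll` input by returning the first hit in primary-sitemap mode.

```python
from typing import Callable, Iterable

# Each discovery method takes the site URL and returns candidate sitemap URLs.
Method = Callable[[str], Iterable[str]]


def discover(site_url: str, methods: list[Method], find_all: bool = True) -> list[str]:
    """Run discovery methods in order, falling back past any that fail."""
    found: list[str] = []
    for method in methods:
        try:
            candidates = list(method(site_url))
        except Exception as exc:  # any failure: log it and try the next method
            print(f"{getattr(method, '__name__', method)} failed: {exc}")
            continue
        for url in candidates:
            if url not in found:  # de-duplicate while keeping discovery order
                found.append(url)
        if found and not find_all:
            return found[:1]  # primary-sitemap mode: first hit wins
    return found


# Hypothetical stubs standing in for the three real discovery methods.
def common_locations(site: str) -> list[str]:
    raise TimeoutError("timed out")


def robots_txt(site: str) -> list[str]:
    return [site + "/sitemap_index.xml"]


def html_links(site: str) -> list[str]:
    return [site + "/sitemap_index.xml", site + "/page-sitemap.xml"]


methods = [common_locations, robots_txt, html_links]
print(discover("https://example.com", methods))
print(discover("https://example.com", methods, find_all=False))
```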
Troubleshooting
Common Issues
No sitemaps found: Website may not have sitemaps or they're in non-standard locations
- Solution: Enable verbose logging to see what locations were checked
- Try setting noVerify to true to catch sitemap files that fail XML validation (for example, plain-text sitemaps)
Timeout errors: Website is slow to respond
- Solution: Increase the timeout parameter to 15-30 seconds
- Check if the website is accessible from your location
Invalid results: Discovered URLs don't contain sitemap content
- Solution: Keep verification enabled (noVerify: false) to filter out invalid results
- Some websites redirect sitemap URLs or restrict access, which can return non-sitemap content
Support Resources
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests to improve the Actor's functionality or documentation.
Ready to discover sitemaps? Start using the Sitemap Finder today and streamline your web analysis workflow!