URLs List - Extract ALL website urls

Pricing

Pay per event

Automatically discovers and extracts ALL URLs from any website. Perfect for SEO analysis, content inventory, and bulk URL extraction from multiple websites. Get complete URL lists with metadata including last modified dates and priority levels.

Rating

5.0

(1)

Developer

Lofomachines

Maintained by Community

Actor stats

0

Bookmarked

3

Total users

2

Monthly active users

a day ago

Last modified

🚀 Comprehensive URL Extractor for Apify

Automatically discovers and extracts ALL URLs from any website. Perfect for SEO audits, content analysis, and bulk URL extraction.

✨ Key Features

  • 🔍 Automatic Discovery: Intelligently finds all available URLs from any website
  • 💨 Fast & Efficient: Optimized for speed and performance
  • 📦 Bulk Processing: Process multiple websites simultaneously
  • 🏷️ Complete Metadata: Extracts URLs with last modified dates, priority levels, and update frequency
  • 🗜️ Smart Handling: Works with all standard web formats
  • 🔄 Recursive Discovery: Automatically discovers nested URL structures

🎯 How It Works

  1. Input: Provide one or more website URLs
  2. Discovery: The scraper intelligently discovers all available URL sources
  3. Extraction: Extracts all URLs with their metadata
  4. Output: Returns all extracted URLs and their metadata as JSON

📥 Input

{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://crawlee.dev" },
    { "url": "https://example.com" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}

Parameters

  • startUrls (required): Array of website URLs to process
  • proxyConfiguration (optional): Proxy configuration to avoid blocking
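
As a minimal sketch, the input object above can be assembled programmatically before starting a run. The helper name `build_run_input` and its validation rules are assumptions made for illustration and are not part of the Actor; only the `startUrls` and `proxyConfiguration` keys come from the input shown above.

```python
import json
from urllib.parse import urlparse


def build_run_input(urls, use_apify_proxy=True):
    """Build an input object with the startUrls / proxyConfiguration shape.

    Hypothetical helper for illustration; not part of the Actor itself.
    """
    start_urls = []
    seen = set()
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        if url not in seen:  # skip exact duplicates, keep first-seen order
            seen.add(url)
            start_urls.append({"url": url})
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    return {
        "startUrls": start_urls,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


if __name__ == "__main__":
    run_input = build_run_input(["https://apify.com", "https://crawlee.dev"])
    print(json.dumps(run_input, indent=2))
```

The resulting dictionary can be pasted into the Actor's JSON input editor or passed to whichever client is used to start the run.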

📤 Output

Each extracted URL contains:

{
  "source_url": "https://apify.com",
  "url": "https://apify.com/store/scrapers",
  "lastmod": "2024-11-14",
  "priority": "0.8",
  "changefreq": "weekly"
}

Output Fields

  • source_url: The input website URL (useful for bulk processing)
  • url: The extracted URL
  • lastmod: Last modification date (when available)
  • priority: Page priority (0.0-1.0, when available)
  • changefreq: Update frequency (when available)
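
Because `lastmod`, `priority`, and `changefreq` are only present when available, downstream code should treat them as optional. The sketch below shows one way to load a record into a typed structure. The `UrlRecord` class and `parse_record` function are hypothetical names, and the code assumes `lastmod` begins with an ISO `YYYY-MM-DD` date and `priority` is a numeric string, as in the sample above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class UrlRecord:
    """Typed view of one output record (hypothetical, for illustration)."""
    source_url: str
    url: str
    lastmod: Optional[date] = None
    priority: Optional[float] = None
    changefreq: Optional[str] = None


def parse_record(item: dict) -> UrlRecord:
    """Convert a raw output item into a UrlRecord, tolerating missing fields."""
    lastmod = None
    raw_lastmod = item.get("lastmod")
    if raw_lastmod:
        try:
            # Assumption: the value starts with YYYY-MM-DD; keep only that part.
            lastmod = date.fromisoformat(str(raw_lastmod)[:10])
        except ValueError:
            lastmod = None  # unparseable dates are treated as unavailable

    priority = None
    raw_priority = item.get("priority")
    if raw_priority not in (None, ""):
        try:
            priority = float(raw_priority)  # the sample stores priority as a string
        except (TypeError, ValueError):
            priority = None

    return UrlRecord(
        source_url=item["source_url"],
        url=item["url"],
        lastmod=lastmod,
        priority=priority,
        changefreq=item.get("changefreq") or None,
    )
```

Calling `parse_record` on the sample record above yields a `UrlRecord` whose `lastmod` is a `date` and whose `priority` is a `float`; a record with only `source_url` and `url` leaves the other three fields as `None`.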

💡 Use Cases

SEO Audit

Extract all URLs from a website for complete SEO analysis

Content Inventory

Create a comprehensive inventory of all pages on a website

Monitoring

Track changes over time using the lastmod field

Bulk Analysis

Analyze the structure of multiple websites simultaneously

Data Pipeline

Use extracted URLs as input for other scrapers
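
One way to chain runs, sketched under assumptions: collect the `url` field from each output record, drop duplicates, and wrap the result in the same `startUrls` shape this Actor accepts. The function name `to_start_urls` is hypothetical, and other scrapers may expect a different input field, which is why the key is a parameter.

```python
def to_start_urls(records, key="startUrls"):
    """Turn output records into a {key: [{"url": ...}, ...]} input object.

    Hypothetical helper; the default key mirrors this Actor's own input
    and may need to be changed for a different scraper.
    """
    seen = set()
    start_urls = []
    for record in records:
        url = record.get("url")
        if url and url not in seen:  # keep first occurrence, preserve order
            seen.add(url)
            start_urls.append({"url": url})
    return {key: start_urls}
```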

🚦 Best Practices

  1. Use Proxies: Enable Apify proxies for large websites or multiple sites
  2. Bulk Processing: Add multiple URLs in the input to process them in one run
  3. Filter Output: Use source_url to distinguish URLs from different sites
  4. Track Changes: Use the lastmod field to identify recent content
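
Practices 3 and 4 can be combined in post-processing. The following is a minimal sketch, assuming the dataset items have already been loaded as a list of dictionaries and that `lastmod` starts with an ISO `YYYY-MM-DD` date as in the sample output; `recent_urls_by_site` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date


def recent_urls_by_site(records, since):
    """Group URLs by source_url, keeping only those modified on or after `since`."""
    grouped = defaultdict(list)
    for record in records:
        raw = record.get("lastmod")
        if not raw:
            continue  # lastmod is only present when available
        try:
            modified = date.fromisoformat(str(raw)[:10])
        except ValueError:
            continue  # skip values that are not ISO-style dates
        if modified >= since:
            grouped[record["source_url"]].append(record["url"])
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        {"source_url": "https://apify.com", "url": "https://apify.com/store/scrapers", "lastmod": "2024-11-14"},
        {"source_url": "https://apify.com", "url": "https://apify.com/old", "lastmod": "2023-01-02"},
    ]
    print(recent_urls_by_site(sample, date(2024, 1, 1)))
```

Records without a usable `lastmod` are skipped rather than treated as recent, which keeps the result conservative.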

⚡ Performance

  • Fast and efficient processing
  • Handles large websites (50k+ URLs)
  • Automatic retry on temporary errors
  • Prevents infinite loops

🛠️ Technologies

  • Apify SDK: Platform integration
  • Crawlee: Web scraping framework
  • fast-xml-parser: Fast XML parsing
  • Node.js: Runtime environment

📝 Notes

  • The scraper intelligently discovers all available URLs from websites
  • The source_url field allows tracking the origin of each URL in bulk processing
  • Works with all standard website structures

🐛 Troubleshooting

Problem: No URLs found

  • Solution: Verify that the website is accessible and has a standard structure

Problem: Timeout on large sites

  • Solution: Enable Apify proxies to improve performance

Happy Scraping! 🚀