URLs List - Extract ALL website urls
Pricing: Pay per event
Automatically discovers and extracts ALL URLs from any website. Perfect for SEO analysis, content inventory, and bulk URL extraction from multiple websites. Get complete URL lists with metadata including last modified dates and priority levels.
Developer: Lofomachines
Comprehensive URL Extractor for Apify
Automatically discovers and extracts ALL URLs from any website. Perfect for SEO audits, content analysis, and bulk URL extraction.
Key Features
- Automatic Discovery: Intelligently finds all available URLs from any website
- Fast & Efficient: Optimized for speed and performance
- Bulk Processing: Process multiple websites simultaneously
- Complete Metadata: Extracts URLs with last modified dates, priority levels, and update frequency
- Smart Handling: Works with all standard web formats
- Recursive Discovery: Automatically discovers nested URL structures
How It Works
- Input: Provide one or more website URLs
- Discovery: The scraper intelligently discovers all available URL sources
- Extraction: Extracts all URLs with their metadata
- Output: Returns a complete JSON with all data
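The steps above can be sketched in a few lines. The README does not say where the URLs come from; because the output fields (lastmod, priority, changefreq) match the sitemaps.org protocol, this sketch assumes the source is an XML sitemap. The function name `parse_sitemap` and the sample documents are hypothetical, and the Actor's real implementation may differ.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text, source_url):
    """Return (records, nested_sitemaps) for one sitemap document.

    Assumption: URL sources are sitemaps; the README does not confirm this.
    """
    root = ET.fromstring(xml_text)
    records, nested = [], []
    # A <sitemapindex> lists further sitemaps to visit (nested structures).
    for node in root.findall("sm:sitemap", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        if loc:
            nested.append(loc)
    # A <urlset> lists pages plus optional metadata.
    for node in root.findall("sm:url", NS):
        loc = node.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            continue
        record = {"source_url": source_url, "url": loc}
        for field in ("lastmod", "priority", "changefreq"):
            value = node.findtext(f"sm:{field}", default=None, namespaces=NS)
            if value is not None:
                record[field] = value.strip()
        records.append(record)
    return records, nested


# Hypothetical sample documents, used only to exercise the function.
SAMPLE_URLSET = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-11-14</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url><loc>https://example.com/about</loc></url>
</urlset>"""

SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

records, nested = parse_sitemap(SAMPLE_URLSET, "https://example.com")
```

Metadata keys are added only when the sitemap provides them, which mirrors the "when available" wording in the output field list.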
Input
{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://crawlee.dev" },
    { "url": "https://example.com" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
Parameters
- startUrls (required): Array of website URLs to process
- proxyConfiguration (optional): Proxy configuration to avoid blocking
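As a rough illustration, the input can be assembled and checked in code before a run. The helper names `build_run_input` and `run_actor` are hypothetical, and `actor_id` is a placeholder because this README does not state the Actor's ID. The `run_actor` function assumes the third-party `apify-client` package is installed and is not executed here.

```python
from urllib.parse import urlparse


def build_run_input(urls, use_apify_proxy=True):
    """Build an input object in the shape documented above."""
    start_urls = []
    for url in urls:
        parsed = urlparse(url)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an absolute http(s) URL: {url!r}")
        start_urls.append({"url": url})
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    return {
        "startUrls": start_urls,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


def run_actor(token, actor_id, run_input):
    """Run the Actor and return its dataset items.

    `actor_id` is a placeholder; look up the real ID on the Actor's page.
    """
    from apify_client import ApifyClient  # lazy import: third-party package

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(["https://apify.com", "https://crawlee.dev"])
```

Validating URLs up front is a design choice of this sketch: a malformed entry fails immediately instead of after a run has started.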
Output
Each extracted URL contains:
{
  "source_url": "https://apify.com",
  "url": "https://apify.com/store/scrapers",
  "lastmod": "2024-11-14",
  "priority": "0.8",
  "changefreq": "weekly"
}
Output Fields
- source_url: The input website URL (useful for bulk processing)
- url: The extracted URL
- lastmod: Last modification date (when available)
- priority: Page priority (0.0-1.0, when available)
- changefreq: Update frequency (when available)
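Since priority arrives as a string and three of the fields are optional, it can help to normalize each item before analysis. The following is a minimal sketch; the `UrlRecord` class and `parse_record` function are hypothetical names, and the date handling assumes lastmod starts with a YYYY-MM-DD date, as in the example above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class UrlRecord:
    source_url: str
    url: str
    lastmod: Optional[date] = None
    priority: Optional[float] = None
    changefreq: Optional[str] = None


def _parse_lastmod(value):
    """Read the leading YYYY-MM-DD part; return None if absent or unparseable."""
    if not value:
        return None
    try:
        return date.fromisoformat(value[:10])
    except ValueError:
        return None  # e.g. a year-only value; kept simple in this sketch


def parse_record(item):
    """Convert one output item (a dict) into a UrlRecord."""
    priority = item.get("priority")
    return UrlRecord(
        source_url=item["source_url"],
        url=item["url"],
        lastmod=_parse_lastmod(item.get("lastmod")),
        priority=float(priority) if priority not in (None, "") else None,
        changefreq=item.get("changefreq") or None,
    )


record = parse_record({
    "source_url": "https://apify.com",
    "url": "https://apify.com/store/scrapers",
    "lastmod": "2024-11-14",
    "priority": "0.8",
    "changefreq": "weekly",
})
```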
Use Cases
SEO Audit
Extract all URLs from a website for complete SEO analysis
Content Inventory
Create a comprehensive inventory of all pages on a website
Monitoring
Track changes over time using the lastmod field
Bulk Analysis
Analyze the structure of multiple websites simultaneously
Data Pipeline
Use extracted URLs as input for other scrapers
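The Monitoring and Data Pipeline cases can be combined in a short post-processing step. This sketch filters items by lastmod and reshapes them into a startUrls list. Both function names are hypothetical, and the assumption that a downstream scraper accepts the same `{"url": ...}` shape as this Actor's input should be checked against that scraper's documentation.

```python
from datetime import date


def changed_since(items, cutoff):
    """Return items whose lastmod date is on or after `cutoff`."""
    recent = []
    for item in items:
        lastmod = item.get("lastmod")
        if not lastmod:
            continue  # no date available, so it cannot be compared
        try:
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # skip values that do not start with YYYY-MM-DD
        if modified >= cutoff:
            recent.append(item)
    return recent


def to_start_urls(items):
    """Turn extracted items into a de-duplicated startUrls list."""
    seen = set()
    start_urls = []
    for item in items:
        if item["url"] not in seen:
            seen.add(item["url"])
            start_urls.append({"url": item["url"]})
    return start_urls


# Hypothetical items in the documented output shape.
items = [
    {"source_url": "https://example.com", "url": "https://example.com/a", "lastmod": "2024-11-14"},
    {"source_url": "https://example.com", "url": "https://example.com/b", "lastmod": "2023-01-01"},
    {"source_url": "https://example.com", "url": "https://example.com/c"},
    {"source_url": "https://example.com", "url": "https://example.com/a", "lastmod": "2024-12-01"},
]
next_input = {"startUrls": to_start_urls(changed_since(items, date(2024, 1, 1)))}
```

Items without a lastmod value are skipped by the filter, so a site that omits dates would need a different change-detection strategy.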
Best Practices
- Use Proxies: Enable Apify proxies for large websites or multiple sites
- Bulk Processing: Add multiple URLs in the input to process them in one run
- Filter Output: Use source_url to distinguish URLs from different sites
- Track Changes: Use the lastmod field to identify recent content
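When several sites are processed in one run, a per-site summary is one way to apply the source_url and lastmod tips together. This is a sketch with a hypothetical function name; it compares lastmod values as strings, which is only safe under the assumption that they start with a YYYY-MM-DD date.

```python
def summarize_by_source(items):
    """Count URLs per source_url and record the newest lastmod date seen."""
    summary = {}
    for item in items:
        entry = summary.setdefault(
            item["source_url"], {"count": 0, "latest_lastmod": None}
        )
        entry["count"] += 1
        # YYYY-MM-DD strings sort chronologically, so string comparison works.
        lastmod = (item.get("lastmod") or "")[:10]
        if lastmod and (
            entry["latest_lastmod"] is None or lastmod > entry["latest_lastmod"]
        ):
            entry["latest_lastmod"] = lastmod
    return summary


# Hypothetical items from a run over two sites.
sample = [
    {"source_url": "https://apify.com", "url": "https://apify.com/store", "lastmod": "2024-11-14"},
    {"source_url": "https://apify.com", "url": "https://apify.com/pricing", "lastmod": "2024-12-02"},
    {"source_url": "https://crawlee.dev", "url": "https://crawlee.dev/docs"},
]
summary = summarize_by_source(sample)
```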
Performance
- Fast and efficient processing
- Handles large websites (50k+ URLs)
- Automatic retry on temporary errors
- Prevents infinite loops
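The last two points describe common crawler safeguards. The sketch below illustrates the general idea with a retry wrapper and a visited set; it is not the Actor's code. All names are hypothetical, and `fetch` is a stand-in callable that returns page URLs plus nested sources. Crawlee, listed under Technologies, provides request retries and de-duplication of its own.

```python
import time


class TemporaryError(Exception):
    """Stands in for a transient failure such as a timeout or HTTP 503."""


def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0):
    """Call fetch(url), retrying on TemporaryError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except TemporaryError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))


def crawl(start, fetch, max_depth=5, base_delay=1.0):
    """Visit each source once. `fetch(url)` returns (page_urls, nested_urls)."""
    visited, found = set(), []
    stack = [(start, 0)]
    while stack:
        url, depth = stack.pop()
        # The visited set and depth cap are what stop cycles from looping.
        if url in visited or depth > max_depth:
            continue
        visited.add(url)
        pages, nested = fetch_with_retry(fetch, url, base_delay=base_delay)
        found.extend(pages)
        stack.extend((child, depth + 1) for child in nested)
    return found
```

In this sketch, two sources that reference each other are each fetched once, and a single transient failure costs one extra request rather than the whole run.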
Technologies
- Apify SDK: Platform integration
- Crawlee: Web scraping framework
- fast-xml-parser: Fast XML parsing
- Node.js: Runtime environment
Notes
- The scraper intelligently discovers all available URLs from websites
- The source_url field allows tracking the origin of each URL in bulk processing
- Works with all standard website structures
Troubleshooting
Problem: No URLs found
- Solution: Verify that the website is accessible and has a standard structure
Problem: Timeout on large sites
- Solution: Enable Apify proxies to improve performance
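When no URLs are found, one quick manual check is whether the site advertises a sitemap in its robots.txt. This sketch assumes the Actor depends on sitemaps, which the README does not confirm, and the function name is hypothetical. It parses a robots.txt body that you have already downloaded.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs named in `Sitemap:` lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments (this would also cut a URL containing '#').
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content.
robots = (
    "User-agent: *\n"
    "Disallow: /private\n"
    "Sitemap: https://example.com/sitemap.xml\n"
    "sitemap: https://example.com/news.xml # news section\n"
)
found = sitemap_urls_from_robots(robots)
```

An empty result does not prove that a site has no sitemap, since a sitemap can exist without being listed in robots.txt.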
Happy Scraping!