URLs List - Extract ALL website urls
Pricing: from $0.20 / 1,000 results
Automatically discovers and extracts ALL URLs from any website. Perfect for SEO analysis, content inventory, and bulk URL extraction from multiple websites. Get complete URL lists with metadata including last modified dates and priority levels.
Rating: 5.0 (1)
Developer: Lofomachines
Actor stats: 4 bookmarked, 89 total users, 16 monthly active users, last modified 8 days ago
URLs List — Extract ALL Website URLs
The fastest way to extract every URL from any website. Designed for SEO audits, site migrations, content inventories, and large-scale data pipelines. Feed it one or more domains and get back a complete, structured URL list in minutes.
✨ Key Features
| | Feature | Description |
|---|---|---|
| 🔍 | Automatic URL Discovery | Crawls recursively to find all available URLs from any website structure |
| 💨 | Fast & Efficient | Optimized for speed — handles large sites with 50,000+ URLs |
| 📦 | Bulk Domain Processing | Accepts multiple start URLs to process several domains simultaneously |
| 🏷️ | Rich Metadata Extraction | Retrieves last modified dates, priority levels, and update frequency where available |
| 🗜️ | Smart Crawling | Works with XML sitemaps, recursive crawling, and standard web formats |
| 🛡️ | Resilient & Reliable | Automatic retries on errors with infinite loop prevention |
| 🎯 | Result Limiting | Control output size via maxResults or use returnAll for full extraction |
| 🔎 | Keyword URL Filtering | Return only URLs that contain all specified keywords — great for scoped scraping |
🎯 Use Cases
| Use Case | Description |
|---|---|
| SEO Audit | Extract all URLs to analyze site architecture and identify orphan pages |
| Content Inventory | Build a comprehensive page list for migration planning or content audits |
| Change Monitoring | Track lastmod dates to identify recently updated content |
| Data Pipelines | Feed extracted URLs into downstream scrapers (HTML scraper, Google Sheets, etc.) |
| Targeted Scraping | Use keyword filters to scope extraction to specific sections (e.g., /blog/, /product/) |
| Site Sampling | Use maxResults to grab a quick URL sample without crawling the full site |
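The change-monitoring use case can be sketched in a few lines of Python. This is an illustration only: the item shape (a `url` key plus an optional `lastmod` key) and the helper names `parse_lastmod` and `recently_updated` are assumptions made for the example, not a documented output schema.

```python
from datetime import datetime, timezone


def parse_lastmod(value):
    """Parse an ISO 8601 lastmod string; return None if it is missing or invalid."""
    if not value:
        return None
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    # Date-only values carry no timezone; treat them as UTC so comparisons work.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def recently_updated(items, since):
    """Return URLs whose lastmod is on or after `since`, newest first."""
    hits = []
    for item in items:
        modified = parse_lastmod(item.get("lastmod"))
        if modified is not None and modified >= since:
            hits.append((modified, item["url"]))
    hits.sort(reverse=True)
    return [url for _, url in hits]


# Hypothetical dataset items (field names are assumed for illustration).
items = [
    {"url": "https://example.com/", "lastmod": "2024-05-01"},
    {"url": "https://example.com/blog/new-post", "lastmod": "2024-06-15T10:30:00Z"},
    {"url": "https://example.com/about"},  # no lastmod available
]
cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(recently_updated(items, cutoff))
```

Items without a usable lastmod are skipped rather than treated as changed, which keeps the report conservative.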
💰 Cost of Usage
This actor is intentionally lightweight. It parses URL structures without rendering JavaScript unless strictly necessary, keeping Apify platform unit consumption low.
| Site Size | Estimated Cost |
|---|---|
| Small (< 1,000 URLs) | Cents per run |
| Medium (~10,000 URLs) | Typically under $1.00 |
| Large (50,000+ URLs) | Scales efficiently; depends on site architecture complexity |
💡 Tip: Enable Apify Proxy (set useApifyProxy to true in proxyConfiguration) to ensure consistent access and avoid IP-based blocking on larger sites.
📥 Input Configuration
The actor accepts a JSON input object. Only startUrls is required.
Minimal Input
{"startUrls": [{ "url": "https://example.com" }]}
Full Input Example
{"startUrls": [{ "url": "https://apify.com" },{ "url": "https://crawlee.dev" }],"proxyConfiguration": {"useApifyProxy": true},"returnAll": true,"maxResults": 1000,"keywords": ["blog", "article"]}
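As a rough sketch, the same input can be assembled and submitted from Python with the official `apify-client` package. The actor ID below is a placeholder (this page does not state it), and `build_run_input` and `run_actor` are hypothetical helper names; only the input keys come from the examples above.

```python
import os

# Placeholder: replace with the actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_run_input(start_urls, keywords=(), max_results=None, use_proxy=True):
    """Build the actor input; max_results=None means 'return everything'."""
    run_input = {
        "startUrls": [{"url": url} for url in start_urls],
        "proxyConfiguration": {"useApifyProxy": use_proxy},
        "keywords": list(keywords),
    }
    if max_results is None:
        run_input["returnAll"] = True
    else:
        run_input["returnAll"] = False
        run_input["maxResults"] = max_results
    return run_input


def run_actor(run_input, token):
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so the input builder works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    run_input = build_run_input(["https://apify.com", "https://crawlee.dev"], keywords=["blog"])
    token = os.environ.get("APIFY_TOKEN")
    if token and not ACTOR_ID.startswith("<"):
        for item in run_actor(run_input, token):
            print(item)
    else:
        print(run_input)  # dry run: show the input that would be sent
```

Setting returnAll explicitly in both branches avoids relying on the documented rule that maxResults is ignored when returnAll is true.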
Input Parameters Reference
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | ✅ Yes | [{ "url": "https://apify.com" }] | One or more website URLs to start extraction from |
| proxyConfiguration | Object | ❌ No | { "useApifyProxy": false } | Proxy settings for reliable, unblocked access |
| returnAll | Boolean | ❌ No | true | When true, extracts all URLs and ignores maxResults |
| maxResults | Integer | ❌ No | 1000 | Maximum number of URLs to return. Ignored when returnAll is true or when maxResults is set to 0 |
| keywords | Array | ❌ No | [] | Case-insensitive keyword filter: only URLs containing all listed keywords are returned (e.g., ["blog"] → only /blog/... paths) |
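The interaction between keywords, returnAll, and maxResults described in the table can be illustrated with a small Python sketch. It mirrors the documented semantics only; it is not the actor's implementation, and applying the limit after the keyword filter is an assumption.

```python
def matches_all_keywords(url, keywords):
    """Case-insensitive check that every keyword occurs somewhere in the URL."""
    lowered = url.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def select_urls(urls, keywords=(), return_all=True, max_results=1000):
    """Filter by keywords, then apply the result limit unless it is disabled."""
    kept = [url for url in urls if matches_all_keywords(url, keywords)]
    # The limit is ignored when return_all is true or max_results is 0.
    if return_all or max_results == 0:
        return kept
    return kept[:max_results]


urls = [
    "https://example.com/Blog/article-1",
    "https://example.com/blog/news",
    "https://example.com/product/article-2",
]

# Both keywords must match, regardless of case.
print(select_urls(urls, keywords=["blog", "article"]))
# A limit only applies when return_all is false and max_results is non-zero.
print(select_urls(urls, return_all=False, max_results=2))
```

Note that a keyword matches anywhere in the URL, so "blog" would also match a URL such as /weblog-archive.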
📤 Output
Results are stored in the Apify dataset and can be exported as JSON, CSV, XML, or Excel directly from the Apify platform.
Each item in the dataset includes the extracted URL and any available metadata such as lastmod, priority, and changefreq.
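The metadata names lastmod, priority, and changefreq match the optional tags of the XML sitemap protocol. A minimal sketch of how such fields can be read from a sitemap with Python's standard library follows; the `parse_sitemap` helper and the `url` key in the resulting records are assumptions for illustration, not the actor's code or its exact output schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return one record per <url> entry, including only the metadata present."""
    root = ET.fromstring(xml_text)
    records = []
    for url_el in root.findall("sm:url", NS):
        record = {"url": url_el.findtext("sm:loc", default="", namespaces=NS).strip()}
        for field in ("lastmod", "priority", "changefreq"):
            value = url_el.findtext(f"sm:{field}", namespaces=NS)
            if value is not None:
                record[field] = value.strip()
        records.append(record)
    return records


SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-06-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>"""

for record in parse_sitemap(SAMPLE):
    print(record)
```

The second entry shows why metadata is only available "where available": a sitemap may list a URL with no optional tags at all.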
🛠️ Troubleshooting
No URLs returned?
- Verify that the startUrls are publicly accessible and not behind authentication.
- Check if the site uses JavaScript rendering; some SPAs may require additional configuration.
Fewer URLs than expected?
- The site's sitemap may be incomplete. The actor also performs recursive crawling, but some URLs may be excluded via robots.txt.
- If returnAll is false, make sure maxResults is set high enough.
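To check whether robots.txt could explain missing URLs, a site's rules can be tested locally with Python's standard `urllib.robotparser`. This is a diagnostic sketch under stated assumptions: the sample rules and the helper names `blocked_urls` and `declared_sitemaps` are hypothetical, and how the actor itself interprets robots.txt is not documented here.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt content disallows."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs listed in robots.txt (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []


# Hypothetical robots.txt content; in practice, paste the site's real file here.
ROBOTS = (
    "User-agent: *\n"
    "Disallow: /private/\n"
    "Sitemap: https://example.com/sitemap.xml\n"
)

expected = [
    "https://example.com/blog/post",
    "https://example.com/private/report",
]
print(blocked_urls(ROBOTS, expected))
print(declared_sitemaps(ROBOTS))
```

If an expected URL shows up in the blocked list, its absence from the results is likely due to the site's rules rather than to the input settings.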
Getting blocked?
- Enable proxyConfiguration with useApifyProxy: true to route requests through Apify's proxy network.
🔗 Build a Complete Website Intelligence Pipeline
This actor is the first step. Once you have your full URL list, power up your analysis with these actors:
| Actor | What it adds |
|---|---|
| GEO Audit — AI Search Optimization Checker | Feed extracted URLs to score each page's visibility in ChatGPT, Perplexity, and Gemini |
| Website Tech Profiler | Detect the full technology stack behind any extracted URL — CMS, frameworks, analytics, CDN |
| Organization Registered Domain & Subdomain Scraper | Discover all domains owned by a target organization before running URL extraction |
| Website API & Endpoint Analyzer | Uncover hidden API endpoints behind any page in your extracted URL list |