URLs List - Extract ALL website urls
Pricing
from $0.20 / 1,000 results
Automatically discovers and extracts ALL URLs from any website. Perfect for SEO analysis, content inventory, and bulk URL extraction from multiple websites. Get complete URL lists with metadata including last modified dates and priority levels.
Rating: 5.0 (2)
Developer: Lofomachines
Actor stats: 1 bookmarked · 25 total users · 8 monthly active users · last modified 12 days ago
Comprehensive URL Extractor for SEO audits, content inventory, and bulk analysis.
Features • Cost of Usage • Input • Output • Troubleshooting
This actor automatically discovers and extracts ALL URLs from any target website. It is designed to be the entry point for SEO audits, site migrations, and content analysis pipelines. It crawls recursively to build a complete map of a domain.
✨ Key Features
- 🔍 Automatic Discovery: Intelligently finds all available URLs from any website structure.
- 💨 Fast & Efficient: Optimized for speed to handle large sites (50k+ URLs).
- 📦 Bulk Processing: Accepts multiple domain roots to process simultaneously.
- 🏷️ Rich Metadata: Extracts last modified dates, priority levels, and update frequency (where available).
- 🗜️ Smart Handling: Works with sitemaps, recursive crawling, and standard web formats.
- 🛡️ Resilient: Automatic retries on temporary errors and infinite loop prevention.
- 🎯 Result Limiting: Control the maximum number of URLs extracted with maxResults, or enable returnAll for complete extraction.
- 🔎 Keyword Filtering: Filter URLs by keywords; only URLs containing all specified keywords will be returned.
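The keyword filter described above requires every keyword to match, case-insensitively (a logical AND). A minimal Python sketch of that matching rule; the function name and logic here are illustrative, not the Actor's actual implementation:

```python
def matches_keywords(url: str, keywords: list[str]) -> bool:
    """Return True when the URL contains ALL keywords (case-insensitive)."""
    lowered = url.lower()
    return all(kw.lower() in lowered for kw in keywords)

urls = [
    "https://example.com/blog/first-article",
    "https://example.com/blog/pricing",
    "https://example.com/about",
]
# With ["blog", "article"], only URLs containing both terms pass.
filtered = [u for u in urls if matches_keywords(u, ["blog", "article"])]
print(filtered)  # ['https://example.com/blog/first-article']
```

An empty keyword list matches every URL, which is why the filter is effectively off by default.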
🎯 Use Cases
| Use Case | Description |
|---|---|
| SEO Audit | Extract all URLs to analyze site architecture and identify orphan pages. |
| Content Inventory | Create a comprehensive list of all existing pages for migration planning. |
| Monitoring | Track lastmod dates to identify which content has been updated recently. |
| Data Pipelines | Feed the output URLs into other scrapers (e.g., Scrape HTML, Google Sheets export). |
| Targeted Extraction | Use keyword filtering to extract only specific sections (e.g., all blog posts, product pages). |
| Sampling | Use maxResults to extract a sample of URLs for quick analysis without processing entire sites. |
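For the monitoring use case, the lastmod metadata can be post-processed to flag recently updated pages. A hedged sketch, assuming each dataset item carries a "url" and an ISO-8601 "lastmod" field (field names follow the sitemap convention; verify them against your actual run output):

```python
from datetime import datetime, timedelta, timezone

def recently_updated(items, days=30, now=None):
    """Keep URLs whose lastmod falls within the past `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    fresh = []
    for item in items:
        lastmod = item.get("lastmod")
        if not lastmod:
            continue  # some URLs carry no lastmod metadata
        when = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        if when >= cutoff:
            fresh.append(item["url"])
    return fresh

items = [
    {"url": "https://example.com/new", "lastmod": "2024-06-01T00:00:00Z"},
    {"url": "https://example.com/old", "lastmod": "2020-01-01T00:00:00Z"},
]
now = datetime(2024, 6, 15, tzinfo=timezone.utc)
print(recently_updated(items, days=30, now=now))  # ['https://example.com/new']
```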
💰 Cost of Usage
This scraper is designed to be lightweight. It parses URL structures without rendering full page JavaScript (unless necessary), keeping costs low.
- Small Sites (< 1,000 URLs): Cents per run.
- Medium Sites (10,000 URLs): Typically < $1.00.
- Large Sites: Efficiency scales well, but usage depends on the complexity of the target site's architecture.
Tip: Always use Apify Proxy (enabled by default) to ensure consistent access and avoid blocking.
📥 Input Configuration
The Actor expects a JSON input defining the websites to scan.
Example Input
{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://crawlee.dev" }
  ],
  "proxyConfiguration": { "useApifyProxy": true },
  "returnAll": true,
  "maxResults": 1000,
  "keywords": ["blog", "article"]
}
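The same input can be built programmatically before starting a run. A minimal Python sketch that constructs and serializes the payload above using only the standard library (no Apify client calls shown):

```python
import json

run_input = {
    "startUrls": [
        {"url": "https://apify.com"},
        {"url": "https://crawlee.dev"},
    ],
    "proxyConfiguration": {"useApifyProxy": True},
    "returnAll": True,
    "maxResults": 1000,
    "keywords": ["blog", "article"],
}

# Serialize to the JSON the Actor expects; round-trips cleanly.
payload = json.dumps(run_input, indent=2)
print(payload)
```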
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | ✅ Yes | [{ url: "https://apify.com" }] | List of website URLs to extract pages from. |
| proxyConfiguration | Object | ❌ No | { useApifyProxy: false } | Proxy settings for reliable access. |
| returnAll | Boolean | ❌ No | true | If true, extracts all available URLs regardless of maxResults. If false, applies the maxResults limit. |
| maxResults | Integer | ❌ No | 1000 | Maximum number of URLs to extract. Ignored if returnAll is true, or if set to 0. |
| keywords | Array | ❌ No | [] | Filter URLs to only include those containing ALL specified keywords. Case-insensitive matching. Example: ["blog"] returns only URLs containing "blog" (e.g., https://example.com/blog/article). |
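The interaction between returnAll and maxResults boils down to: return everything when returnAll is true or the limit is 0, otherwise truncate. A sketch of that rule as a plain Python function (illustrative only, not the Actor's source):

```python
def apply_limit(urls, return_all=True, max_results=1000):
    """Truncate the URL list unless return_all is set or the limit is 0."""
    if return_all or max_results == 0:
        return list(urls)
    return list(urls)[:max_results]

urls = [f"https://example.com/page-{i}" for i in range(5)]
print(len(apply_limit(urls, return_all=False, max_results=3)))  # 3
print(len(apply_limit(urls, return_all=True, max_results=3)))   # 5
print(len(apply_limit(urls, return_all=False, max_results=0)))  # 5
```

In practice this means maxResults only takes effect once you explicitly set returnAll to false.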