URLs List - Extract ALL website urls

Pricing

from $0.20 / 1,000 results

Automatically discovers and extracts ALL URLs from any website. Perfect for SEO analysis, content inventory, and bulk URL extraction from multiple websites. Get complete URL lists with metadata including last modified dates and priority levels.

Rating

5.0

(1)

Developer

Lofomachines

Maintained by Community

Actor stats

Bookmarked: 4
Total users: 89
Monthly active users: 16
Last modified: 8 days ago

URLs List — Extract ALL Website URLs

The fastest way to extract every URL from any website. Designed for SEO audits, site migrations, content inventories, and large-scale data pipelines. Feed it one or more domains and get back a complete, structured URL list in minutes.


✨ Key Features

| Feature | Description |
|---|---|
| 🔍 Automatic URL Discovery | Crawls recursively to find all available URLs from any website structure |
| 💨 Fast & Efficient | Optimized for speed; handles large sites with 50,000+ URLs |
| 📦 Bulk Domain Processing | Accepts multiple start URLs to process several domains simultaneously |
| 🏷️ Rich Metadata Extraction | Retrieves last modified dates, priority levels, and update frequency where available |
| 🗜️ Smart Crawling | Works with XML sitemaps, recursive crawling, and standard web formats |
| 🛡️ Resilient & Reliable | Automatic retries on errors with infinite loop prevention |
| 🎯 Result Limiting | Control output size via maxResults or use returnAll for full extraction |
| 🔎 Keyword URL Filtering | Return only URLs that contain all specified keywords; useful for scoped scraping |
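
The metadata listed above (last modified date, priority, update frequency) corresponds to the optional `<lastmod>`, `<priority>`, and `<changefreq>` elements of the XML sitemap protocol. As a rough illustration of where such values come from, and not the actor's actual implementation, the Python sketch below parses a small inline sitemap with the standard library. The sample XML, the `parse_sitemap` helper, and the `url` key are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample data, used only for illustration.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog/first-post</loc>
  </url>
</urlset>
""".strip()


def parse_sitemap(xml_text):
    """Return one dict per <url> entry; missing metadata becomes None."""
    root = ET.fromstring(xml_text)
    entries = []
    for url_el in root.findall("sm:url", SITEMAP_NS):
        loc = url_el.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        if not loc:
            continue  # skip entries without a location
        entries.append({
            "url": loc,
            "lastmod": url_el.findtext("sm:lastmod", namespaces=SITEMAP_NS),
            "changefreq": url_el.findtext("sm:changefreq", namespaces=SITEMAP_NS),
            "priority": url_el.findtext("sm:priority", namespaces=SITEMAP_NS),
        })
    return entries


if __name__ == "__main__":
    for entry in parse_sitemap(SAMPLE_SITEMAP):
        print(entry)
```

Entries without metadata still yield a URL, which matches the "where available" wording in the feature list.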

🎯 Use Cases

| Use Case | Description |
|---|---|
| SEO Audit | Extract all URLs to analyze site architecture and identify orphan pages |
| Content Inventory | Build a comprehensive page list for migration planning or content audits |
| Change Monitoring | Track lastmod dates to identify recently updated content |
| Data Pipelines | Feed extracted URLs into downstream scrapers (HTML scraper, Google Sheets, etc.) |
| Targeted Scraping | Use keyword filters to scope extraction to specific sections (e.g., /blog/, /product/) |
| Site Sampling | Use maxResults to grab a quick URL sample without crawling the full site |

💰 Cost of Usage

This actor is intentionally lightweight. It parses URL structures without rendering JavaScript unless strictly necessary, keeping Apify platform unit consumption low.

| Site Size | Estimated Cost |
|---|---|
| Small (< 1,000 URLs) | Cents per run |
| Medium (~10,000 URLs) | Typically under $1.00 |
| Large (50,000+ URLs) | Scales efficiently; depends on site architecture complexity |

💡 Tip: Enable Apify Proxy (set useApifyProxy to true in proxyConfiguration) to ensure consistent access and avoid IP-based blocking on larger sites.


📥 Input Configuration

The actor accepts a JSON input object. Only startUrls is required.

Minimal Input

{
  "startUrls": [
    { "url": "https://example.com" }
  ]
}

Full Input Example

{
  "startUrls": [
    { "url": "https://apify.com" },
    { "url": "https://crawlee.dev" }
  ],
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "returnAll": true,
  "maxResults": 1000,
  "keywords": ["blog", "article"]
}
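
For programmatic use, the same input can be built and submitted from Python. The sketch below is a minimal example under stated assumptions: it relies on the `apify-client` package (`pip install apify-client`), `ACTOR_ID` is a placeholder because this page does not state the actor's ID, and `build_input` and `run_actor` are hypothetical helper names. The actor is only called when an `APIFY_TOKEN` environment variable is set.

```python
import json
import os

# Placeholder: replace with the actor's real ID from its Apify Store page.
ACTOR_ID = "<username>/<actor-name>"


def build_input(start_urls, keywords=None, return_all=True,
                max_results=1000, use_apify_proxy=True):
    """Build the actor input object from the documented parameters."""
    if not start_urls:
        raise ValueError("startUrls is required and must not be empty")
    return {
        "startUrls": [{"url": url} for url in start_urls],
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
        "returnAll": return_all,
        "maxResults": max_results,
        "keywords": list(keywords or []),
    }


def run_actor(run_input, token, actor_id=ACTOR_ID):
    """Start the actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so build_input works without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


if __name__ == "__main__":
    actor_input = build_input(
        ["https://apify.com", "https://crawlee.dev"],
        keywords=["blog", "article"],
    )
    print(json.dumps(actor_input, indent=2))

    token = os.environ.get("APIFY_TOKEN")
    if token:
        items = run_actor(actor_input, token)
        print(f"Fetched {len(items)} items")
```

Keeping input construction separate from the API call makes the input easy to inspect or validate before spending any platform credit.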

Input Parameters Reference

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| startUrls | Array | ✅ Yes | [{ "url": "https://apify.com" }] | One or more website URLs to start extraction from |
| proxyConfiguration | Object | ❌ No | { "useApifyProxy": false } | Proxy settings for reliable, unblocked access |
| returnAll | Boolean | ❌ No | true | When true, extracts all URLs and ignores maxResults |
| maxResults | Integer | ❌ No | 1000 | Maximum number of URLs to return. Ignored when returnAll is true or when maxResults is set to 0 |
| keywords | Array | ❌ No | [] | Case-insensitive keyword filter: only URLs containing all listed keywords are returned (e.g., ["blog"] returns only URLs that contain "blog", such as /blog/... paths) |
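
The interaction between keywords, returnAll, and maxResults can be easier to follow as code. The sketch below is an interpretation of the parameter descriptions above, not the actor's source: `matches_all_keywords` and `select_urls` are hypothetical names, and applying the keyword filter before the result limit is an assumption.

```python
def matches_all_keywords(url, keywords):
    """True if the URL contains every keyword, ignoring case."""
    lowered = url.lower()
    return all(keyword.lower() in lowered for keyword in keywords)


def select_urls(urls, keywords=(), return_all=True, max_results=1000):
    """Filter by keywords, then apply the documented limiting rules."""
    # Assumption: filtering happens before the limit is applied.
    filtered = [url for url in urls if matches_all_keywords(url, keywords)]
    # maxResults is ignored when returnAll is true or maxResults is 0.
    if return_all or max_results == 0:
        return filtered
    return filtered[:max_results]


if __name__ == "__main__":
    sample = [
        "https://example.com/Blog/article-1",
        "https://example.com/blog/about",
        "https://example.com/product/1",
        "https://example.com/blog/article-2",
    ]
    print(select_urls(sample, keywords=["blog", "article"]))
    print(select_urls(sample, keywords=["blog"], return_all=False, max_results=1))
```

Note that with the default returnAll of true, a maxResults value in the input has no effect; set returnAll to false to cap the output.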

📤 Output

Results are stored in the Apify dataset and can be exported as JSON, CSV, XML, or Excel directly from the Apify platform.

Each item in the dataset includes the extracted URL and any available metadata such as lastmod, priority, and changefreq.
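
As one way to use this metadata for change monitoring, the sketch below filters exported dataset items by their lastmod value. It is an illustration under assumptions: the item key `url` is a guess at the field name (only lastmod, priority, and changefreq are named above), the sample items are invented, and lastmod values are assumed to be ISO 8601 dates or datetimes as used in sitemaps.

```python
from datetime import datetime, timezone


def parse_lastmod(value):
    """Parse an ISO 8601 lastmod string; return None if missing or invalid."""
    if not value:
        return None
    try:
        # Older Python versions do not accept a trailing "Z", so normalize it.
        parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: timestamps without a zone are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def updated_since(items, cutoff):
    """Return URLs whose lastmod is on or after the cutoff datetime."""
    recent = []
    for item in items:
        lastmod = parse_lastmod(item.get("lastmod"))
        if lastmod is not None and lastmod >= cutoff:
            recent.append(item["url"])
    return recent


if __name__ == "__main__":
    # Hypothetical items shaped like an exported dataset.
    items = [
        {"url": "https://example.com/blog/new", "lastmod": "2024-03-01T08:30:00Z"},
        {"url": "https://example.com/blog/old", "lastmod": "2023-11-20"},
        {"url": "https://example.com/about", "lastmod": None},
    ]
    cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(updated_since(items, cutoff))
```

Items without a usable lastmod are skipped rather than treated as recent, so pages missing that metadata need a separate check.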


🛠️ Troubleshooting

No URLs returned?

  • Verify that the startUrls are publicly accessible and not behind authentication.
  • Check if the site uses JavaScript rendering — some SPAs may require additional configuration.

Fewer URLs than expected?

  • The site's sitemap may be incomplete. The actor also performs recursive crawling, but some URLs may be excluded via robots.txt.
  • If returnAll is false, make sure maxResults is set high enough.
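
To check whether robots.txt rules could explain missing URLs, the rules can be tested locally with Python's standard `urllib.robotparser`. This is a diagnostic sketch only: the sample robots.txt content and the `blocked_urls` helper are assumptions, and it does not show how the actor itself applies robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; paste the real file's text here.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /cart
"""


def blocked_urls(robots_txt, urls, user_agent="*"):
    """Return the URLs that the given robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    expected = [
        "https://example.com/blog/",
        "https://example.com/admin/users",
        "https://example.com/cart/checkout",
    ]
    for url in blocked_urls(SAMPLE_ROBOTS_TXT, expected):
        print("Disallowed by robots.txt:", url)
```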

Getting blocked?

  • Enable proxyConfiguration with useApifyProxy: true to route requests through Apify's proxy network.

🔗 Build a Complete Website Intelligence Pipeline

This actor is the first step. Once you have your full URL list, power up your analysis with these actors:

| Actor | What it adds |
|---|---|
| GEO Audit — AI Search Optimization Checker | Feed extracted URLs to score each page's visibility in ChatGPT, Perplexity, and Gemini |
| Website Tech Profiler | Detect the full technology stack behind any extracted URL: CMS, frameworks, analytics, CDN |
| Organization Registered Domain & Subdomain Scraper | Discover all domains owned by a target organization before running URL extraction |
| Website API & Endpoint Analyzer | Uncover hidden API endpoints behind any page in your extracted URL list |