Website URL Crawler & Link Extractor
Pricing
$18.00/month + usage
Website URL Crawler & Link Extractor
Crawl any website and extract all URLs with full hierarchy — depth, parent URL, and anchor text. Supports static and JavaScript-rendered sites. Configurable depth and domain filtering.
Pricing
$18.00/month + usage
Rating
0.0
(0)
Developer
Maged
Maintained by CommunityActor stats
0
Bookmarked
42
Total users
2
Monthly active users
4 days ago
Last modified
Categories
Share
Crawl any website and extract all URLs in a hierarchical structure. Visualize your site architecture, audit internal links, discover broken paths, and build complete sitemaps — with configurable depth, domain filtering, and proxy support.
What does Website URL Crawler do?
Start from any URL and this Actor recursively follows links to map the full structure of a website. Each result includes the URL, its depth level, parent URL, and link text — giving you a complete picture of how pages connect.
It supports fast HTTP crawling (BeautifulSoup) for static sites and JavaScript rendering (Selenium) for dynamic single-page applications.
Why use this Actor?
- Full site mapping — crawl unlimited depth to discover every URL
- Hierarchy preserved — each URL includes its parent URL, depth, and anchor text
- Two crawl modes — fast HTTP for static sites, JS rendering for React/Vue/Angular apps
- Domain filtering — optionally restrict crawling to the same domain
- Extension filtering — skip PDFs, images, ZIPs, and other non-page assets
- Duplicate prevention — configurable deduplication to keep results clean
How to use Website URL Crawler
- Open the Actor and click Try for free
- Enter a
startUrl - Set
maxDepthandmaxChildrenPerLink - Run — the full URL tree appears in the Output tab
- Download as JSON or CSV, or connect via the Apify API
Input
{"startUrl": "https://example.com","maxDepth": 3,"maxChildrenPerLink": 20,"sameDomainOnly": true,"useSelenium": false,"allowDuplicates": false,"ignoredExtensions": ["pdf", "jpg", "png", "zip"]}
| Field | Type | Description | Default |
|---|---|---|---|
startUrl | string | The URL to start crawling from | required |
maxDepth | integer | Maximum link recursion depth (1–30) | 3 |
maxChildrenPerLink | integer | Max child links per page (1–100) | 20 |
sameDomainOnly | boolean | Only crawl URLs on the same domain | true |
useSelenium | boolean | Use JS rendering for dynamic pages | false |
allowDuplicates | boolean | Allow duplicate URLs in output | false |
ignoredExtensions | array | File extensions to skip | [] |
Output
[{"url": "https://example.com","name": null,"depth": 0,"parentUrl": null},{"url": "https://example.com/about","name": "About Us","depth": 1,"parentUrl": "https://example.com"},{"url": "https://example.com/about/team","name": "Our Team","depth": 2,"parentUrl": "https://example.com/about"}]
Output data fields
| Field | Type | Description |
|---|---|---|
url | string | The full URL |
name | string | Anchor text of the link (if available) |
depth | number | Depth level from the start URL |
parentUrl | string | The URL this link was found on |
Use cases
- Site audits — find orphaned pages, broken internal link paths, or redirect chains
- SEO analysis — map your site architecture to identify crawl depth issues
- Sitemap generation — build sitemaps for sites that don't have one
- Content migration — extract all URLs before moving to a new CMS
- Competitive research — map a competitor's full site structure
- QA testing — verify all pages are reachable from the homepage
Cost estimation
| Site size | Estimated cost |
|---|---|
| Small site (under 100 pages) | under $0.10 |
| Medium site (1,000 pages) | ~$0.50–$2.00 |
| Large site (10,000 pages) | ~$5–$20 |
Cost scales with the number of URLs crawled and whether JS rendering is enabled.
FAQ
What is the difference between HTTP mode and Selenium mode? HTTP mode (default) is 10x faster and works for most static HTML sites. Selenium mode renders JavaScript — use it for React, Vue, and Angular apps.
Can I crawl multiple sites in one run? This Actor starts from a single URL. Trigger multiple runs via the Apify API to crawl several sites in parallel.
Is this Actor maintained? Yes. For bugs or feature requests, open an issue in the Issues tab.
Found this Actor useful?
If this Actor saved you time, please leave a review on the Actor page. Reviews help other users discover it and take 30 seconds — every one genuinely matters.
For bugs, feature requests, or questions, open an issue in the Issues tab above.


