Website URL Crawler & Link Extractor

Pricing

$18.00/month + usage

Crawl any website and extract all URLs with full hierarchy — depth, parent URL, and anchor text. Supports static and JavaScript-rendered sites. Configurable depth and domain filtering.

Rating: 0.0 (0)

Developer: Maged (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 42
  • Monthly active users: 2
  • Last modified: 4 days ago

Crawl any website and extract all URLs in a hierarchical structure. Visualize your site architecture, audit internal links, discover broken paths, and build complete sitemaps — with configurable depth, domain filtering, and proxy support.

What does Website URL Crawler do?

Start from any URL and this Actor recursively follows links to map the full structure of a website. Each result includes the URL, its depth level, parent URL, and link text — giving you a complete picture of how pages connect.

It supports fast HTTP crawling (BeautifulSoup) for static sites and JavaScript rendering (Selenium) for dynamic single-page applications.
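
The recursive link-following described above can be sketched as a breadth-first traversal. This is only an illustration, not the Actor's actual implementation: the standard-library HTMLParser stands in for BeautifulSoup, `fetch` is a hypothetical callable that returns a page's HTML, and the assumption that the start URL sits at depth 0 is taken from the sample output.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class AnchorParser(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            self.links.append((self._href, text or None))
            self._href = None


def crawl(start_url, fetch, max_depth=3, max_children=20):
    """Breadth-first crawl. `fetch(url)` is a hypothetical hook returning HTML."""
    results = [{"url": start_url, "name": None, "depth": 0, "parentUrl": None}]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # pages at the depth limit are recorded but not expanded
        parser = AnchorParser()
        parser.feed(fetch(url))
        children = 0
        for href, text in parser.links:
            if children >= max_children:
                break
            child = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            if child in seen:
                continue  # deduplicate, as with allowDuplicates set to false
            seen.add(child)
            children += 1
            results.append(
                {"url": child, "name": text, "depth": depth + 1, "parentUrl": url}
            )
            queue.append((child, depth + 1))
    return results
```

Passing `fetch` in as a parameter keeps the traversal independent of how pages are downloaded, so the same loop could sit on top of a plain HTTP client or a browser-driven renderer.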

Why use this Actor?

  • Full site mapping: crawl up to 30 levels deep (maxDepth) to discover every reachable URL
  • Hierarchy preserved: each URL includes its parent URL, depth, and anchor text
  • Two crawl modes: fast HTTP for static sites, JS rendering for React/Vue/Angular apps
  • Domain filtering: optionally restrict crawling to the same domain
  • Extension filtering: skip PDFs, images, ZIPs, and other non-page assets
  • Duplicate prevention: configurable deduplication to keep results clean
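
As a rough sketch of how the domain and extension filters above might be applied to each discovered link (the function names are hypothetical, and the exact-hostname comparison is an assumption; the Actor may treat subdomains differently):

```python
import posixpath
from urllib.parse import urlparse


def same_domain(url, start_url):
    """True when both URLs share a hostname (exact match is an assumption)."""
    return urlparse(url).hostname == urlparse(start_url).hostname


def has_ignored_extension(url, ignored_extensions):
    """True when the URL path ends in one of the ignored extensions."""
    path = urlparse(url).path  # query string and fragment are excluded
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    return ext in {e.lower().lstrip(".") for e in ignored_extensions}


def should_crawl(url, start_url, ignored_extensions=(), same_domain_only=True):
    """Combines both filters into a single keep-or-skip decision."""
    if same_domain_only and not same_domain(url, start_url):
        return False
    return not has_ignored_extension(url, ignored_extensions)
```

Comparing the extension against the URL path rather than the full URL means a link such as `/files/report.pdf?download=1` is still recognized as a PDF.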

How to use Website URL Crawler

  1. Open the Actor and click Try for free
  2. Enter a startUrl
  3. Set maxDepth and maxChildrenPerLink
  4. Run — the full URL tree appears in the Output tab
  5. Download as JSON or CSV, or connect via the Apify API
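
For step 5, a run can also be started over the Apify API. The sketch below uses only the Python standard library and the API's `run-sync-get-dataset-items` endpoint. The `actor_id` value is a placeholder, since this page does not state the Actor's ID, and `build_run_request` and `run_crawler` are hypothetical helper names. Synchronous endpoints are time-limited, so longer crawls may need the asynchronous run endpoint instead.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Builds the POST request that starts a run and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_crawler(actor_id, token, run_input, timeout=300):
    """Sends the request and returns the parsed list of output records."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call would look like `run_crawler("username~actor-name", token, {"startUrl": "https://example.com", "maxDepth": 2})`, with the placeholder replaced by the Actor ID shown in the Apify Console.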

Input

{
    "startUrl": "https://example.com",
    "maxDepth": 3,
    "maxChildrenPerLink": 20,
    "sameDomainOnly": true,
    "useSelenium": false,
    "allowDuplicates": false,
    "ignoredExtensions": ["pdf", "jpg", "png", "zip"]
}

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| startUrl | string | The URL to start crawling from | required |
| maxDepth | integer | Maximum link recursion depth (1–30) | 3 |
| maxChildrenPerLink | integer | Max child links per page (1–100) | 20 |
| sameDomainOnly | boolean | Only crawl URLs on the same domain | true |
| useSelenium | boolean | Use JS rendering for dynamic pages | false |
| allowDuplicates | boolean | Allow duplicate URLs in output | false |
| ignoredExtensions | array | File extensions to skip | [] |
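
Before starting a run from your own code, it can help to check the input against the documented ranges and fill in the defaults locally. The sketch below is one way to do that. `normalize_input` is a hypothetical helper, not part of the Actor, and the defaults and ranges are copied from the input fields listed above.

```python
DEFAULTS = {
    "maxDepth": 3,
    "maxChildrenPerLink": 20,
    "sameDomainOnly": True,
    "useSelenium": False,
    "allowDuplicates": False,
    "ignoredExtensions": [],
}

# Inclusive (min, max) ranges from the input documentation.
RANGES = {"maxDepth": (1, 30), "maxChildrenPerLink": (1, 100)}


def normalize_input(raw):
    """Returns a full input dict with defaults applied, or raises ValueError."""
    if not raw.get("startUrl"):
        raise ValueError("startUrl is required")
    merged = {**DEFAULTS, **raw}
    for field, (low, high) in RANGES.items():
        value = merged[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{field} must be an integer")
        if not low <= value <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    # Copy the list so callers never share the module-level default.
    merged["ignoredExtensions"] = list(merged["ignoredExtensions"])
    return merged
```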

Output

[
    {
        "url": "https://example.com",
        "name": null,
        "depth": 0,
        "parentUrl": null
    },
    {
        "url": "https://example.com/about",
        "name": "About Us",
        "depth": 1,
        "parentUrl": "https://example.com"
    },
    {
        "url": "https://example.com/about/team",
        "name": "Our Team",
        "depth": 2,
        "parentUrl": "https://example.com/about"
    }
]

Output data fields

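| Field | Type | Description |
| --- | --- | --- |
| url | string | The full URL |
| name | string or null | Anchor text of the link (null if unavailable, as for the start URL) |
| depth | number | Depth level from the start URL (the start URL is 0) |
| parentUrl | string or null | The URL this link was found on (null for the start URL) |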
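
Because the output is a flat list, the hierarchy has to be rebuilt from `parentUrl` if you want a nested view. A minimal sketch, assuming deduplicated output (allowDuplicates set to false) so that each URL appears once and the parent links cannot form a cycle; `build_tree` is a hypothetical helper name:

```python
from collections import defaultdict


def build_tree(records):
    """Turns flat output records into nested {"url", "children"} nodes."""
    children = defaultdict(list)
    roots = []
    for record in records:
        parent = record["parentUrl"]
        if parent is None:
            roots.append(record["url"])  # the start URL has no parent
        else:
            children[parent].append(record["url"])

    def node(url):
        # Assumes no cycles, which holds when every URL appears only once.
        return {"url": url, "children": [node(c) for c in children.get(url, [])]}

    return [node(root) for root in roots]
```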

Use cases

  • Site audits — find orphaned pages, broken internal link paths, or redirect chains
  • SEO analysis — map your site architecture to identify crawl depth issues
  • Sitemap generation — build sitemaps for sites that don't have one
  • Content migration — extract all URLs before moving to a new CMS
  • Competitive research — map a competitor's full site structure
  • QA testing — verify all pages are reachable from the homepage
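
For the sitemap use case, the output records can be converted into a basic sitemap.xml with the standard library. This is a sketch under the assumption that only `<loc>` entries are needed; `to_sitemap` is a hypothetical helper, and optional sitemap fields such as `lastmod` are not available in the Actor's output.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def to_sitemap(records):
    """Builds a sitemap XML string with one <url><loc> entry per unique URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for record in records:
        url = record["url"]
        if url in seen:
            continue  # guard against duplicates if allowDuplicates was enabled
        seen.add(url)
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

ElementTree escapes characters such as `&` in URLs, which the sitemap format requires.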

Cost estimation

| Site size | Estimated cost |
| --- | --- |
| Small site (under 100 pages) | under $0.10 |
| Medium site (1,000 pages) | ~$0.50–$2.00 |
| Large site (10,000 pages) | ~$5–$20 |

Cost scales with the number of URLs crawled and whether JS rendering is enabled.

FAQ

What is the difference between HTTP mode and Selenium mode? HTTP mode (the default, with useSelenium set to false) is about 10x faster and works for most static HTML sites. Selenium mode (useSelenium set to true) renders JavaScript; use it for React, Vue, and Angular apps.

Can I crawl multiple sites in one run? This Actor starts from a single URL. Trigger multiple runs via the Apify API to crawl several sites in parallel.
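
One way to fan out several runs from a script is a small thread pool, sketched below. `run_one` is a hypothetical callable that starts a single run for the given input and returns its results (for example, a wrapper around an Apify API call), and `crawl_sites` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl_sites(start_urls, run_one, max_workers=4):
    """Calls `run_one(run_input)` once per start URL and maps URL -> result."""
    inputs = [{"startUrl": url} for url in start_urls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with start_urls.
        results = list(pool.map(run_one, inputs))
    return dict(zip(start_urls, results))
```

Threads suit this case because each run spends its time waiting on the network rather than using the CPU.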

Is this Actor maintained? Yes. For bugs or feature requests, open an issue in the Issues tab.

Found this Actor useful?

If this Actor saved you time, please leave a review on the Actor page. Reviews help other users discover it and take 30 seconds — every one genuinely matters.
