Pricing

from $10.00 / 1,000 results

Website URL Crawler & Link Extractor

Crawl any website and extract all URLs with full hierarchy — depth, parent URL, and anchor text. Supports static and JavaScript-rendered sites. Configurable depth and domain filtering.

Developer

Maged

Actor stats

Bookmarked

126

Total users

Monthly active users

a month ago

Last modified

What does Website URL Crawler do?

Start from any URL and this Actor recursively follows links to map the full structure of a website. Each result includes the URL, its depth level, parent URL, and link text, giving you a complete picture of how pages connect.

It supports fast HTTP crawling (BeautifulSoup) for static sites and JavaScript rendering (Selenium) for dynamic single-page applications.
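The recursive crawl described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: it uses the standard library's `html.parser` in place of BeautifulSoup so it runs without extra packages, and the names `LinkParser`, `crawl`, and the `fetch` callable are hypothetical. The record fields and the filtering options mirror the Input and Output sections of this page.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects (href, anchor text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # collapse whitespace
            self.links.append((self._href, text or None))
            self._href = None


def crawl(fetch, start_url, max_depth=3, max_children=20,
          same_domain_only=True, ignored_extensions=()):
    """Breadth-first crawl. `fetch(url)` is assumed to return HTML or None."""
    start_host = urlparse(start_url).netloc
    skip = tuple("." + ext.lower() for ext in ignored_extensions)
    results = [{"url": start_url, "name": None, "depth": 0, "parentUrl": None}]
    seen = {start_url}
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand pages at the depth limit
        html = fetch(url)
        if html is None:
            continue
        parser = LinkParser()
        parser.feed(html)
        children = 0
        for href, text in parser.links:
            if children >= max_children:
                break
            child = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            parts = urlparse(child)
            if parts.scheme not in ("http", "https"):
                continue
            if same_domain_only and parts.netloc != start_host:
                continue
            if parts.path.lower().endswith(skip):
                continue
            if child in seen:
                continue  # deduplicate (allowDuplicates = false behaviour)
            seen.add(child)
            children += 1
            results.append({"url": child, "name": text,
                            "depth": depth + 1, "parentUrl": url})
            queue.append((child, depth + 1))
    return results
```

Passing the fetcher in as a callable keeps the sketch independent of the transport: a plain HTTP client would play the role of HTTP mode, and a function that returns a rendered page source would play the role of Selenium mode.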

Why use this Actor?

Full site mapping: crawl up to 30 levels deep (set with maxDepth) to discover the site's URLs
Hierarchy preserved: each URL includes its parent URL, depth, and anchor text
Two crawl modes: fast HTTP for static sites, JS rendering for React/Vue/Angular apps
Domain filtering: optionally restrict crawling to the same domain
Extension filtering: skip PDFs, images, ZIPs, and other non-page assets
Duplicate prevention: configurable deduplication to keep results clean

How to use Website URL Crawler

Open the Actor and click Try for free
Enter a startUrl
Set maxDepth and maxChildrenPerLink
Run the Actor; the full URL tree appears in the Output tab
Download as JSON or CSV, or connect via the Apify API

Input

{
  "startUrl": "https://example.com",
  "maxDepth": 3,
  "maxChildrenPerLink": 20,
  "sameDomainOnly": true,
  "useSelenium": false,
  "allowDuplicates": false,
  "ignoredExtensions": ["pdf", "jpg", "png", "zip"]
}

Field	Type	Description	Default
`startUrl`	string	The URL to start crawling from	required
`maxDepth`	integer	Maximum link recursion depth (1–30)	`3`
`maxChildrenPerLink`	integer	Max child links per page (1–100)	`20`
`sameDomainOnly`	boolean	Only crawl URLs on the same domain	`true`
`useSelenium`	boolean	Use JS rendering for dynamic pages	`false`
`allowDuplicates`	boolean	Allow duplicate URLs in output	`false`
`ignoredExtensions`	array	File extensions to skip	`[]`
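When the input is assembled programmatically, it can help to check the documented ranges before starting a run. The helper below is a hypothetical sketch (`build_run_input` is not part of the Actor or of any Apify library); the field names, defaults, and ranges come from the table above, while the http(s) check and the normalization of extensions to lower case without a leading dot are assumptions.

```python
def build_run_input(start_url, max_depth=3, max_children_per_link=20,
                    same_domain_only=True, use_selenium=False,
                    allow_duplicates=False, ignored_extensions=()):
    """Return an input dict for the Actor, validating the documented ranges."""
    # Assumption: the Actor expects an absolute http(s) URL.
    if not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl must be an absolute http(s) URL")
    if not 1 <= max_depth <= 30:
        raise ValueError("maxDepth must be between 1 and 30")
    if not 1 <= max_children_per_link <= 100:
        raise ValueError("maxChildrenPerLink must be between 1 and 100")
    return {
        "startUrl": start_url,
        "maxDepth": max_depth,
        "maxChildrenPerLink": max_children_per_link,
        "sameDomainOnly": same_domain_only,
        "useSelenium": use_selenium,
        "allowDuplicates": allow_duplicates,
        # Assumption: extensions are written as in the example ("pdf", not ".PDF").
        "ignoredExtensions": [ext.lower().lstrip(".") for ext in ignored_extensions],
    }
```

Calling `build_run_input("https://example.com", ignored_extensions=["pdf", "jpg", "png", "zip"])` produces the same structure as the example input shown earlier.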

Output

[
  {
    "url": "https://example.com",
    "name": null,
    "depth": 0,
    "parentUrl": null
  },
  {
    "url": "https://example.com/about",
    "name": "About Us",
    "depth": 1,
    "parentUrl": "https://example.com"
  },
  {
    "url": "https://example.com/about/team",
    "name": "Our Team",
    "depth": 2,
    "parentUrl": "https://example.com/about"
  }
]

Output data fields

Field	Type	Description
`url`	string	The full URL
`name`	string or null	Anchor text of the link, if available (`null` for the start URL)
`depth`	number	Depth level from the start URL (the start URL itself is `0`)
`parentUrl`	string or null	The URL this link was found on (`null` for the start URL)
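Because every record carries its `parentUrl`, the flat output can be turned back into a tree. The sketch below is one possible way to do that, assuming the run used `allowDuplicates: false` so that each URL appears once; the function name `build_tree_lines` and the "(no anchor text)" placeholder are illustrative choices.

```python
from collections import defaultdict


def build_tree_lines(items):
    """Render flat crawl records as an indented outline, one line per URL."""
    children = defaultdict(list)
    roots = []
    for item in items:
        if item["parentUrl"] is None:
            roots.append(item)  # the start URL has no parent
        else:
            children[item["parentUrl"]].append(item)

    lines = []

    def walk(item):
        label = item["name"] or "(no anchor text)"
        lines.append("  " * item["depth"] + f'{item["url"]} [{label}]')
        for child in children.get(item["url"], []):
            walk(child)

    for root in roots:
        walk(root)
    return lines


items = [
    {"url": "https://example.com", "name": None, "depth": 0, "parentUrl": None},
    {"url": "https://example.com/about", "name": "About Us", "depth": 1,
     "parentUrl": "https://example.com"},
    {"url": "https://example.com/about/team", "name": "Our Team", "depth": 2,
     "parentUrl": "https://example.com/about"},
]
print("\n".join(build_tree_lines(items)))
```

With the three example records above, this prints the start URL followed by `/about` indented two spaces and `/about/team` indented four.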

Use cases

Site audits: find orphaned pages, broken internal link paths, or redirect chains
SEO analysis: map your site architecture to identify crawl depth issues
Sitemap generation: build sitemaps for sites that don't have one
Content migration: extract all URLs before moving to a new CMS
Competitive research: map a competitor's full site structure
QA testing: verify all pages are reachable from the homepage
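As an illustration of the sitemap generation use case, the downloaded JSON output could be converted into a basic sitemap along these lines. This is a sketch under assumptions: `to_sitemap_xml` is a hypothetical helper, it emits only `<loc>` entries (the Actor's output has no modification dates), and it skips repeated URLs in case the run allowed duplicates.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def to_sitemap_xml(items):
    """Build sitemap XML (as UTF-8 bytes) from crawl output records."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for item in items:
        url = item["url"]
        if url in seen:
            continue  # keep each URL once
        seen.add(url)
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


crawl_output = [
    {"url": "https://example.com", "name": None, "depth": 0, "parentUrl": None},
    {"url": "https://example.com/about", "name": "About Us", "depth": 1,
     "parentUrl": "https://example.com"},
]
sitemap_bytes = to_sitemap_xml(crawl_output)
```

The resulting bytes can be written to a file such as `sitemap.xml` with `open("sitemap.xml", "wb")`.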

Cost estimation

Site size	Estimated cost
Small site (under 100 pages)	under $0.10
Medium site (1,000 pages)	~$0.50–$2.00
Large site (10,000 pages)	~$5–$20

Cost scales with the number of URLs crawled and whether JS rendering is enabled.

FAQ

What is the difference between HTTP mode and Selenium mode? HTTP mode (default) is 10x faster and works for most static HTML sites. Selenium mode renders JavaScript; use it for React, Vue, and Angular apps.

Can I crawl multiple sites in one run? This Actor starts from a single URL. Trigger multiple runs via the Apify API to crawl several sites in parallel.
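One way to trigger several runs is to send one "run Actor" request per start URL to the Apify API. The sketch below only builds the requests with the standard library and does not send them. The Actor ID `username~actor-name` and the token are placeholders, since this page does not state the Actor's ID, and `build_run_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, run_input):
    """Prepare a POST request that starts one Actor run with the given input."""
    return urllib.request.Request(
        f"{API_BASE}/acts/{actor_id}/runs",
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholders: replace with the real Actor ID and your own API token.
ACTOR_ID = "username~actor-name"
TOKEN = "YOUR_APIFY_TOKEN"

sites = ["https://example.com", "https://example.org"]
requests = [
    build_run_request(ACTOR_ID, TOKEN, {"startUrl": site, "maxDepth": 3})
    for site in sites
]
# To actually start the runs, pass each request to urllib.request.urlopen(...).
```

Each request starts an independent run, so the sites are crawled in parallel, subject to the limits of your Apify plan.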

Is this Actor maintained? Yes. For bugs or feature requests, open an issue in the Issues tab.

Found this Actor useful?

If this Actor saved you time, please leave a review on the Actor page. Reviews help other users discover it and take 30 seconds — every one genuinely matters.

For bugs, feature requests, or questions, open an issue in the Issues tab above.
