Pricing

from $1.00 / 1,000 results

Find Broken Links

Crawl a website (the start URL plus same-host pages up to a configurable depth) and report every link that returns a 4xx / 5xx status, times out, or fails DNS resolution. HTTP-only: no browser is needed, and a proxy is used only for the optional recheck of links that look blocked.

Rating

0.0

(0)

Developer

Crawler Bros

Actor stats

Last modified

2 months ago

What it does

You give it a start URL; the actor crawls the start page (and optionally same-host internal links up to a depth N), gathers every <a href>, and probes each one with HEAD (falling back to GET when servers reject HEAD). Records are emitted only for links that fail.
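The link-gathering step described above can be sketched with the standard library alone. This is a minimal illustration, not the actor's code: the class name `LinkCollector` and the helper `extract_links` are hypothetical, and dropping the `#fragment` from each URL is an assumption made here to avoid checking the same page twice.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, anchor_text) pairs for every <a href> in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None  # href of the <a> that is currently open
        self._text: list[str] = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative hrefs against the page URL and drop the
                # fragment (assumption: fragments do not change the target).
                self._href = urldefrag(urljoin(self.base_url, href)).url
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return every link found in `html`, resolved against `base_url`."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Each returned pair corresponds roughly to the `url` and `anchorText` fields of an output record; anchors without an `href` are skipped.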

The dataset is never empty — even a perfectly-clean site gets a final summary record with run statistics.

Input

Field	Type	Default (allowed range)	Description
`startUrl`	string (required)	`https://apify.com`	Page to start crawling from. Must be `http://` or `https://`.
`maxCrawlDepth`	integer	`1` (0–5)	0 = check links on the start URL only; N ≥ 1 = also follow same-host links up to N levels deep and check the links found on those pages.
`maxPages`	integer	`50` (1–5000)	Hard cap on pages crawled.
`checkExternalLinks`	boolean	`true`	Also probe links that leave the start URL's host.
`verifyWithProxy`	boolean	`true`	When a link returns `401 / 403 / 405 / 429 / 451` (typical anti-bot signals), retry once via Apify residential proxy. If the proxy retry succeeds the link is treated as OK — eliminates false positives from sites that block datacenter IPs (G2, Capterra, etc.). Turn off to skip the retry pass.
`maxConcurrency`	integer	`10` (1–50)	Concurrent HEAD/GET requests during the check phase.
`userAgent`	string (optional)	(Chrome 131)	Override only if a target server filters by UA.
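The defaults and ranges in the table can be expressed as a small pre-flight check to run before starting the actor. This is a sketch under assumptions: `normalize_input`, `DEFAULTS`, and `RANGES` are hypothetical names, and rejecting out-of-range values (rather than clamping them) is a choice made here, not documented actor behavior.

```python
# Defaults and ranges copied from the input table above.
DEFAULTS = {
    "maxCrawlDepth": 1,
    "maxPages": 50,
    "checkExternalLinks": True,
    "verifyWithProxy": True,
    "maxConcurrency": 10,
}
RANGES = {
    "maxCrawlDepth": (0, 5),
    "maxPages": (1, 5000),
    "maxConcurrency": (1, 50),
}


def normalize_input(raw: dict) -> dict:
    """Fill in defaults and reject values outside the documented ranges."""
    start_url = raw.get("startUrl", "")
    if not start_url.startswith(("http://", "https://")):
        raise ValueError("startUrl must begin with http:// or https://")
    config = {**DEFAULTS, **raw}
    for field, (low, high) in RANGES.items():
        if not low <= config[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    return config
```

Calling `normalize_input({"startUrl": "https://apify.com"})` returns the full configuration with every default filled in.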

Example input

{
  "startUrl": "https://apify.com",
  "maxCrawlDepth": 1,
  "maxPages": 50,
  "checkExternalLinks": true,
  "maxConcurrency": 10
}

Output

Broken-link record (one per failure)

{
  "url": "https://example.com/old-blog-post",
  "sourcePage": "https://apify.com/blog/index",
  "anchorText": "Read more",
  "linkType": "external",
  "linkDomain": "example.com",
  "isExternalLink": true,
  "httpStatus": 404,
  "errorReason": "not_found",
  "proxyRecheckStatus": 404,
  "scrapedAt": "2024-12-16T14:23:11+00:00"
}

Summary record (always emitted last)

{
  "_recordType": "summary",
  "startUrl": "https://apify.com",
  "pagesCrawled": 12,
  "linksDiscovered": 480,
  "linksChecked": 480,
  "brokenCount": 3,
  "okCount": 477,
  "breakdown": {"not_found": 2, "server_error": 1},
  "maxCrawlDepth": 1,
  "checkExternalLinks": true,
  "scrapedAt": "2024-12-16T14:23:18+00:00"
}

Output fields

url — the broken link's absolute URL.
sourcePage — page where the link was first discovered.
anchorText — visible text of the <a> element (when present).
linkType — "internal" (same host as start URL) or "external".
linkDomain — derived hostname of the broken url (lowercase, includes any port).
isExternalLink — derived boolean: true when the broken link's host differs from sourcePage's host.
httpStatus — HTTP status code (omitted for network errors / timeouts).
errorReason — one of:
- not_found (404), gone (410), forbidden (403), unauthorized (401), server_error (5xx), client_error_<NNN> (other 4xx)
- timeout, dns_error, connection_refused, tls_error, redirect_loop, network_error
proxyRecheckStatus — only present when verifyWithProxy: true triggered a retry. Shows the status returned via residential proxy (use this to distinguish real broken links from anti-bot blocks).
scrapedAt — ISO-8601 timestamp.
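The field definitions above imply a simple mapping from a status code to `errorReason`, and from a pair of URLs to the derived `linkDomain` and `isExternalLink` values. The sketch below reproduces that mapping as described; the function names `error_reason` and `link_fields` are hypothetical, and the actor's real implementation may differ in edge cases.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Status codes that have a dedicated errorReason label.
NAMED_REASONS = {401: "unauthorized", 403: "forbidden", 404: "not_found", 410: "gone"}


def error_reason(status: int) -> str | None:
    """Map an HTTP status to an errorReason, or None when the link is not broken."""
    if status in NAMED_REASONS:
        return NAMED_REASONS[status]
    if 500 <= status <= 599:
        return "server_error"
    if 400 <= status <= 499:
        return f"client_error_{status}"
    return None


def link_fields(url: str, source_page: str) -> dict:
    """Derive linkDomain and isExternalLink for a link found on source_page."""
    link_host = urlsplit(url).netloc.lower()  # netloc keeps the port, if any
    source_host = urlsplit(source_page).netloc.lower()
    return {"linkDomain": link_host, "isExternalLink": link_host != source_host}
```

For example, a 429 response maps to `client_error_429`, while network-level failures such as `timeout` or `dns_error` have no status code and fall outside this function.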

Use cases

SEO audits — every broken link costs link equity and damages user trust.
Site migration validation — after a CMS move, find the URLs that didn't get redirected.
Editorial QA — catch dead links in blog content, reference pages, footer navigation.
Internal-tools health — spot broken links to deprecated wikis, retired tools, expired SSO redirects.

FAQ

Does it need a proxy? For the bulk crawl, no — the actor uses curl_cffi with a Chrome User-Agent from a datacenter IP. Optionally, when verifyWithProxy: true (default), any link that returns 401 / 403 / 405 / 429 / 451 is retried once via Apify residential proxy. If that retry succeeds, the link is treated as OK — this eliminates the false positives that used to surface from sites like G2, Capterra, or rate-limited APIs. The retried status is surfaced as proxyRecheckStatus so you can see both checks.
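The recheck rule described above can be sketched as two small functions. The names `needs_proxy_recheck` and `final_verdict` are hypothetical, and treating any final status below 400 as "succeeds" is an assumption; the source does not define success precisely.

```python
from __future__ import annotations

# Statuses the description lists as typical anti-bot signals.
ANTI_BOT_STATUSES = frozenset({401, 403, 405, 429, 451})


def needs_proxy_recheck(status: int, verify_with_proxy: bool = True) -> bool:
    """True when a direct check should be retried once through the proxy."""
    return verify_with_proxy and status in ANTI_BOT_STATUSES


def final_verdict(direct_status: int, proxy_status: int | None = None) -> str:
    """Return "ok" or "broken" from the direct status and the optional recheck.

    Assumption: a status below 400 counts as success for either check.
    """
    if direct_status < 400:
        return "ok"
    if proxy_status is not None and proxy_status < 400:
        return "ok"  # blocked from the datacenter IP, reachable via proxy
    return "broken"
```

A 403 that turns into a 200 through the proxy is therefore reported as OK, while a 403 that stays 403 is reported as broken with `proxyRecheckStatus: 403`.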

HEAD vs GET — which is used? HEAD first (saves bandwidth). If a server returns 405 or 501, the actor falls back to GET and uses that status instead.
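The HEAD-then-GET fallback can be illustrated as follows. The actor itself is described as using curl_cffi; this sketch substitutes the standard library's `urllib` so it runs without extra packages, and the names `probe` and `send_with_urllib` are hypothetical. Network errors (timeouts, DNS failures) are deliberately left to propagate as exceptions.

```python
import urllib.error
import urllib.request
from typing import Callable


def send_with_urllib(method: str, url: str, timeout: float = 10.0) -> int:
    """Send one request and return the HTTP status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx; the status code is what we want here.
        return error.code


def probe(url: str, send: Callable[[str, str], int] = send_with_urllib) -> int:
    """Try HEAD first; fall back to GET when the server rejects HEAD."""
    status = send("HEAD", url)
    if status in (405, 501):
        status = send("GET", url)  # the GET status replaces the HEAD status
    return status
```

Passing the `send` function as a parameter keeps the fallback logic testable without network access; any callable that takes a method and a URL and returns a status code will work.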

Will it follow redirects? Yes — allow_redirects=True for both HEAD and GET. The final status is what gets recorded.

Can I limit it to internal links only? Set checkExternalLinks: false. The actor still walks the same-host graph for discovery but only probes internal links.

Why is the dataset never empty? Even when no broken links are found, a _recordType: "summary" record is emitted with run stats. This keeps Apify's daily-test happy and gives you a quick health pulse for the site.

My start URL has thousands of pages — will this finish in time? Use maxPages and maxCrawlDepth to keep runs bounded. For large sites, consider running with maxCrawlDepth: 0 first to audit the start page's links, then expand outward.
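To see how `maxPages` and `maxCrawlDepth` bound a run, here is a breadth-first sketch of the page-discovery phase. It is an illustration under assumptions: `crawl_pages` and the `get_internal_links` callback are hypothetical, and the actor's actual traversal order is not documented in the source.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_pages(
    start_url: str,
    get_internal_links: Callable[[str], Iterable[str]],
    max_crawl_depth: int = 1,
    max_pages: int = 50,
) -> list:
    """Return the pages visited, in breadth-first order, within both limits."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    crawled = []
    while queue and len(crawled) < max_pages:
        url, depth = queue.popleft()
        crawled.append(url)
        if depth >= max_crawl_depth:
            continue  # this page is visited, but its links are not followed
        for link in get_internal_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled
```

With `max_crawl_depth=0` only the start page is visited, which matches the suggestion to audit the start page first and expand outward afterwards.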

The summary says brokenCount: 0 but I know some links are dead.

The link may use a non-HTTP scheme (mailto, javascript:, data:) — those aren't checkable.
The link may be JS-rendered (this scraper sees only server-rendered HTML).
The target may return a different status to a regular browser than to a generic crawler; try overriding the default User-Agent via userAgent.
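The first point in the list above, that only HTTP(S) links can be checked, amounts to a scheme filter. A minimal sketch, assuming the href has already been resolved to an absolute URL; the name `is_checkable` is hypothetical.

```python
from urllib.parse import urlsplit


def is_checkable(absolute_url: str) -> bool:
    """True only for http:// and https:// URLs; mailto:, javascript:, data: are skipped."""
    return urlsplit(absolute_url.strip()).scheme in ("http", "https")
```

Links that fail this test never reach the probe phase, so they cannot appear in `brokenCount` even when they are dead.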

Broken Link Checker

hereditary_model/broken-link-checker

Crawls a website and reports all broken links (4xx/5xx and unreachable), with the pages they appear on. Pay only per page checked.

Aaron Marxsen

Website Broken Links & Redirects Checker

smart-digital/website-broken-links-redirects-checker

Analyzes websites to detect broken links (4xx/5xx) and redirects (3xx). Checks internal/external links on single pages or crawls entire sites. Provides detailed reports per page and site summary.

My Smart Digital

5.0

Dead Link Crawler

actually_good_at_this/dead-link-crawler

Scans any website and identifies broken links (4xx and 5xx status codes). Lets you find and fix broken links, perform SEO audits, identify orphaned pages and server errors, ensure all links work before going live, and analyse competitors to discover what's broken on their sites.

john Y

Website Image Scraper

crawlerbros/website-image-scraper

Extract every image URL from a website. Crawls the start page (and optionally internal links up to a configurable depth), parses `<img>` tags, `<picture>`/`<source>`, `srcset` candidates, and CSS `background-image` declarations. HTTP-only, no proxy or browser needed.

Crawler Bros

a

tan_asp/danish-grocery-scraper

jens

Bulk URL Status Checker — Broken Links, Redirects & SSL

hipersoft/bulk-url-checker

Check thousands of URLs over plain HTTP: status codes, broken links (404/410/5xx), full redirect chains, response time, content type, page title and SSL certificate expiry. No browser, no login.

hiper soft

Redirect Chain Checker

sootesting/redirect-chain-checker

Trace HTTP redirect chains hop-by-hop — detect redirect loops, HTTP→HTTPS upgrades, chains longer than 4 hops, and chains ending in 4xx/5xx. One clean report per URL. Protects link equity and page speed. Pay-per-event: $0.01 per batch of up to 200 URLs.

soot

Broken Link Checker: 404s & Dead Links w/ Status Codes

eliai/broken-link-checker

Broken link checker: give it a start URL, it crawls the page or whole site and returns every broken internal and external link as JSON: source page URL, broken link URL, and HTTP status code. Pay only for pages actually crawled, so cost is bounded before you run.

Anthony Snider

Broken Link Checker

sootesting/broken-link-checker

Find broken links on any page — checks internal and external URLs for dead, timed-out, or SSL-error links and reports HTTP status per link. One report per page with failed URLs listed. Pay-per-event: $0.01 per batch of up to 200 pages.

soot

EU Defense Contract Monitor

northlab/eu-defense-contract-monitor

Tracks EU TED defense procurement award notices (CPV 353xx–357xx) and maps winning contractors to stock tickers. Outputs a structured dataset plus a self-contained HTML viewer.