Broken Link Checker — Find 404s, Dead Links & Redirect Issues

Pricing

from $1.00 / 1,000 links checked

Crawl a website, scan a URL list, or verify all URLs from a sitemap. Returns broken links with source page, anchor text, status, redirect chain, and failure class — for SEO audits, content QA, and migration validation.

Developer

Khadin Akbar

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 2
Last modified: 2 days ago

Broken Link Checker — Find 404s, Dead Links, Redirects & Slow Pages

Scan a whole website, a list of URLs, or every URL in a sitemap.xml and find every broken link — 404s, server errors, timeouts, SSL/DNS failures, redirect chains, and slow pages. Each broken link comes with the source page that linked to it, the anchor text, the HTTP status, the redirect chain, and a failure class like broken, redirect_loop, timeout, dns_error, or ssl_error. Built for SEO audits, content QA, site migrations, and AI-agent link health checks.

Try it now — paste a website URL and get a CSV of every broken link in minutes.

Broken Link Checker is a fast, configurable link auditor that runs on the Apify platform. Three modes cover every link-checking workflow:

  • Crawl mode: start at a URL, crawl up to N pages on the same domain, extract every link (<a> by default, plus <img>, <script>, <link>, and <iframe> when asset checking is enabled), and verify each one.
  • List mode: paste up to 5,000 URLs and get the HTTP status and classification for each. No crawling.
  • Sitemap mode: point at a sitemap.xml (sitemap-index files supported) and verify every URL inside.

For every URL, the actor tries HEAD first (fast, ~1 KB), then falls back to GET if the server returns a status that often means HEAD is unsupported or blocked (405, 403, 501, etc.), so the final result reflects what a browser's GET request would see. Redirects are tracked hop by hop so you can see the full chain. The output is a structured dataset you can download as CSV, JSON, or Excel, query via the Apify API, schedule, or call from Make, Zapier, or your own code.
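The HEAD-then-GET behaviour described above can be sketched roughly as follows. This is not the actor's source code: the `send` hook is a hypothetical transport function, and the exact set of fallback statuses is an assumption based on the codes named in the text.

```python
from typing import Callable, Optional

# Statuses that trigger a GET retry. The text names 405, 403, 501 "etc.";
# treating exactly these three as the fallback set is an assumption.
HEAD_FALLBACK_STATUSES = {403, 405, 501}


def check_with_fallback(url: str, send: Callable[[str, str], Optional[int]]) -> dict:
    """Try HEAD first, then retry with GET if HEAD returned a fallback status.

    `send(method, url)` is a hypothetical hook that performs the request and
    returns the HTTP status code, or None when no response was received.
    """
    method = "HEAD"
    status = send(method, url)
    if status in HEAD_FALLBACK_STATUSES:
        method = "GET"
        status = send(method, url)
    return {"url": url, "status": status, "method": method}


# Demo with a fake transport: the server rejects HEAD but answers GET.
def fake_send(method: str, url: str) -> int:
    return 405 if method == "HEAD" else 200


print(check_with_fallback("https://example.com/", fake_send))
```

Injecting the transport keeps the decision logic separate from networking, so the fallback rule can be exercised without sending real requests.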

  • Fix SEO-killing 404s — broken internal links bleed PageRank and frustrate users; broken outbound links damage E-E-A-T signals.
  • Validate site migrations — confirm every old URL 301s correctly to its new home; flag redirect chains that lose link equity.
  • Audit content at scale — agencies running monthly link-rot reports, enterprises auditing 10,000+ blog posts.
  • Pre-launch QA — catch dead links in marketing pages, docs, and email templates before they go live.
  • AI agents — perfect MCP tool for Claude / GPT agents auditing site quality. Predictable cost (~$0.001 per URL), structured JSON output, no setup required.

How to use Broken Link Checker

  1. Open the actor on Apify Console and click Try for free.
  2. Pick an input mode:
    • crawl → paste a website URL into Website URL and set Max pages.
    • list → paste your URLs into URLs to verify.
    • sitemap → paste your sitemap.xml URL into Website URL.
  3. (Optional) Tune Max links to verify, toggle Check external links, and enable Check images, scripts, and stylesheets for a full content audit.
  4. Click Save & Start. The actor reports progress in real time.
  5. When finished, open the Output tab → download as CSV, JSON, or Excel, or pull via the Apify API.

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | crawl | One of crawl, list, sitemap. |
| startUrl | string | | Required for crawl and sitemap. The site to crawl or the sitemap.xml URL. |
| urls | array | | Required for list. Up to 5,000 URLs. |
| maxPages | int | 50 | Max internal pages to crawl (crawl mode only). |
| maxLinksToCheck | int | 500 | Hard cap on links verified per run. |
| checkExternalLinks | bool | true | Verify links pointing to other domains. |
| checkAssets | bool | false | Also check <img>, <script>, <link>, <iframe> URLs. |
| onlyReportBroken | bool | true | Dataset contains only failing links. Set to false for a full inventory. |
| slowThresholdMs | int | 5000 | Response time in milliseconds above which a link is classified slow. |
| requestTimeoutMs | int | 15000 | Per-request timeout in milliseconds. |
| maxConcurrency | int | 20 | Parallel verification requests. |
| maxRedirects | int | 10 | Max hops before classifying as redirect_loop. |
| userAgent | string | ApifyBrokenLinkChecker UA | Custom User-Agent for all requests. |
| proxyConfiguration | object | Apify datacenter | Proxy settings; switch to residential only if blocked. |
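As an illustration of how these fields fit together, here is a minimal sketch that builds an input object and checks the requirements stated in the table. The helper names `build_run_input` and `run_actor` are hypothetical. `run_actor` assumes the third-party `apify-client` package and its `ApifyClient(...).actor(...).call(run_input=...)` interface, and the actor ID must be supplied by you since it is not given here.

```python
import json

VALID_MODES = {"crawl", "list", "sitemap"}
MAX_LIST_URLS = 5000  # limit stated for the `urls` field


def build_run_input(mode="crawl", start_url=None, urls=None, **overrides):
    """Return an input dict, enforcing the per-mode required fields."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    run_input = {"mode": mode}
    if mode in ("crawl", "sitemap"):
        if not start_url:
            raise ValueError(f"startUrl is required for mode '{mode}'")
        run_input["startUrl"] = start_url
    else:
        if not urls:
            raise ValueError("urls is required for mode 'list'")
        if len(urls) > MAX_LIST_URLS:
            raise ValueError(f"list mode accepts at most {MAX_LIST_URLS} URLs")
        run_input["urls"] = list(urls)
    # Any other documented field, e.g. maxPages=100 or checkAssets=True.
    run_input.update(overrides)
    return run_input


def run_actor(token, actor_id, run_input):
    """Start a run and return its dataset items (assumes `pip install apify-client`)."""
    from apify_client import ApifyClient  # third-party, imported lazily

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


print(json.dumps(
    build_run_input("crawl", "https://example.com", maxPages=100, checkExternalLinks=False),
    indent=2,
))
```

Fields left out of the input fall back to the defaults in the table, so only the values you want to change need to be passed.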

Output

Each broken link becomes a structured record in the dataset.

{
  "url": "https://example.com/missing-page",
  "finalUrl": "https://example.com/missing-page",
  "status": 404,
  "statusText": "Not Found",
  "classification": "broken",
  "isBroken": true,
  "sourceUrl": "https://example.com/blog/post-with-broken-link",
  "anchorText": "see our pricing",
  "linkType": "a",
  "allSources": [
    { "sourceUrl": "https://example.com/blog/post-with-broken-link", "anchorText": "see our pricing", "linkType": "a" }
  ],
  "method": "GET",
  "hops": 0,
  "redirectChain": [],
  "durationMs": 234,
  "error": null,
  "checkedAt": "2026-05-03T19:55:00.000Z"
}

The final record is a _summary: true row with totals and a per-class breakdown.

You can download the dataset in JSON, CSV, Excel, HTML, RSS, or XML, or pull it via API.
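Once the dataset is downloaded as JSON, a common next step is to separate the summary row from the link records and group broken URLs by the page that needs fixing. The sketch below assumes the items are already loaded as a list of dicts; the helper names are hypothetical, and the sample summary row carries only the `_summary` flag because its other fields are not specified here.

```python
from collections import defaultdict


def split_summary(items):
    """Separate link records from the `_summary: true` row."""
    links = [item for item in items if not item.get("_summary")]
    summary = next((item for item in items if item.get("_summary")), None)
    return links, summary


def broken_by_source(links):
    """Map each source page to the broken URLs it links to."""
    grouped = defaultdict(list)
    for link in links:
        if not link.get("isBroken"):
            continue
        # Prefer allSources (up to 5 pages); fall back to the first source.
        sources = link.get("allSources") or [{"sourceUrl": link.get("sourceUrl")}]
        for source in sources:
            grouped[source["sourceUrl"]].append(link["url"])
    return dict(grouped)


sample = [
    {
        "url": "https://example.com/missing-page",
        "isBroken": True,
        "sourceUrl": "https://example.com/blog/post-with-broken-link",
        "allSources": [
            {"sourceUrl": "https://example.com/blog/post-with-broken-link",
             "anchorText": "see our pricing", "linkType": "a"}
        ],
    },
    {"_summary": True},  # real summary rows also carry totals
]
links, summary = split_summary(sample)
print(broken_by_source(links))
```

Grouping by source page turns the flat list of failures into a per-page to-do list for whoever edits the content.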

Data table

| Field | Description |
| --- | --- |
| url | Normalized URL that was checked. |
| finalUrl | URL after following redirects. |
| status | HTTP status of the final response. null if no response. |
| statusText | HTTP status reason phrase. |
| classification | One of ok, slow, redirect_chain, redirect_loop, broken, timeout, dns_error, ssl_error, connection_refused, blocked, error. |
| isBroken | true for any failing class; an easy filter for actionable rows. |
| sourceUrl | First page on the crawled site that linked to this URL. |
| anchorText | Visible link text (or alt/title for assets), trimmed and capped at 200 chars. |
| linkType | How it was referenced: a, img, script, link, iframe, sitemap, list. |
| allSources | Up to 5 source pages that link to this URL. |
| method | HTTP method used for the final response (HEAD or GET). |
| hops | Redirect hops followed. |
| redirectChain | Array of { from, to, status } per hop. |
| durationMs | Total verification time in milliseconds. |
| error | { code, message } when verification failed. |
| checkedAt | ISO 8601 timestamp. |
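To make the `hops` and `redirectChain` fields concrete, here is a minimal sketch of hop-by-hop redirect tracking. It is an illustration under assumptions, not the actor's implementation: `fetch` is a hypothetical hook returning `(status, location)`, `isLoop` is an illustrative field name, a loop is reported when a URL repeats or when the hop count exceeds `max_redirects`, and only the usual redirect statuses are followed.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def follow_redirects(url, fetch, max_redirects=10):
    """Follow redirects one hop at a time and record each hop.

    `fetch(url)` is a hypothetical hook returning (status, location), where
    location is the Location header value or None.
    """
    chain = []
    visited = {url}
    current = url
    while True:
        status, location = fetch(current)
        if status not in REDIRECT_STATUSES or not location:
            # Final response reached: no further hop to follow.
            return {"finalUrl": current, "status": status, "hops": len(chain),
                    "redirectChain": chain, "isLoop": False}
        target = urljoin(current, location)  # resolve relative Location values
        chain.append({"from": current, "to": target, "status": status})
        # Assumption: a repeated URL or too many hops both count as a loop.
        if target in visited or len(chain) > max_redirects:
            return {"finalUrl": target, "status": status, "hops": len(chain),
                    "redirectChain": chain, "isLoop": True}
        visited.add(target)
        current = target


routes = {
    "https://example.com/old": (301, "/interim"),
    "https://example.com/interim": (302, "https://example.com/new"),
    "https://example.com/new": (200, None),
}
result = follow_redirects("https://example.com/old", lambda u: routes[u])
print(result["hops"], result["finalUrl"])
```

For migration checks, a chain with more than one hop is usually the signal to update the first redirect so it points straight at the final URL.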

Broken Link Checker uses pay-per-event pricing — only pay for what you actually verify.

| Event | Price |
| --- | --- |
| Actor start | $0.00005 |
| Link checked | $0.001 per URL verified |

Typical costs:

  • Small site crawl (50 pages, 200 links) → ~$0.20
  • Bulk URL list (1,000 URLs) → ~$1.00
  • Full sitemap (5,000 URLs) → ~$5.00

There are no monthly fees, no setup fees, and runs that fail before discovering links cost only the negligible $0.00005 start fee.
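The typical costs above follow directly from the two event prices. A minimal sketch of the arithmetic, assuming one start event per run and one "link checked" event per verified URL (the function name is hypothetical):

```python
ACTOR_START_USD = 0.00005   # charged once per run
LINK_CHECKED_USD = 0.001    # charged per URL verified


def estimate_cost(links_checked: int) -> float:
    """Estimated run cost in USD for a given number of verified links."""
    return ACTOR_START_USD + LINK_CHECKED_USD * links_checked


for label, count in [("Small site crawl", 200), ("Bulk URL list", 1000), ("Full sitemap", 5000)]:
    print(f"{label}: ~${estimate_cost(count):.2f}")
```

Because the cost scales with links verified rather than pages crawled, the maxLinksToCheck cap also acts as an upper bound on spend per run.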

Tips & advanced options

  • Cut cost by setting checkExternalLinks: false. On sites with many outbound links, external URLs can make up a large share of the links verified.
  • Skip asset checks (checkAssets: false) for hyperlink-only audits — typical sites have 5-10× more assets than <a> links.
  • Use mode: 'list' for outreach link audits — paste your placement URLs directly without crawling.
  • Use mode: 'sitemap' for full URL coverage on sites with deep navigation that's hard to crawl.
  • For rate-limited targets, drop maxConcurrency to 5-10 and increase requestTimeoutMs.
  • For sites that block bots, switch proxyConfiguration.apifyProxyGroups to ["RESIDENTIAL"] and set userAgent to a real browser string.
  • Combine with other Apify tools: pair with Bulk Website Contact Extractor to find broken backlinks AND contact owners, or feed sitemaps from Sitemap URL Extractor.

On Apify Console, open the Schedules tab on the actor and add a cron expression — e.g. 0 6 * * 1 to run every Monday at 6 AM. Combine with the Apify webhook integration to push results into Slack, Google Sheets, or your own monitoring stack.

FAQ

Does it follow robots.txt? The actor verifies links by issuing a single HEAD/GET per URL — the same load a normal browser visit would create. For polite crawling, the crawl mode respects robots.txt via Crawlee's defaults; you can override per request if needed.

Can it check authenticated pages? Not yet. Open an issue if you'd like cookie-based authentication added.

What counts as a "broken" link? The isBroken flag is true for these classes: broken (4xx/5xx, except 403/429), timeout, dns_error, ssl_error, connection_refused, redirect_loop, and error. The classes slow, redirect_chain, and blocked (403/429) are reported but not flagged broken — they're worth reviewing but may be intentional.

Why HEAD first? A HEAD response carries headers only, with no body, so it is lighter on the target server than GET and faster for you. Some servers refuse HEAD with 405/403/501; when that happens the actor automatically retries with GET, so you still get an accurate result.
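The isBroken rule from the "What counts as a broken link?" answer can be expressed as a small lookup. This is a sketch of the documented rule rather than the actor's code: `classify_status` covers only the status-based classes, and treating every non-4xx/5xx status as ok is an assumption. The network-level classes (timeout, dns_error, and so on) and slow or redirect_chain depend on signals not modelled here.

```python
# Classes for which isBroken is true, as listed in the FAQ.
BROKEN_CLASSES = {
    "broken", "timeout", "dns_error", "ssl_error",
    "connection_refused", "redirect_loop", "error",
}
# Reported but not flagged broken: worth reviewing, possibly intentional.
REVIEW_CLASSES = {"slow", "redirect_chain", "blocked"}


def classify_status(status: int) -> str:
    """Status-based part of the classification only."""
    if status in (403, 429):
        return "blocked"
    if 400 <= status <= 599:
        return "broken"
    return "ok"  # assumption: anything outside 4xx/5xx counts as ok here


def is_broken(classification: str) -> bool:
    """True for the classes the FAQ lists as broken."""
    return classification in BROKEN_CLASSES


for code in (200, 403, 404, 429, 503):
    label = classify_status(code)
    print(code, label, is_broken(label))
```

Keeping blocked separate from broken avoids flagging pages that are merely rate-limited or access-controlled as dead links.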

Disclaimer

Broken Link Checker only verifies HTTP responses to URLs. It does not download page content beyond crawl mode (which fetches HTML to extract links). Use this actor responsibly: respect target site Terms of Service, set sensible concurrency for small sites, and do not use the tool to probe systems you do not own or have permission to audit. The actor reports raw HTTP signals — interpretation (whether a 403 is intentional access control or a broken link) is up to you.

Support and feedback

Found a bug or want a feature? Open an issue on the actor's Issues tab on Apify Console.