Broken Link Checker & Site Auditor

Pricing

$1.00 / 1,000 pages crawled

Crawl websites to detect 404 broken links and missing resources. Essential for maintaining technical SEO and user experience.

Rating: 0.0 (0)

Developer: Andok (maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 18 days ago

Broken Links Checker

Find every broken link on your website before visitors and search engines do. Broken links hurt user experience and waste link equity, particularly after site migrations. This Actor crawls your pages, checks the links on each one (internal and, optionally, external), and reports which URLs return a 404 or 5xx status or time out. Crawl depth, page limits, and concurrency are configurable, so a run can be scoped to sites of any size.

Features

  • Deep crawling — follows internal links up to a configurable depth to discover broken links across your entire site
  • Internal and external — optionally checks outbound links to third-party sites, not just internal pages
  • Smart link checking — uses HEAD requests first, falls back to GET when servers block HEAD
  • Per-page reporting — groups broken links by the page they appear on for easy triage
  • Redirect chain tracking — captures the full redirect chain for each broken link
  • Same-domain filtering — restrict crawling to the start URL's domain or allow cross-domain discovery
  • Configurable limits — set max pages, max depth, and max links per page to control scope and cost
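
The "HEAD first, fall back to GET" behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual implementation: the function names, the injected `send` callable, and the set of status codes that trigger the fallback (403, 405, 501) are all assumptions made for the example.

```python
from typing import Callable, Optional
import urllib.error
import urllib.request

# Assumption: these statuses suggest the server rejects HEAD, so GET is retried.
HEAD_FALLBACK_STATUSES = {403, 405, 501}


def send_request(method: str, url: str, timeout: float = 20.0) -> int:
    """Send one request with urllib and return the HTTP status code."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx; the status code is still what we want.
        return exc.code


def check_link(url: str, send: Callable[[str, str], int] = send_request) -> dict:
    """Check a link with HEAD first, retrying with GET if HEAD is rejected or fails."""
    status: Optional[int] = None
    error: Optional[str] = None
    for method in ("HEAD", "GET"):
        try:
            status, error = send(method, url), None
        except Exception as exc:  # connection refused, timeout, DNS failure, ...
            status, error = None, type(exc).__name__
        # Only move on to GET when the HEAD attempt looks blocked or failed.
        if method == "HEAD" and (status is None or status in HEAD_FALLBACK_STATUSES):
            continue
        break
    broken = status is None or status >= 400
    return {"url": url, "status": status, "error": error, "broken": broken}
```

Passing the transport in as `send` keeps the fallback logic separate from the network layer, so it can be exercised with a stub instead of live requests.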

Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| startUrls | array | Yes | | Pages to begin crawling from (request list format) |
| maxPages | integer | No | 100 | Maximum number of pages to crawl |
| maxDepth | integer | No | 3 | How many link levels deep to follow from the start URLs |
| sameDomainOnly | boolean | No | true | Only crawl pages on the same domain as the start URL |
| checkExternalLinks | boolean | No | false | Also check outbound links to external domains |
| maxLinksPerPage | integer | No | 200 | Maximum number of links to check on each crawled page |
| timeoutSeconds | integer | No | 20 | HTTP timeout in seconds for each link check |
| concurrency | integer | No | 5 | Number of pages to process in parallel |
| userAgent | string | No | Mozilla/5.0 (compatible; ApifyActor/1.0; +https://apify.com) | Custom User-Agent header for HTTP requests |
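
To illustrate how maxPages, maxDepth, and sameDomainOnly might interact, here is a small breadth-first sketch. It assumes start URLs sit at depth 0 and that links are followed only from pages shallower than maxDepth; the Actor's real traversal order and depth semantics are not documented here, and `crawl_order` and `get_links` are hypothetical names.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlparse


def crawl_order(
    start_urls: Iterable[str],
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 100,
    max_depth: int = 3,
    same_domain_only: bool = True,
) -> List[Tuple[str, int]]:
    """Return (url, depth) pairs in the order a breadth-first crawl would visit them."""
    starts = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    allowed_hosts = {urlparse(u).netloc for u in starts}
    queue = deque((u, 0) for u in starts)
    seen = set(starts)
    visited: List[Tuple[str, int]] = []

    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # page is still crawled, but its links are not followed
        for link in get_links(url):
            if link in seen:
                continue
            if same_domain_only and urlparse(link).netloc not in allowed_hosts:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited
```

In this sketch `get_links` stands in for fetching a page and extracting its anchors, so the limits can be reasoned about without any network access.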

Input Example

{
  "startUrls": [{ "url": "https://crawlee.dev" }],
  "maxPages": 200,
  "maxDepth": 3,
  "checkExternalLinks": true,
  "sameDomainOnly": true
}
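
The example above leaves several fields out, so they take the defaults from the input table. The sketch below shows one way to resolve a partial input against those documented defaults locally, for instance before submitting a run. `resolve_input` is a hypothetical helper, and the validation rules (non-empty startUrls, rejecting unknown fields) are assumptions rather than the Actor's confirmed behavior.

```python
# Defaults copied from the input table above.
DEFAULTS = {
    "maxPages": 100,
    "maxDepth": 3,
    "sameDomainOnly": True,
    "checkExternalLinks": False,
    "maxLinksPerPage": 200,
    "timeoutSeconds": 20,
    "concurrency": 5,
    "userAgent": "Mozilla/5.0 (compatible; ApifyActor/1.0; +https://apify.com)",
}


def resolve_input(user_input: dict) -> dict:
    """Merge a partial input with the documented defaults (hypothetical helper)."""
    if not user_input.get("startUrls"):
        raise ValueError("startUrls is required and must not be empty")
    unknown = set(user_input) - set(DEFAULTS) - {"startUrls"}
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    # User-supplied values override the defaults.
    return {**DEFAULTS, **user_input}
```

Applied to the example input, the resolved configuration keeps maxPages at 200 and fills in timeoutSeconds, concurrency, maxLinksPerPage, and userAgent from the table.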

Output

Each crawled page produces one dataset item listing all broken links found on that page.

  • pageUrl (string) — the page that was crawled
  • depth (number) — how many links deep from the start URL
  • pageStatus (number | null) — HTTP status of the crawled page itself
  • brokenCount (number) — total broken links found on this page
  • brokenLinks (array) — list of broken links with url, status, finalUrl, redirectChain, and error
  • checkedAt (string) — ISO timestamp
  • error (string | null) — error if the page itself could not be fetched

Output Example

{
  "pageUrl": "https://crawlee.dev/docs/introduction",
  "depth": 1,
  "pageStatus": 200,
  "brokenCount": 2,
  "brokenLinks": [
    {
      "url": "https://crawlee.dev/docs/old-guide",
      "status": 404,
      "finalUrl": "https://crawlee.dev/docs/old-guide",
      "redirectChain": [],
      "error": null
    },
    {
      "url": "https://external-site.com/removed-page",
      "status": null,
      "finalUrl": null,
      "redirectChain": [],
      "error": "ECONNREFUSED"
    }
  ],
  "checkedAt": "2025-11-20T14:30:00.000Z",
  "error": null
}
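
Because results are grouped per page, a flat "one row per broken link" view is often easier to sort or export. The following sketch assumes dataset items shaped like the example above and already loaded as Python dictionaries; `iter_broken_links` and `summarize` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator


def iter_broken_links(items: Iterable[dict]) -> Iterator[dict]:
    """Yield one flat row per broken link across all page items."""
    for item in items:
        for link in item.get("brokenLinks") or []:
            yield {
                "pageUrl": item["pageUrl"],
                "linkUrl": link["url"],
                "status": link.get("status"),
                "error": link.get("error"),
                "redirects": len(link.get("redirectChain") or []),
            }


def summarize(items: Iterable[dict]) -> Dict[str, int]:
    """Count broken links by HTTP status, or by error name when no status exists."""
    counts: Counter = Counter()
    for row in iter_broken_links(items):
        if row["status"] is not None:
            key = str(row["status"])
        else:
            key = row["error"] or "unknown"
        counts[key] += 1
    return dict(counts)
```

For the single item shown in the output example, this yields two rows: one counted under 404 and one under ECONNREFUSED.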

Pricing

| Event | Cost |
| --- | --- |
| Page Crawled | $0.001 per page |

You are charged per page crawled. Platform usage fees apply separately.
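
As a quick worked example of the per-page charge, the sketch below multiplies the listed $0.001 rate by a page count. It covers only the Page Crawled event; platform usage fees are separate and not modeled. `estimate_crawl_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# From the pricing table: $0.001 per page crawled.
PRICE_PER_PAGE = Decimal("0.001")


def estimate_crawl_cost(pages_crawled: int) -> Decimal:
    """Return the Page Crawled charge in USD for a given number of pages."""
    if pages_crawled < 0:
        raise ValueError("pages_crawled must not be negative")
    return PRICE_PER_PAGE * pages_crawled
```

Since maxPages caps the number of pages crawled, a run with maxPages set to 200 should incur at most $0.20 in Page Crawled charges, and 1,000 pages come to $1.00.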

Use Cases

  • Pre-migration audits — scan your entire site before a domain or CMS migration to establish a baseline
  • Post-migration validation — verify no links broke after launching on a new domain or URL structure
  • SEO health checks — find and fix broken internal links that waste crawl budget and leak link equity
  • Client reporting — agencies can schedule weekly runs to catch broken links before clients notice
  • External link monitoring — detect when third-party resources you link to go offline
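
For the pre- and post-migration use cases, two runs can be compared to see which links broke, which were fixed, and which remain broken. This is a minimal sketch that assumes both runs' dataset items are available as lists of dictionaries in the output format described earlier. `broken_pairs` and `compare_runs` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Set, Tuple


def broken_pairs(items: Iterable[dict]) -> Set[Tuple[str, str]]:
    """Collect (pageUrl, brokenLinkUrl) pairs from one run's dataset items."""
    return {
        (item["pageUrl"], link["url"])
        for item in items
        for link in item.get("brokenLinks") or []
    }


def compare_runs(before: Iterable[dict], after: Iterable[dict]) -> Dict[str, List[Tuple[str, str]]]:
    """Diff two runs by their (page, link) pairs."""
    old, new = broken_pairs(before), broken_pairs(after)
    return {
        "newlyBroken": sorted(new - old),
        "fixed": sorted(old - new),
        "stillBroken": sorted(new & old),
    }
```

The comparison matches on exact URLs, so if a migration changes the domain or URL structure, the page URLs would need to be normalized to a common form before diffing.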

| Actor | What it adds |
| --- | --- |
| Redirect Chain Analyzer | Trace the full redirect chain for URLs to diagnose redirect loops and chains |
| Hreflang Checker | Validate hreflang tags on multilingual pages found during crawling |
| XML Sitemap URL Extractor | Extract all URLs from your sitemap to use as start URLs for a comprehensive crawl |