πŸ”— Broken Link Checker

Crawl websites to extract dead URLs, 404 errors, and broken outbound links. Export detailed reports to improve your search rankings and website health.

Pricing: Pay per event

Developer: ε€ͺιƒŽ ε±±η”° (Maintained by Community)

Crawl websites to find broken links, 404 errors, and dead URLs. Essential for SEO audits, website maintenance, and content teams.

Store Quickstart

Start with the Quickstart template (single starting URL, depth 2). For full-site audits, use Deep Crawl (depth 5, up to 500 pages).
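A Deep Crawl run, using the input fields documented below with the preset values described above (depth 5, up to 500 pages), might be configured like this:

```json
{
  "startUrls": ["https://example.com"],
  "maxDepth": 5,
  "maxPages": 500,
  "concurrency": 5,
  "checkExternal": true
}
```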

Key Features

  • πŸ•ΈοΈ Configurable crawl depth β€” Follows internal links up to 5 levels deep, up to 500 pages per run
  • 🌐 Internal + external checks β€” Validate both your own links and outbound references
  • πŸ“ Anchor text reporting β€” Identify which link text points to the broken URL
  • 🏷️ Error classification β€” TIMEOUT, DNS_FAILED, CONNECTION_REFUSED, SSL_ERROR
  • ⚑ Concurrent fetching β€” 1-10 parallel requests to speed up crawls
  • πŸ“Š Per-page breakdown β€” Each result shows all broken links grouped by source page

Use Cases

  • SEO agencies: regular broken-link audits for client websites to protect rankings
  • Content editors: find dead outbound links in blog posts and documentation
  • E-commerce sites: monitor product pages for broken navigation and outbound partner links
  • Site migrations: validate internal linking after URL restructuring
  • Technical SEO: identify redirect chains and crawl traps that waste crawl budget

Input

  • startUrls (string[], required): URLs to start crawling (max 10)
  • maxDepth (integer, default 2): crawl depth (1-5)
  • maxPages (integer, default 50): maximum pages to crawl (1-500)
  • concurrency (integer, default 5): parallel requests (1-10)
  • checkExternal (boolean, default true): also check external links
  • timeoutMs (integer, default 10000): request timeout in milliseconds

Input Example

{
  "startUrls": ["https://example.com"],
  "maxDepth": 2,
  "maxPages": 50,
  "concurrency": 5,
  "checkExternal": true
}

Output

  • url (string): page URL that was crawled
  • brokenLinks (object[]): broken link objects found on the page
  • brokenLinks[].href (string): the broken link URL
  • brokenLinks[].anchorText (string): anchor text of the link
  • brokenLinks[].statusCode (integer): HTTP status code returned (404, 500, or 0 for network errors)
  • brokenLinks[].isExternal (boolean): whether the link points to an external domain
  • brokenLinks[].error (string or null): error classification for network failures (TIMEOUT, DNS_FAILED, CONNECTION_REFUSED, SSL_ERROR); null for HTTP errors
  • depth (integer): crawl depth at which this page was discovered
  • crawledAt (string): ISO 8601 timestamp

Output Example

{
  "url": "https://example.com/blog",
  "brokenLinks": [
    {
      "href": "https://example.com/deleted-page",
      "statusCode": 404,
      "anchorText": "Old announcement",
      "isExternal": false,
      "error": null
    }
  ]
}
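Dataset items with this shape are easy to post-process. As a sketch (not part of the actor itself), the snippet below tallies broken links across all crawled pages by status code or error class, using the field names from the Output Example:

```python
from collections import Counter

def summarize_errors(items):
    """Count broken links by error class (or HTTP status code) across all pages."""
    counts = Counter()
    for item in items:
        for link in item.get("brokenLinks", []):
            # Network failures carry an error string; HTTP errors carry a status code.
            key = link.get("error") or str(link.get("statusCode"))
            counts[key] += 1
    return dict(counts)

# Illustrative items mimicking the actor's output shape
items = [
    {"url": "https://example.com/blog",
     "brokenLinks": [
         {"href": "https://example.com/deleted-page", "statusCode": 404, "error": None},
         {"href": "https://slow.example.net/x", "statusCode": 0, "error": "TIMEOUT"},
     ]},
]
print(summarize_errors(items))  # {'404': 1, 'TIMEOUT': 1}
```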

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console β†’ Settings β†’ Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~broken-link-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "startUrls": ["https://example.com"], "maxDepth": 2, "maxPages": 50, "concurrency": 5, "checkExternal": true }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("taroyamada/broken-link-checker").call(run_input={
    "startUrls": ["https://example.com"],
    "maxDepth": 2,
    "maxPages": 50,
    "concurrency": 5,
    "checkExternal": True,  # Python boolean, not JSON `true`
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('taroyamada/broken-link-checker').call({
    startUrls: ['https://example.com'],
    maxDepth: 2,
    maxPages: 50,
    concurrency: 5,
    checkExternal: true,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips & Limitations

  • Start with maxDepth: 2 and maxPages: 50 for fast audits before scaling up.
  • Set checkExternal: false to focus only on internal broken links (faster).
  • Combine with URL Health Checker for a full SEO link audit.
  • Run weekly to catch newly broken outbound links in published content.
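For audit reports, the per-page results can be flattened into one CSV row per broken link. This is a sketch of my own (the `broken_links_to_csv` helper and the sample item are illustrative; field names follow the Output Example):

```python
import csv
import io

def broken_links_to_csv(items):
    """Flatten crawl results into CSV text, one row per broken link."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["source_page", "broken_url", "anchor_text", "status_code", "error"])
    for item in items:
        for link in item.get("brokenLinks", []):
            writer.writerow([
                item["url"],
                link.get("href"),
                link.get("anchorText"),
                link.get("statusCode"),
                link.get("error"),
            ])
    return buf.getvalue()

sample = [{
    "url": "https://example.com/blog",
    "brokenLinks": [{
        "href": "https://example.com/deleted-page",
        "statusCode": 404,
        "anchorText": "Old announcement",
        "error": None,
    }],
}]
print(broken_links_to_csv(sample))
```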

FAQ

How does crawl depth work?

Depth 1 = only starting URLs. Depth 2 = starting URLs + links found on them. Depth 5 is the maximum and covers most typical sites.

Does it respect robots.txt?

Yes. Pages blocked by robots.txt are skipped during crawl.

Can I exclude certain URL patterns?

Not in the current version. URL pattern exclusion may be added in a future release.

How long does a 500-page crawl take?

With concurrency=5 and 10s timeout: roughly 5-15 minutes depending on site speed.
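A back-of-the-envelope estimate (my own heuristic, not a guarantee from the actor): pages are fetched in waves of `concurrency` requests, so runtime is roughly pages divided by concurrency times the average response time, bounded above by the timeout:

```python
import math

def estimate_crawl_seconds(pages, concurrency, avg_response_s):
    """Rough runtime estimate: pages fetched in waves of `concurrency` requests."""
    return math.ceil(pages / concurrency) * avg_response_s

# Typical site (~1.5 s per page), 500 pages at concurrency 5:
print(estimate_crawl_seconds(500, 5, 1.5))  # 150.0 seconds (~2.5 min)
# Worst case, every request hitting the 10 s timeout:
print(estimate_crawl_seconds(500, 5, 10))   # 1000 seconds (~17 min)
```

Real runs land between these bounds, which is why the 5-15 minute figure depends so heavily on site speed.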

Will it crawl pages behind login?

No β€” public pages only. Pages requiring authentication will be skipped.

How does it handle JavaScript-rendered links?

It uses static HTML parsing. SPA links rendered after page load will not be found.


Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.005 per output item

Example: 1,000 items = $0.01 + (1,000 Γ— $0.005) = $5.01
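The worked example above follows directly from the two event prices, which can be captured in a one-line cost function (a convenience sketch, not an official calculator):

```python
ACTOR_START_USD = 0.01    # flat fee per run
DATASET_ITEM_USD = 0.005  # per output item

def run_cost(items):
    """Pay-per-event cost in USD for a single run producing `items` dataset items."""
    return ACTOR_START_USD + items * DATASET_ITEM_USD

print(f"${run_cost(1000):.2f}")  # $5.01
```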

No subscription required β€” you only pay for what you use.