🔗 Broken Link Checker & Crawler
Crawl any website up to five levels deep to extract 404 errors, dead outbound URLs, and precise anchor text details for technical SEO audits.

Pricing: Pay per event

Developer: 太郎 山田 (Maintained by Community)

Last modified: 2 days ago

Broken Link Checker API | Crawl, 404 & Redirect Audit

Part of the Website Health Suite — Comprehensive website trust, compliance, and technical SEO monitoring.

Crawl websites to systematically extract dead URLs, 404 errors, and broken outbound links using this high-performance web scraper. Maintaining a healthy link profile is a foundational pillar of technical SEO and user experience. Search engines like Google actively penalize sites with excessive broken links, making routine link audits indispensable for SEO agencies, digital marketers, and e-commerce managers. This tool automatically crawls your web pages, navigating internal structures up to five levels deep, to pinpoint exactly where user journeys hit dead ends.

Designed for recurring website audits — Schedule weekly or monthly runs to catch broken links before they hurt your rankings. Perfect for post-launch QA during large-scale site migrations, routine content maintenance, or vetting partner websites for link decay. When you run this scraper, it checks both internal navigation elements and external outbound references. The scraped results provide a comprehensive per-page breakdown of your website's link health. For every broken URL discovered, the tool captures vital data including the exact source page where the broken link lives, the anchor text used, and the specific HTTP error classification. Integrate these results directly into your reporting dashboards or QA workflows to keep your site optimized and error-free.

Store Quickstart

Start with the Quickstart template (single starting URL, depth 2). For full-site audits, use Deep Crawl (depth 5, up to 500 pages). For ongoing monitoring, use Weekly Site Health with webhook alerts.
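For reference, the Deep Crawl preset corresponds to an input like the following (field names are from the Input table below; only the documented depth-5 / 500-page limits are preset values, the rest are illustrative defaults):

```json
{
  "startUrls": ["https://example.com"],
  "maxDepth": 5,
  "maxPages": 500,
  "concurrency": 5,
  "checkExternal": true
}
```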

Key Features

  • 🕸️ Configurable crawl depth — Follows internal links up to 5 levels deep, up to 500 pages per run
  • 🌐 Internal + external checks — Validate both your own links and outbound references
  • 📍 Anchor text reporting — Identify which link text points to the broken URL
  • 🏷️ Error classification — TIMEOUT, DNS_FAILED, CONNECTION_REFUSED, SSL_ERROR
  • ⚡ Concurrent fetching — 1-10 parallel requests to speed up crawls
  • 📊 Per-page breakdown — Each result shows all broken links grouped by source page

Use Cases

| Who | Why |
|---|---|
| SEO agencies | Regular broken-link audits for client websites to protect rankings |
| Content editors | Find dead outbound links in blog posts and documentation |
| E-commerce sites | Monitor product pages for broken navigation and outbound partner links |
| Site migrations | Validate internal linking after URL restructuring |
| Technical SEO | Identify redirect chains and crawl traps that waste crawl budget |

Input

| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | string[] | (required) | URLs to start crawling (max 10) |
| maxDepth | integer | 2 | Crawl depth (1-5) |
| maxPages | integer | 50 | Max pages to crawl (1-500) |
| concurrency | integer | 5 | Parallel requests (1-10) |
| checkExternal | boolean | true | Check external links |
| timeoutMs | integer | 10000 | Request timeout in ms |

Input Example

```json
{
  "startUrls": ["https://example.com"],
  "maxDepth": 2,
  "maxPages": 50,
  "concurrency": 5,
  "checkExternal": true
}
```

Output

| Field | Type | Description |
|---|---|---|
| url | string | Page URL that was crawled |
| brokenLinks | object[] | Array of broken link objects found on the page |
| brokenLinks[].href | string | The broken link URL |
| brokenLinks[].anchorText | string | Anchor text of the link |
| brokenLinks[].statusCode | integer | HTTP status code returned (404, 500; 0 for network errors) |
| brokenLinks[].isExternal | boolean | Whether the link points to an external domain |
| brokenLinks[].error | string \| null | Error classification for network failures (TIMEOUT, DNS_FAILED, CONNECTION_REFUSED, SSL_ERROR); null for HTTP errors |
| depth | integer | Crawl depth at which this page was discovered |
| crawledAt | string | ISO 8601 timestamp |

Output Example

```json
{
  "url": "https://example.com/blog",
  "brokenLinks": [
    {
      "href": "https://example.com/deleted-page",
      "statusCode": 404,
      "anchorText": "Old announcement",
      "isExternal": false,
      "error": null
    }
  ]
}
```

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~broken-link-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "startUrls": ["https://example.com"], "maxDepth": 2, "maxPages": 50, "concurrency": 5, "checkExternal": true }'
```

Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/broken-link-checker").call(run_input={
    "startUrls": ["https://example.com"],
    "maxDepth": 2,
    "maxPages": 50,
    "concurrency": 5,
    "checkExternal": True,  # Python booleans, not JSON `true`
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```

JavaScript / Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/broken-link-checker').call({
    startUrls: ['https://example.com'],
    maxDepth: 2,
    maxPages: 50,
    concurrency: 5,
    checkExternal: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```

Tips & Limitations

  • Start with maxDepth: 2 and maxPages: 50 for fast audits before scaling up.
  • Set checkExternal: false to focus only on internal broken links (faster).
  • Combine with URL Health Checker for a full SEO link audit.
  • Run weekly to catch newly broken outbound links in published content.
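Results can be post-processed into an actionable fix-list. A minimal sketch, assuming dataset rows shaped as in the Output table above (the `items` list here is hypothetical sample data standing in for rows fetched via the API examples):

```python
# Group internal 404s by source page so editors know exactly what to fix.
# `items` mimics dataset rows per the Output table (sample data, not real results).
items = [
    {
        "url": "https://example.com/blog",
        "brokenLinks": [
            {"href": "https://example.com/deleted-page", "statusCode": 404,
             "anchorText": "Old announcement", "isExternal": False, "error": None},
            {"href": "https://partner.example/dead", "statusCode": 0,
             "anchorText": "Partner", "isExternal": True, "error": "TIMEOUT"},
        ],
    },
]

fix_list = {}
for item in items:
    internal_404s = [
        link for link in item["brokenLinks"]
        if not link["isExternal"] and link["statusCode"] == 404
    ]
    if internal_404s:
        fix_list[item["url"]] = [link["href"] for link in internal_404s]

print(fix_list)
# {'https://example.com/blog': ['https://example.com/deleted-page']}
```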

FAQ

How does crawl depth work?

Depth 1 = only starting URLs. Depth 2 = starting URLs + links found on them. Depth 5 is the maximum and covers most typical sites.

Does it respect robots.txt?

Yes. Pages blocked by robots.txt are skipped during crawl.

Can I exclude certain URL patterns?

Not in the current version; URL pattern exclusion may be added in a future release.

How long does a 500-page crawl take?

With concurrency=5 and 10s timeout: roughly 5-15 minutes depending on site speed.
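That range follows from a back-of-the-envelope calculation (a sketch, not the actor's internal scheduler; it assumes roughly uniform per-page fetch times):

```python
def estimate_run_seconds(pages, concurrency, avg_fetch_seconds):
    """Rough crawl-time estimate: pages fetched in batches of `concurrency`."""
    batches = -(-pages // concurrency)  # ceiling division
    return batches * avg_fetch_seconds

# 500 pages at concurrency 5 is 100 batches:
fast = estimate_run_seconds(500, 5, 3)    # fast site, ~3 s/page -> 300 s (~5 min)
slow = estimate_run_seconds(500, 5, 10)   # slow site at the 10 s timeout -> 1000 s (~17 min)
print(fast / 60, slow / 60)
```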

Will it crawl pages behind login?

No — public pages only. Pages requiring authentication will be skipped.

How does it handle JavaScript-rendered links?

It uses static HTML parsing. SPA links rendered after page load will not be found.
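"Static HTML parsing" means the checker only sees anchors present in the HTML the server returns, much like this sketch using Python's standard-library parser (illustrative only, not the actor's actual implementation); links injected client-side by JavaScript never appear in that source:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags in raw (server-rendered) HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = """
<a href="/visible-in-source">Found</a>
<script>
  // A link added after page load is invisible to static parsing:
  // document.body.innerHTML += '<a href="/spa-only">Missed</a>'
</script>
"""

parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/visible-in-source'] — the SPA-injected link is never seen
```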

Complete Your Website Health Audit

Website Health Suite — Build a comprehensive compliance and trust monitoring workflow:

1. Link & URL Health (you are here)

2. SEO & Metadata Quality

3. Security & Email Deliverability

4. Historical Data & Recovery

Recommended workflow: Run Broken Link Checker weekly → Fix dead links → Validate with URL Health Checker → Monitor metadata with Meta Tag Analyzer → Schedule recurring compliance audits.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.005 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.005) = $5.01

No subscription required — you only pay for what you use.
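The pricing arithmetic above is easy to verify with a quick sketch using the two event prices listed:

```python
ACTOR_START_USD = 0.01    # flat fee per run
DATASET_ITEM_USD = 0.005  # per output item

def run_cost(items):
    """Total cost in USD for one run producing `items` dataset items."""
    return ACTOR_START_USD + items * DATASET_ITEM_USD

print(run_cost(1000))  # 5.01
```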

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.