🔗 Broken Link Checker & Crawler
Crawl any website up to five levels deep to extract 404 errors, dead outbound URLs, and precise anchor text details for technical SEO audits.
Broken Link Checker API | Crawl, 404 & Redirect Audit
Part of the Website Health Suite — Comprehensive website trust, compliance, and technical SEO monitoring.
Crawl websites to systematically extract dead URLs, 404 errors, and broken outbound links using this high-performance web scraper. Maintaining a healthy link profile is a foundational pillar of technical SEO and user experience. Search engines like Google actively penalize sites with excessive broken links, making routine link audits indispensable for SEO agencies, digital marketers, and e-commerce managers. This tool automatically crawls your web pages, navigating internal structures up to five levels deep, to pinpoint exactly where user journeys hit dead ends.
Designed for recurring website audits — schedule weekly or monthly runs to catch broken links before they hurt your rankings. Perfect for post-launch QA during large-scale site migrations, routine content maintenance, or vetting partner websites for link decay.

When you run this scraper, it checks both internal navigation elements and external outbound references. The results provide a comprehensive per-page breakdown of your website's link health. For every broken URL discovered, the tool captures vital data including the exact source page where the broken link lives, the anchor text used, and the specific HTTP error classification. Integrate these results directly into your reporting dashboards or QA workflows to keep your site optimized and error-free.
Store Quickstart
Start with the Quickstart template (single starting URL, depth 2). For full-site audits, use Deep Crawl (depth 5, up to 500 pages). For ongoing monitoring, use Weekly Site Health with webhook alerts.
Key Features
- 🕸️ Configurable crawl depth — Follows internal links up to 5 levels deep, up to 500 pages per run
- 🌐 Internal + external checks — Validate both your own links and outbound references
- 📍 Anchor text reporting — Identify which link text points to the broken URL
- 🏷️ Error classification — TIMEOUT, DNS_FAILED, CONNECTION_REFUSED, SSL_ERROR
- ⚡ Concurrent fetching — 1-10 parallel requests to speed up crawls
- 📊 Per-page breakdown — Each result shows all broken links grouped by source page
Use Cases
| Who | Why |
|---|---|
| SEO agencies | Regular broken-link audits for client websites to protect ranking |
| Content editors | Find dead outbound links in blog posts and documentation |
| E-commerce sites | Monitor product pages for broken navigation and outbound partner links |
| Site migrations | Validate internal linking after URL restructuring |
| Technical SEO | Identify redirect chains and crawl traps that waste crawl budget |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | string[] | (required) | URLs to start crawling (max 10) |
| maxDepth | integer | 2 | Crawl depth (1-5) |
| maxPages | integer | 50 | Max pages to crawl (1-500) |
| concurrency | integer | 5 | Parallel requests (1-10) |
| checkExternal | boolean | true | Check external links |
| timeoutMs | integer | 10000 | Request timeout in ms |
Input Example
```json
{
  "startUrls": ["https://example.com"],
  "maxDepth": 2,
  "maxPages": 50,
  "concurrency": 5,
  "checkExternal": true
}
```
Output
| Field | Type | Description |
|---|---|---|
| url | string | Page URL that was crawled |
| brokenLinks | object[] | Array of broken link objects found on the page |
| brokenLinks[].href | string | The broken link URL |
| brokenLinks[].anchor | string | Anchor text of the link |
| brokenLinks[].statusCode | integer | HTTP status code returned (404, 500, 0 for network errors) |
| brokenLinks[].isExternal | boolean | Whether the link points to an external domain |
| brokenLinks[].error | string \| null | Error classification for network failures (TIMEOUT, DNS_FAILED, CONNECTION_REFUSED, SSL_ERROR); null for HTTP errors |
| depth | integer | Crawl depth at which this page was discovered |
| crawledAt | string | ISO 8601 timestamp |
Output Example
```json
{
  "url": "https://example.com/blog",
  "brokenLinks": [
    {
      "href": "https://example.com/deleted-page",
      "anchor": "Old announcement",
      "statusCode": 404,
      "isExternal": false,
      "error": null
    }
  ],
  "depth": 1,
  "crawledAt": "2024-01-15T10:30:00.000Z"
}
```
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~broken-link-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": ["https://example.com"],
    "maxDepth": 2,
    "maxPages": 50,
    "concurrency": 5,
    "checkExternal": true
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/broken-link-checker").call(run_input={
    "startUrls": ["https://example.com"],
    "maxDepth": 2,
    "maxPages": 50,
    "concurrency": 5,
    "checkExternal": True,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/broken-link-checker').call({
    startUrls: ['https://example.com'],
    maxDepth: 2,
    maxPages: 50,
    concurrency: 5,
    checkExternal: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Tips & Limitations
- Start with `maxDepth: 2` and `maxPages: 50` for fast audits before scaling up.
- Set `checkExternal: false` to focus only on internal broken links (faster).
- Combine with URL Health Checker for a full SEO link audit.
- Run weekly to catch newly broken outbound links in published content.
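Once a run finishes, the dataset items can be aggregated locally for reporting. A minimal sketch, using sample items shaped like the Output table above (no API calls involved):

```python
from collections import Counter

# Sample dataset items in the shape this actor outputs (see Output above).
items = [
    {"url": "https://example.com/blog",
     "brokenLinks": [
         {"href": "https://example.com/deleted-page", "statusCode": 404, "error": None},
         {"href": "https://partner.example/down", "statusCode": 0, "error": "TIMEOUT"},
     ]},
    {"url": "https://example.com/docs",
     "brokenLinks": [
         {"href": "https://example.com/old", "statusCode": 404, "error": None},
     ]},
]

# Count broken links by HTTP status (network errors report statusCode 0).
by_status = Counter(
    link["statusCode"] for page in items for link in page["brokenLinks"]
)
print(by_status)  # Counter({404: 2, 0: 1})
```

The same loop works over `iterate_items()` from the Python example above if you want live results instead of a sample.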
FAQ
How does crawl depth work?
Depth 1 = only starting URLs. Depth 2 = starting URLs + links found on them. Depth 5 is the maximum and covers most typical sites.
Does it respect robots.txt?
Yes. Pages blocked by robots.txt are skipped during crawl.
Can I exclude certain URL patterns?
Not in the current version. URL pattern exclusion is planned as an input option in a future release.
How long does a 500-page crawl take?
With concurrency=5 and 10s timeout: roughly 5-15 minutes depending on site speed.
Will it crawl pages behind login?
No — public pages only. Pages requiring authentication will be skipped.
How does it handle JavaScript-rendered links?
It uses static HTML parsing. SPA links rendered after page load will not be found.
Complete Your Website Health Audit
Website Health Suite — Build a comprehensive compliance and trust monitoring workflow:
1. Link & URL Health (you are here)
- 🔗 Broken Link Checker — Find broken links across your entire site structure
- 🔗 Bulk URL Health Checker — Validate HTTP status, redirects, SSL, and response times for URL lists
2. SEO & Metadata Quality
- 🏷️ Meta Tag Analyzer — Audit title tags, Open Graph, Twitter Cards, and hreflang
- Schema.org Validator — Validate JSON-LD and Microdata with quality scoring
3. Security & Email Deliverability
- DNS/DMARC Security Checker — Audit SPF, DKIM, DMARC, and MX records
4. Historical Data & Recovery
- 📚 Wayback Machine Checker — Find archived snapshots for content recovery
Recommended workflow: Run Broken Link Checker weekly → Fix dead links → Validate with URL Health Checker → Monitor metadata with Meta Tag Analyzer → Schedule recurring compliance audits.
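To hand results from this actor to the next step of the suite, the broken targets can be deduplicated into a flat URL list. A sketch (the downstream actor's ID and input field names are assumptions, so check its README before wiring this up):

```python
def collect_broken_urls(items):
    """Deduplicated, sorted list of broken link targets across all crawled pages."""
    return sorted({link["href"] for page in items for link in page["brokenLinks"]})

# Sample items in this actor's output shape:
pages = [
    {"url": "https://example.com/a",
     "brokenLinks": [{"href": "https://example.com/x"},
                     {"href": "https://old.example/y"}]},
    {"url": "https://example.com/b",
     "brokenLinks": [{"href": "https://example.com/x"}]},
]
print(collect_broken_urls(pages))
# ['https://example.com/x', 'https://old.example/y']

# After fixing the links, the same list could be revalidated, e.g.:
# client.actor("taroyamada/bulk-url-health-checker").call(run_input={"urls": ...})
# (hypothetical actor ID and field name; see the URL Health Checker's docs)
```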
Other Website Tools:
- Sitemap Analyzer — SEO sitemap audit
- Site Governance Monitor — Robots.txt and schema monitoring
- Domain Trust Monitor — SSL expiry and security headers
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.005 per output item
Example: 1,000 items = $0.01 + (1,000 × $0.005) = $5.01
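The same arithmetic as a tiny helper for budgeting runs (rates copied from the event list above):

```python
ACTOR_START_USD = 0.01    # flat fee per run
DATASET_ITEM_USD = 0.005  # per output item

def run_cost_usd(items: int) -> float:
    """Estimated cost of one run that yields `items` dataset items."""
    return round(ACTOR_START_USD + items * DATASET_ITEM_USD, 2)

print(run_cost_usd(1000))  # 5.01
print(run_cost_usd(50))    # 0.26
```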
No subscription required — you only pay for what you use.
⭐ Was this helpful?
If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.
Bug report or feature request? Open an issue on the Issues tab of this actor.