🔗 Broken Link Checker
Crawl websites to extract dead URLs, 404 errors, and broken outbound links. Export detailed reports to improve your search rankings and website health.
Pricing: Pay per event
Rating: 0.0 (0)
Developer: Taro Yamada
Actor stats: 0 bookmarked, 2 total users, 1 monthly active user
Last modified: 8 hours ago
Crawl websites to find broken links, 404 errors, and dead URLs. Essential for SEO audits, website maintenance, and content teams.
Quickstart
Start with the Quickstart template (single starting URL, depth 2). For full-site audits, use Deep Crawl (depth 5, up to 500 pages).
Key Features
- 🕸️ Configurable crawl depth – follows internal links up to 5 levels deep, up to 500 pages per run
- 🔗 Internal + external checks – validate both your own links and outbound references
- 📝 Anchor text reporting – identify which link text points to the broken URL
- 🏷️ Error classification – TIMEOUT, DNS_FAILED, CONNECTION_REFUSED, SSL_ERROR
- ⚡ Concurrent fetching – 1-10 parallel requests to speed up crawls
- 📄 Per-page breakdown – each result shows all broken links grouped by source page
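The error labels above correspond to distinct network failure modes. As an illustration only (the actor's own implementation is not published), here is how such a classifier could map Python standard-library exceptions to those labels:

```python
import socket
import ssl

def classify_error(exc: Exception) -> str:
    """Map a network exception to one of the actor's error labels."""
    # Check specific exception types before generic ones: all of these
    # are OSError subclasses, so ordering matters.
    if isinstance(exc, socket.timeout):
        return "TIMEOUT"
    if isinstance(exc, socket.gaierror):
        return "DNS_FAILED"          # hostname could not be resolved
    if isinstance(exc, ConnectionRefusedError):
        return "CONNECTION_REFUSED"  # port closed / nothing listening
    if isinstance(exc, ssl.SSLError):
        return "SSL_ERROR"           # certificate or handshake problem
    return "UNKNOWN"

print(classify_error(socket.gaierror()))  # DNS_FAILED
```

A real crawler would call this in the `except` branch of each fetch and store the label in the result row alongside a status code of 0.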
Use Cases
| Who | Why |
|---|---|
| SEO agencies | Regular broken-link audits for client websites to protect ranking |
| Content editors | Find dead outbound links in blog posts and documentation |
| E-commerce sites | Monitor product pages for broken navigation and outbound partner links |
| Site migrations | Validate internal linking after URL restructuring |
| Technical SEO | Identify redirect chains and crawl traps that waste crawl budget |
Input
| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | string[] | (required) | URLs to start crawling (max 10) |
| maxDepth | integer | 2 | Crawl depth (1-5) |
| maxPages | integer | 50 | Max pages to crawl (1-500) |
| concurrency | integer | 5 | Parallel requests (1-10) |
| checkExternal | boolean | true | Check external links |
| timeoutMs | integer | 10000 | Request timeout in ms |
Input Example
```json
{
  "startUrls": ["https://example.com"],
  "maxDepth": 2,
  "maxPages": 50,
  "concurrency": 5,
  "checkExternal": true
}
```
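The ranges in the input table can also be enforced client-side before submitting a run, to fail fast on bad values. A minimal sketch; the `validate_input` helper and `ALLOWED_RANGES` table below are hypothetical, not part of the actor:

```python
# Allowed ranges taken from the input table above.
ALLOWED_RANGES = {"maxDepth": (1, 5), "maxPages": (1, 500), "concurrency": (1, 10)}

def validate_input(run_input: dict) -> dict:
    """Reject inputs the actor's schema would refuse anyway."""
    urls = run_input.get("startUrls")
    if not urls or len(urls) > 10:
        raise ValueError("startUrls must contain 1-10 URLs")
    checked = dict(run_input)
    for field, (lo, hi) in ALLOWED_RANGES.items():
        value = checked.get(field)
        if value is not None and not lo <= value <= hi:
            raise ValueError(f"{field} must be between {lo} and {hi}")
    return checked

validate_input({"startUrls": ["https://example.com"], "maxDepth": 2})
```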
Output
| Field | Type | Description |
|---|---|---|
| url | string | Page URL that was crawled |
| brokenLinks | object[] | Array of broken link objects found on the page |
| brokenLinks[].href | string | The broken link URL |
| brokenLinks[].anchorText | string | Anchor text of the link |
| brokenLinks[].statusCode | integer | HTTP status code returned (404, 500, 0 for network errors) |
| brokenLinks[].error | string or null | Error type for network failures (e.g. TIMEOUT, DNS_FAILED); null when an HTTP status was received |
| brokenLinks[].isExternal | boolean | Whether the link points to an external domain |
| depth | integer | Crawl depth at which this page was discovered |
| crawledAt | string | ISO 8601 timestamp of when the page was crawled |
Output Example
```json
{
  "url": "https://example.com/blog",
  "brokenLinks": [
    {
      "href": "https://example.com/deleted-page",
      "statusCode": 404,
      "anchorText": "Old announcement",
      "isExternal": false,
      "error": null
    }
  ]
}
```
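Dataset items shaped like this can be post-processed locally once a run finishes, for example to tally broken links by status code across all crawled pages. The sample items below are invented for illustration:

```python
from collections import Counter

# Two dataset items shaped like the output example above (made up).
items = [
    {"url": "https://example.com/blog",
     "brokenLinks": [{"href": "https://example.com/deleted-page",
                      "statusCode": 404, "anchorText": "Old announcement",
                      "isExternal": False, "error": None}]},
    {"url": "https://example.com/about",
     "brokenLinks": [{"href": "https://partner.example/offline",
                      "statusCode": 0, "anchorText": "Partner",
                      "isExternal": True, "error": "TIMEOUT"}]},
]

# Tally broken links by status code across all crawled pages.
by_status = Counter(
    link["statusCode"] for item in items for link in item["brokenLinks"]
)
print(by_status)  # Counter({404: 1, 0: 1})
```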
API Usage
Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.
cURL
```bash
curl -X POST "https://api.apify.com/v2/acts/taroyamada~broken-link-checker/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startUrls": ["https://example.com"],
    "maxDepth": 2,
    "maxPages": 50,
    "concurrency": 5,
    "checkExternal": true
  }'
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("taroyamada/broken-link-checker").call(run_input={
    "startUrls": ["https://example.com"],
    "maxDepth": 2,
    "maxPages": 50,
    "concurrency": 5,
    "checkExternal": True,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
```
JavaScript / Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/broken-link-checker').call({
    startUrls: ['https://example.com'],
    maxDepth: 2,
    maxPages: 50,
    concurrency: 5,
    checkExternal: true,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```
Tips & Limitations
- Start with `maxDepth: 2` and `maxPages: 50` for fast audits before scaling up.
- Set `checkExternal: false` to focus only on internal broken links (faster).
- Combine with URL Health Checker for a full SEO link audit.
- Run weekly to catch newly broken outbound links in published content.
FAQ
How does crawl depth work?
Depth 1 = only starting URLs. Depth 2 = starting URLs + links found on them. Depth 5 is the maximum and covers most typical sites.
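The depth semantics can be sketched as a breadth-first crawl over a toy link graph; the graph and `crawl` helper below are illustrative, not the actor's code:

```python
from collections import deque

# Toy link graph standing in for a site. Depth 1 visits only the
# start URL; each extra level follows one more hop of links.
LINKS = {
    "/": ["/blog", "/about"],
    "/blog": ["/blog/post-1"],
    "/about": [],
    "/blog/post-1": [],
}

def crawl(start: str, max_depth: int) -> list[str]:
    """Breadth-first crawl, returning pages in the order visited."""
    seen, queue, order = {start}, deque([(start, 1)]), []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth < max_depth:
            for nxt in LINKS.get(url, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return order

print(crawl("/", 1))  # ['/']
print(crawl("/", 2))  # ['/', '/blog', '/about']
```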
Does it respect robots.txt?
Yes. Pages blocked by robots.txt are skipped during crawl.
Can I exclude certain URL patterns?
Not in the current version; URL pattern exclusion may be added as an input option in a future release.
How long does a 500-page crawl take?
With concurrency=5 and 10s timeout: roughly 5-15 minutes depending on site speed.
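That range is simple arithmetic: total pages times average response time, divided by the number of parallel requests. A quick sanity check, where the per-page timings are assumptions:

```python
def estimate_minutes(pages: int, concurrency: int, avg_seconds_per_page: float) -> float:
    """Rough wall-clock estimate: pages fetched in parallel batches."""
    return pages * avg_seconds_per_page / concurrency / 60

# 500 pages at concurrency 5: a 3 s/page site takes about 5 minutes,
# a slow 9 s/page site about 15 minutes.
print(round(estimate_minutes(500, 5, 3)))  # 5
print(round(estimate_minutes(500, 5, 9)))  # 15
```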
Will it crawl pages behind login?
No. Public pages only; pages requiring authentication will be skipped.
How does it handle JavaScript-rendered links?
It uses static HTML parsing. SPA links rendered after page load will not be found.
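The difference is easy to demonstrate with Python's standard-library `html.parser` (a stand-in for static parsing in general, not the actor's exact parser): a link in the HTML source is found, while one injected by JavaScript inside a `<script>` block is not.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags in static HTML."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

html = """
<a href="/visible">In the HTML source</a>
<script>
  // Injected after load by JavaScript; a static parser treats the
  // whole script body as raw text, so this tag is never seen.
  document.body.innerHTML += '<a href="/spa-only">SPA link</a>';
</script>
"""

parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/visible']
```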
Related Actors
URL/Link Tools cluster – explore related Apify tools:
- URL Health Checker – Bulk-check HTTP status codes, redirects, SSL validity, and response times for thousands of URLs.
- URL Unshortener – Expand bit.ly and other shortened URLs to reveal their final destination.
- Meta Tag Analyzer – Analyze meta tags, Open Graph, Twitter Cards, JSON-LD, and hreflang for any URL.
- Wayback Machine Checker – Check if URLs are archived on the Wayback Machine and find closest snapshots by date.
- Sitemap Analyzer API | sitemap.xml SEO Audit – Analyze sitemap.xml files as part of an SEO audit.
- Schema.org Validator API | JSON-LD + Microdata – Validate JSON-LD and Microdata across multiple pages, score markup quality, and flag missing or malformed Schema.org markup.
- Site Governance Monitor | Robots, Sitemap & Schema – Recurring checks of robots.txt, sitemap, and schema markup.
- RDAP Domain Monitor API | Ownership + Expiry – Monitor domain registration data via RDAP and track expiry, registrar, nameserver, and ownership changes in structured rows.
- Domain Security Audit API | SSL Expiry, DMARC, Domain Expiry – Summary-first portfolio monitor for SSL expiry, DMARC/SPF/DKIM, domain expiry/ownership, and security headers with remediation-ready outputs.
Cost
Pay Per Event:
- actor-start: $0.01 (flat fee per run)
- dataset-item: $0.005 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.005) = $5.01

No subscription required: you only pay for what you use.
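The pricing formula above is easy to reproduce for budgeting; the `run_cost` helper is illustrative, with the fee values taken from the event prices listed:

```python
def run_cost(items: int, actor_start: float = 0.01, per_item: float = 0.005) -> float:
    """Pay-per-event total: flat start fee plus a fee per dataset item."""
    return round(actor_start + items * per_item, 2)

print(run_cost(1_000))  # 5.01
print(run_cost(50))     # 0.26
```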