Pricing

Pay per usage

Site Health Scanner

Crawl a website to detect broken and problematic links, identify redirects and blocked URLs, capture screenshots, and return structured site health data for audits, automation, and monitoring.

Site Health Scanner 🔍

Find broken links before Google does. Get screenshots of every broken page as client-ready proof.

⚠️ Bot Protection Notice: Some websites use aggressive bot detection that blocks all automated requests. Enabling the residential proxy setting (`useProxy`) can get past basic protection, but heavily protected sites (Cloudflare, advanced WAFs) may still block scans. Links on these sites are marked as "BLOCKED" rather than broken; verify them manually in a browser.

What does it do?

Site Health Scanner crawls your website and checks every link—internal pages, external URLs, images, scripts, and stylesheets. When it finds problems, it doesn't just report them—it takes screenshots so you have visual proof for clients or stakeholders.

Perfect for:

SEO professionals auditing client sites
Web agencies delivering health reports
Site owners maintaining link integrity
QA teams catching issues before launch

Features

Feature	Description
🔗 Broken Link Detection	Finds 4xx and 5xx errors across your entire site
🚫 Smart Bot-Block Detection	Distinguishes truly broken links from bot-blocked external sites
📸 Screenshot Proof	Captures screenshots of broken pages automatically
↪️ Redirect Chain Tracking	Maps full redirect paths, catches redirect loops
⚠️ Mixed Content Warnings	Identifies HTTP resources on HTTPS sites
⏱️ Response Time Monitoring	Flags slow-loading pages and resources
🌐 External Link Checking	Optionally validates outbound links

Cost Estimate

Site Size	Pages	Est. Time	Est. Cost
Small blog	50	2-3 min	$0.02-0.05
Business site	200	8-12 min	$0.10-0.20
E-commerce	1,000	30-45 min	$0.50-1.00
Large portal	5,000	2-3 hours	$2.00-4.00

Based on Apify platform pricing. Actual costs vary by page complexity and settings.

Input

Field	Type	Description	Default
`startUrls`	array	URLs to start crawling (homepage, sitemap, or specific pages)	Required
`maxDepth`	integer	How many clicks deep to crawl (0-10)	3
`maxPages`	integer	Maximum pages to crawl (1-10,000)	100
`checkExternalLinks`	boolean	Also check links to other domains	true
`screenshotBrokenPages`	boolean	Take screenshots of 4xx/5xx pages	true
`followRedirects`	boolean	Track full redirect chains	true
`timeout`	integer	Request timeout in seconds (5-120)	30
`includeWarnings`	boolean	Report mixed content, slow pages, etc.	true
`userAgent`	string	Custom user agent (leave empty for default)	""
`useProxy`	boolean	Use residential proxy to bypass bot protection (extra cost)	false
`requestDelay`	integer	Delay between requests in ms (0-10000). 0 = no delay. External links use 3x this value.	0

About Bot Protection (403 Errors)

Many external websites block automated requests and return 403 Forbidden errors. This doesn't mean the link is broken—it just means the site is blocking bots.

Without proxy (default): You may see 403 "BLOCKED" status on external links. These are marked with isBroken: false and confidence: low since the link likely works for real users. Verify manually if needed.

With residential proxy: Enable useProxy to route requests through residential IPs, which are less likely to be blocked. This adds ~$0.02-0.60 per scan depending on size (user pays for proxy traffic).

About Rate Limiting (429 Errors)

Some sites aggressively rate-limit requests. If you see 429 "Too Many Requests" errors, increase the requestDelay setting:

0 (default): No delay - fastest but may trigger rate limits
500-1000: Light throttling - good for most sites
1500-2000: Heavy throttling - for aggressive sites
External links: Automatically use 3x the configured delay
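
The 3x rule for external links can be worked through with a small sketch. The function names below are hypothetical and not part of the Actor; the estimate ignores response times and any concurrency the Actor may use, so treat it as a rough lower bound on waiting time only.

```javascript
// Multiplier for external links, as stated in the requestDelay description.
const EXTERNAL_DELAY_MULTIPLIER = 3;

// Returns the delay (ms) applied to a request, given the configured
// requestDelay and the link type ("internal" or "external").
function effectiveDelayMs(requestDelay, linkType) {
    return linkType === 'external'
        ? requestDelay * EXTERNAL_DELAY_MULTIPLIER
        : requestDelay;
}

// Rough total time spent waiting between requests. Assumes requests are
// made one after another, which is an assumption, not documented behavior.
function totalDelayMs(requestDelay, internalCount, externalCount) {
    return internalCount * effectiveDelayMs(requestDelay, 'internal')
        + externalCount * effectiveDelayMs(requestDelay, 'external');
}
```

For instance, with `requestDelay` set to 1000, 100 internal links and 20 external links would add roughly 160 seconds of waiting under these assumptions.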

Example Input

{
  "startUrls": [{ "url": "https://example.com" }],
  "maxDepth": 3,
  "maxPages": 500,
  "checkExternalLinks": true,
  "screenshotBrokenPages": true
}
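
An input object like the one above can be checked against the ranges in the Input table before starting a run. This is a minimal sketch: `validateInput` and `LIMITS` are hypothetical names, and how the Actor itself treats out-of-range values is not documented here.

```javascript
// Hypothetical pre-flight check; limits are copied from the Input table.
const LIMITS = {
    maxDepth: [0, 10],
    maxPages: [1, 10000],
    timeout: [5, 120],        // seconds
    requestDelay: [0, 10000], // milliseconds
};

// Returns a list of human-readable problems; an empty list means the
// input is within the documented ranges.
function validateInput(input) {
    const errors = [];
    if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
        errors.push('startUrls is required and must be a non-empty array');
    }
    for (const [field, [min, max]] of Object.entries(LIMITS)) {
        const value = input[field];
        if (value === undefined) continue; // the documented default applies
        if (!Number.isInteger(value) || value < min || value > max) {
            errors.push(`${field} must be an integer between ${min} and ${max}`);
        }
    }
    return errors;
}
```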

Output

Each checked link produces a record with:

Field	Description
`url`	The URL that was checked
`statusCode`	HTTP status code (200, 404, 500, etc.)
`status`	Category: OK, REDIRECT, BROKEN, BLOCKED, TIMEOUT, ERROR, SERVER_ERROR, CLIENT_ERROR
`confidence`	How confident we are in the status: high, medium, low
`isBroken`	Definitive broken flag: `true` only for links considered actually broken, `false` for bot-blocked external links
`type`	Link type: internal, external
`foundOnPage`	Which page contained this link
`anchorText`	The link's anchor text
`responseTime`	Response time in milliseconds
`redirectChain`	Full redirect path if redirected
`screenshotUrl`	Link to screenshot (for broken pages)
`error`	Error message if request failed
`warning`	Warnings (mixed content, slow, bot protection notice, etc.)
`checkedAt`	When this URL was checked

Status Categories (v1.1.0+)

Status	Meaning	Is Broken?
`OK`	Link works (200-299)	No
`REDIRECT`	Link redirects (300-399)	No
`BROKEN`	Link is dead (404, 410)	Yes
`BLOCKED`	Access denied (401, 403) - often bot protection	No (external) / Yes (internal)
`TIMEOUT`	Request timed out	Yes
`ERROR`	Connection/DNS failed	Yes
`SERVER_ERROR`	Server error (500-599)	Yes
`CLIENT_ERROR`	Other 4xx errors	Yes
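
The table can be read as a decision procedure. The sketch below reproduces it for illustration only; it is not the Actor's code. The `timedOut` flag is a hypothetical stand-in for requests that never returned a status code, status code 0 follows the "Connection failed" row in the status code reference further down, and the final fallback for codes outside the table is an assumption.

```javascript
// Maps a status code and link type ("internal" or "external") to the
// status category and isBroken flag listed in the table.
function classify(statusCode, type, timedOut = false) {
    const internal = type === 'internal';
    if (timedOut) return { status: 'TIMEOUT', isBroken: true };
    if (statusCode === 0) return { status: 'ERROR', isBroken: true };
    if (statusCode >= 200 && statusCode <= 299) return { status: 'OK', isBroken: false };
    if (statusCode >= 300 && statusCode <= 399) return { status: 'REDIRECT', isBroken: false };
    // 401/403 count as broken only for internal links.
    if (statusCode === 401 || statusCode === 403) return { status: 'BLOCKED', isBroken: internal };
    if (statusCode === 404 || statusCode === 410) return { status: 'BROKEN', isBroken: true };
    if (statusCode >= 400 && statusCode <= 499) return { status: 'CLIENT_ERROR', isBroken: true };
    if (statusCode >= 500 && statusCode <= 599) return { status: 'SERVER_ERROR', isBroken: true };
    return { status: 'ERROR', isBroken: true }; // assumption: anything outside the table
}
```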

Confidence Levels

Confidence	Meaning
`high`	Status is definitive (200, 404, 410, etc.)
`medium`	Status may vary (timeout could be temporary)
`low`	Status uncertain—external 403s often block bots but work in browsers

Example Output

{
  "url": "https://example.com/old-page",
  "statusCode": 404,
  "status": "BROKEN",
  "confidence": "high",
  "isBroken": true,
  "type": "internal",
  "foundOnPage": "https://example.com/blog",
  "anchorText": "Read our old article",
  "responseTime": 245,
  "redirectChain": null,
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/screenshot-xyz",
  "error": null,
  "warning": null,
  "checkedAt": "2025-12-13T10:30:00.000Z"
}

External Blocked Link Example

{
  "url": "https://www.viator.com/tours",
  "statusCode": 403,
  "status": "BLOCKED",
  "confidence": "low",
  "isBroken": false,
  "type": "external",
  "foundOnPage": "https://yoursite.com/travel",
  "warning": "External site may be blocking automated requests. Verify manually.",
  "checkedAt": "2025-12-13T10:30:00.000Z"
}

Redirect Chain Example

{
  "url": "https://example.com/page",
  "statusCode": 200,
  "status": "REDIRECT",
  "confidence": "high",
  "isBroken": false,
  "redirectChain": "https://example.com/page → https://example.com/new-page → https://example.com/final-page",
  "warning": "Long redirect chain: 3 hops"
}
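
Because `redirectChain` is a single arrow-separated string in the example above, it may be handy to split it into individual URLs, for example to update a link so it points straight at the final destination. This sketch assumes the separator is always the `→` character shown in the example; the helper names are hypothetical.

```javascript
// Splits the arrow-separated redirectChain string into its URLs.
// Returns an empty array when the field is null or missing.
function parseRedirectChain(redirectChain) {
    if (!redirectChain) return [];
    return redirectChain.split('→').map((url) => url.trim()).filter(Boolean);
}

// Last URL in the chain, or null when there is no chain.
function finalUrl(redirectChain) {
    const urls = parseRedirectChain(redirectChain);
    return urls.length > 0 ? urls[urls.length - 1] : null;
}
```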

Summary Statistics

After each run, a summary is saved to the Key-Value Store under the key summary:

{
  "totalLinksChecked": 847,
  "brokenLinks": 12,
  "blockedLinks": 5,
  "redirects": 45,
  "warnings": 8,
  "pagesProcessed": 100,
  "scanCompletedAt": "2025-12-13T10:45:00.000Z"
}
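
The summary lends itself to simple ratios or a pass/fail gate for scheduled runs. The helpers below are a hypothetical sketch that only uses the fields shown above. With `apify-client`, the record itself can typically be read with `client.keyValueStore(run.defaultKeyValueStoreId).getRecord('summary')`, assuming `run` is the object returned by the Actor call.

```javascript
// Derives percentages (one decimal place) from the summary object.
function summaryRates(summary) {
    const total = summary.totalLinksChecked;
    const pct = (count) => (total > 0 ? Math.round((count / total) * 1000) / 10 : 0);
    return {
        brokenPct: pct(summary.brokenLinks),
        blockedPct: pct(summary.blockedLinks),
        redirectPct: pct(summary.redirects),
    };
}

// Hypothetical gate: true when broken links exceed an allowed maximum.
function exceedsThreshold(summary, maxBrokenLinks) {
    return summary.brokenLinks > maxBrokenLinks;
}
```

For the example summary above, 12 broken links out of 847 checked works out to about 1.4%.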

Log Output (v1.1.0+)

The scanner now provides detailed log output at the end of each run:

═══════════════════════════════════════════
           SCAN COMPLETE
═══════════════════════════════════════════
  Pages crawled:    7
  Links checked:    26
  ❌ Broken links:  0
  🚫 Blocked:       2 (likely bot protection)
  ↪️  Redirects:     11
  ⚠️  Warnings:      1
═══════════════════════════════════════════

📋 WARNING DETAILS:
───────────────────────────────────────────
  • https://yoursite.com/page
    Warning: Slow response: 3500ms
    Found on: https://yoursite.com/
───────────────────────────────────────────

🔍 BLOCKED LINKS (verify manually):
───────────────────────────────────────────
  • https://www.viator.com/tours
    Status: 403 | Found on: https://yoursite.com/travel
  • https://www.britishmuseum.org/collection
    Status: 403 | Found on: https://yoursite.com/museums
───────────────────────────────────────────

How to Use the Results

1. Export to CSV

Download results as CSV for spreadsheet analysis or client reports.

2. Filter by Status

Use Apify's dataset filters to show only broken links or only redirects.

3. Use the `isBroken` Field

Filter by isBroken: true to get only definitely broken links, ignoring bot-blocked external sites.
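
For client reports it can help to group the definitely broken links by the page that contains them. This is a minimal sketch over the output records described earlier; `brokenByPage` is a hypothetical helper, not something the Actor provides.

```javascript
// Groups records with isBroken === true by their foundOnPage value.
// Returns a Map of page URL -> array of { url, statusCode, anchorText }.
function brokenByPage(items) {
    const report = new Map();
    for (const item of items) {
        if (item.isBroken !== true) continue; // skips OK, redirects, bot-blocked externals
        const links = report.get(item.foundOnPage) ?? [];
        links.push({ url: item.url, statusCode: item.statusCode, anchorText: item.anchorText });
        report.set(item.foundOnPage, links);
    }
    return report;
}
```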

4. Use Screenshots

Each broken page screenshot is stored in the Key-Value Store. URLs are included in the output for easy access.

5. Automate with Schedules

Set up scheduled runs to monitor site health over time.

6. Integrate via API

const { ApifyClient } = require('apify-client');

// Replace with your Apify API token
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// CommonJS modules do not support top-level await, so wrap the calls
async function main() {
    // Start the Actor and wait for the run to finish
    const run = await client.actor('YOUR_USERNAME/site-health-scanner').call({
        startUrls: [{ url: 'https://example.com' }],
        maxPages: 500,
    });

    // Fetch the per-link records from the run's default dataset
    const { items } = await client.dataset(run.defaultDatasetId).listItems();

    // Get only definitely broken links (ignores bot-blocked externals)
    const brokenLinks = items.filter((item) => item.isBroken === true);
    console.log(`Found ${brokenLinks.length} broken links`);

    // Get blocked links that need manual verification
    const blockedLinks = items.filter((item) => item.status === 'BLOCKED');
    console.log(`${blockedLinks.length} links blocked (verify manually)`);
}

main().catch((err) => {
    console.error(err);
    process.exit(1);
});

Understanding Status Codes

Code	Status	Meaning	Action Needed
200	OK	Working	None
301	REDIRECT	Permanent redirect	Update link to final URL
302	REDIRECT	Temporary redirect	Usually OK, monitor
400	CLIENT_ERROR	Bad request	Fix malformed URL
401	BLOCKED	Unauthorized	Check if link needs auth
403	BLOCKED	Forbidden	External: likely bot protection, verify manually. Internal: check permissions
404	BROKEN	Not found	Remove or fix link
410	BROKEN	Gone	Remove link
500	SERVER_ERROR	Server error	Contact site owner
503	SERVER_ERROR	Service unavailable	Retry later
0	ERROR	Connection failed	DNS or network issue

Comparison with Other Tools

Feature	Site Health Scanner	Screaming Frog	Ahrefs
Cloud-based	✅	❌ (Desktop)	✅
Screenshots of broken pages	✅	❌	❌
Smart bot-block detection	✅	❌	❌
Confidence scoring	✅	❌	❌
Pay-per-use pricing	✅	License fee	Subscription
API access	✅	Limited	✅
Scheduled runs	✅	Manual	✅
External link checking	✅	✅	✅
Redirect chain tracking	✅	✅	✅

Limitations

JavaScript-rendered content is supported, but very complex SPAs may not extract all links
Some servers block automated requests; try adjusting the `userAgent` setting or enabling `useProxy`
Screenshot capture adds processing time (disable if not needed for faster scans)
External links are checked but not crawled (only status code verified)
Maximum 10,000 pages per run

Use Cases

SEO Audit

Run before and after site migrations to catch broken links that could hurt rankings.

Client Reporting

Use screenshots to show clients exactly what's broken—no technical explanation needed.

Continuous Monitoring

Schedule weekly runs to catch new broken links before they impact SEO or user experience.
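
To spot what changed between two scheduled runs, the dataset items from the previous and current run can be compared. The sketch below is a hypothetical helper that treats a link as the pair of `foundOnPage` and `url`; it assumes both arrays come from the Actor's dataset output.

```javascript
// Returns records that are broken in the current run but were not
// broken (or not present) in the previous run.
function newlyBroken(previousItems, currentItems) {
    const key = (item) => `${item.foundOnPage}|${item.url}`;
    const before = new Set(
        previousItems.filter((item) => item.isBroken === true).map(key),
    );
    return currentItems.filter(
        (item) => item.isBroken === true && !before.has(key(item)),
    );
}
```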

Pre-Launch QA

Verify all links work before going live with a new site or major update.

Troubleshooting

Problem	Solution
Timeout errors	Increase the `timeout` setting
Many 403 "blocked" on external sites	This is expected because many large sites block bots. Verify manually, or enable `useProxy` to reduce blocks.
Missing pages	Increase `maxDepth` or `maxPages`
Slow scans	Disable `checkExternalLinks` or `screenshotBrokenPages`
Some links not found	Complex JavaScript navigation may hide links

Changelog

v1.1.0 (December 2025)

IMPROVED: Better status classification
- BLOCKED status for 401/403 (separate from BROKEN)
- External 403s marked as isBroken: false (likely bot protection)
- Internal 403s still marked as isBroken: true
NEW: confidence field (high/medium/low)
NEW: isBroken field for definitive broken detection
NEW: Verbose log output showing warning details, blocked links, and broken links
NEW: blockedLinks count in summary

v1.0.0 (December 2025)

Initial release
Broken link detection with screenshot proof
Redirect chain tracking
Mixed content warnings
External link checking
Response time monitoring

Find broken links before your users do.
