Site Health Scanner

Pricing

Pay per usage

Crawl a website to detect broken and problematic links, identify redirects and blocked URLs, capture screenshots, and return structured site health data for audits, automation, and monitoring.

Rating: 0.0 (0)

Developer: Quadruped (Maintained by Community)

Actor stats

  • Bookmarked: 0
  • Total users: 4
  • Monthly active users: 2
  • Last modified: 14 days ago

Site Health Scanner 🔍

Find broken links before Google does. Get screenshots of every broken page as client-ready proof.

⚠️ Bot Protection Notice: Some websites use aggressive bot detection that blocks all automated requests. Enabling the residential proxy setting (useProxy) may bypass basic protection, but heavily protected sites (Cloudflare, advanced WAFs) may still block scans. Links on these sites are marked as "BLOCKED" rather than broken; verify them manually in a browser.

What does it do?

Site Health Scanner crawls your website and checks every link: internal pages, external URLs, images, scripts, and stylesheets. When it finds problems, it doesn't just report them; it takes screenshots so you have visual proof for clients or stakeholders.

Perfect for:

  • SEO professionals auditing client sites
  • Web agencies delivering health reports
  • Site owners maintaining link integrity
  • QA teams catching issues before launch

Features

| Feature | Description |
|---|---|
| 🔗 Broken Link Detection | Finds 4xx and 5xx errors across your entire site |
| 🚫 Smart Bot-Block Detection | Distinguishes truly broken links from bot-blocked external sites |
| 📸 Screenshot Proof | Captures screenshots of broken pages automatically |
| ↪️ Redirect Chain Tracking | Maps full redirect paths, catches redirect loops |
| ⚠️ Mixed Content Warnings | Identifies HTTP resources on HTTPS sites |
| ⏱️ Response Time Monitoring | Flags slow-loading pages and resources |
| 🌐 External Link Checking | Optionally validates outbound links |

Cost Estimate

| Site Size | Pages | Est. Time | Est. Cost |
|---|---|---|---|
| Small blog | 50 | 2-3 min | $0.02-0.05 |
| Business site | 200 | 8-12 min | $0.10-0.20 |
| E-commerce | 1,000 | 30-45 min | $0.50-1.00 |
| Large portal | 5,000 | 2-3 hours | $2.00-4.00 |

Based on Apify platform pricing. Actual costs vary by page complexity and settings.

Input

| Field | Type | Description | Default |
|---|---|---|---|
| startUrls | array | URLs to start crawling (homepage, sitemap, or specific pages) | Required |
| maxDepth | integer | How many clicks deep to crawl (0-10) | 3 |
| maxPages | integer | Maximum pages to crawl (1-10,000) | 100 |
| checkExternalLinks | boolean | Also check links to other domains | true |
| screenshotBrokenPages | boolean | Take screenshots of 4xx/5xx pages | true |
| followRedirects | boolean | Track full redirect chains | true |
| timeout | integer | Request timeout in seconds (5-120) | 30 |
| includeWarnings | boolean | Report mixed content, slow pages, etc. | true |
| userAgent | string | Custom user agent (leave empty for default) | "" |
| useProxy | boolean | Use residential proxy to bypass bot protection (extra cost) | false |
| requestDelay | integer | Delay between requests in ms (0-10000). 0 = no delay. External links use 3x this value. | 0 |

About Bot Protection (403 Errors)

Many external websites block automated requests and return 403 Forbidden errors. This doesn't mean the link is broken; it just means the site is blocking bots.

Without proxy (default): You may see 403 "BLOCKED" status on external links. These are marked with isBroken: false and confidence: low since the link likely works for real users. Verify manually if needed.

With residential proxy: Enable useProxy to route requests through residential IPs, which are less likely to be blocked. This adds ~$0.02-0.60 per scan depending on size (user pays for proxy traffic).

About Rate Limiting (429 Errors)

Some sites aggressively rate-limit requests. If you see 429 "Too Many Requests" errors, increase the requestDelay setting:

  • 0 (default): No delay - fastest but may trigger rate limits
  • 500-1000: Light throttling - good for most sites
  • 1500-2000: Heavy throttling - for aggressive sites
  • External links: Automatically use 3x the configured delay
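The 3x rule for external links can be sketched as a small helper. This is only an illustration of the documented behavior, not the actor's own code; `effectiveDelayMs` and `EXTERNAL_DELAY_MULTIPLIER` are hypothetical names.

```javascript
// Sketch only: mirrors the documented rule "external links use 3x requestDelay".
// effectiveDelayMs is a hypothetical helper, not part of the actor.
const EXTERNAL_DELAY_MULTIPLIER = 3;

function effectiveDelayMs(requestDelay, linkType) {
  // The documented range for requestDelay is 0-10000 ms.
  if (!Number.isInteger(requestDelay) || requestDelay < 0 || requestDelay > 10000) {
    throw new RangeError('requestDelay must be an integer between 0 and 10000 ms');
  }
  return linkType === 'external'
    ? requestDelay * EXTERNAL_DELAY_MULTIPLIER
    : requestDelay;
}

console.log(effectiveDelayMs(500, 'internal')); // 500
console.log(effectiveDelayMs(500, 'external')); // 1500
```

With requestDelay set to 1000, for example, external links would wait about 3 seconds between requests, which is worth factoring into time estimates for link-heavy sites.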

Example Input

{
  "startUrls": [{ "url": "https://example.com" }],
  "maxDepth": 3,
  "maxPages": 500,
  "checkExternalLinks": true,
  "screenshotBrokenPages": true
}
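Before starting a run, it can help to check an input object against the ranges listed in the Input table. The following is a minimal sketch under that assumption; `validateInput` and `INTEGER_RANGES` are hypothetical names, and the actor may apply its own validation differently.

```javascript
// Hypothetical pre-flight check; the ranges are copied from the Input table.
const INTEGER_RANGES = {
  maxDepth: [0, 10],
  maxPages: [1, 10000],
  timeout: [5, 120],
  requestDelay: [0, 10000],
};

function validateInput(input) {
  const errors = [];
  if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
    errors.push('startUrls is required and must be a non-empty array');
  }
  for (const [field, [min, max]] of Object.entries(INTEGER_RANGES)) {
    const value = input[field];
    if (value === undefined) continue; // omitted fields use the documented defaults
    if (!Number.isInteger(value) || value < min || value > max) {
      errors.push(`${field} must be an integer between ${min} and ${max}`);
    }
  }
  return errors;
}

const inputErrors = validateInput({
  startUrls: [{ url: 'https://example.com' }],
  maxDepth: 3,
  maxPages: 500,
});
console.log(inputErrors.length === 0 ? 'Input looks valid' : inputErrors.join('\n'));
```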

Output

Each checked link produces a record with:

| Field | Description |
|---|---|
| url | The URL that was checked |
| statusCode | HTTP status code (200, 404, 500, etc.) |
| status | Category: OK, BROKEN, BLOCKED, REDIRECT, TIMEOUT, ERROR, SERVER_ERROR, CLIENT_ERROR |
| confidence | How confident we are in the status: high, medium, low |
| isBroken | Definitive broken flag (true only for actually broken links) |
| type | Link type: internal, external |
| foundOnPage | Which page contained this link |
| anchorText | The link's anchor text |
| responseTime | Response time in milliseconds |
| redirectChain | Full redirect path if redirected |
| screenshotUrl | Link to screenshot (for broken pages) |
| error | Error message if request failed |
| warning | Warnings (mixed content, slow, bot protection notice, etc.) |
| checkedAt | When this URL was checked |

Status Categories (v1.1.0+)

| Status | Meaning | Is Broken? |
|---|---|---|
| OK | Link works (200-299) | No |
| REDIRECT | Link redirects (300-399) | No |
| BROKEN | Link is dead (404, 410) | Yes |
| BLOCKED | Access denied (401, 403), often bot protection | No (external) / Yes (internal) |
| TIMEOUT | Request timed out | Yes |
| ERROR | Connection/DNS failed | Yes |
| SERVER_ERROR | Server error (500-599) | Yes |
| CLIENT_ERROR | Other 4xx errors | Yes |
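The status table can be read as a simple decision procedure. The sketch below restates it in code for clarity; it is not the actor's actual implementation, and `classifyLink` and the `timedOut` flag are hypothetical names introduced for illustration.

```javascript
// Sketch of the status table above; not the actor's actual source code.
// classifyLink and the timedOut flag are hypothetical.
function classifyLink({ statusCode, type, timedOut = false }) {
  let status;
  if (timedOut) status = 'TIMEOUT';
  else if (statusCode >= 200 && statusCode <= 299) status = 'OK';
  else if (statusCode >= 300 && statusCode <= 399) status = 'REDIRECT';
  else if (statusCode === 404 || statusCode === 410) status = 'BROKEN';
  else if (statusCode === 401 || statusCode === 403) status = 'BLOCKED';
  else if (statusCode >= 400 && statusCode <= 499) status = 'CLIENT_ERROR';
  else if (statusCode >= 500 && statusCode <= 599) status = 'SERVER_ERROR';
  else status = 'ERROR'; // e.g. code 0 when the connection or DNS lookup failed

  let isBroken;
  if (status === 'OK' || status === 'REDIRECT') isBroken = false;
  else if (status === 'BLOCKED') isBroken = type === 'internal'; // external blocks are likely bot protection
  else isBroken = true;

  return { status, isBroken };
}

console.log(classifyLink({ statusCode: 403, type: 'external' }));
// { status: 'BLOCKED', isBroken: false }
```

The one case that depends on link type is BLOCKED: an internal 401/403 counts as broken, while an external one does not.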

Confidence Levels

| Confidence | Meaning |
|---|---|
| high | Status is definitive (404, 200, timeout, etc.) |
| medium | Status may vary (timeout could be temporary) |
| low | Status uncertain: external 403s often block bots but work in browsers |

Example Output

{
  "url": "https://example.com/old-page",
  "statusCode": 404,
  "status": "BROKEN",
  "confidence": "high",
  "isBroken": true,
  "type": "internal",
  "foundOnPage": "https://example.com/blog",
  "anchorText": "Read our old article",
  "responseTime": 245,
  "redirectChain": null,
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/screenshot-xyz",
  "error": null,
  "warning": null,
  "checkedAt": "2025-12-13T10:30:00.000Z"
}

{
  "url": "https://www.viator.com/tours",
  "statusCode": 403,
  "status": "BLOCKED",
  "confidence": "low",
  "isBroken": false,
  "type": "external",
  "foundOnPage": "https://yoursite.com/travel",
  "warning": "External site may be blocking automated requests. Verify manually.",
  "checkedAt": "2025-12-13T10:30:00.000Z"
}

Redirect Chain Example

{
  "url": "https://example.com/page",
  "statusCode": 200,
  "status": "REDIRECT",
  "confidence": "high",
  "isBroken": false,
  "redirectChain": "https://example.com/page → https://example.com/new-page → https://example.com/final-page",
  "warning": "Long redirect chain: 3 hops"
}
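To update links to their final destination, the chain string can be split back into individual URLs. This sketch assumes redirectChain is a single string with URLs joined by " → ", as in the example above; `parseRedirectChain` is a hypothetical helper.

```javascript
// Assumes redirectChain is one string with URLs joined by " → ",
// as in the example record. parseRedirectChain is a hypothetical helper.
function parseRedirectChain(redirectChain) {
  if (!redirectChain) return []; // null when the link did not redirect
  return redirectChain
    .split('→')
    .map((url) => url.trim())
    .filter(Boolean);
}

const chainUrls = parseRedirectChain(
  'https://example.com/page → https://example.com/new-page → https://example.com/final-page'
);
console.log(chainUrls[chainUrls.length - 1]); // https://example.com/final-page
```

The last element is the URL the link finally resolves to, which is the one to use when replacing a permanently redirected link.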

Summary Statistics

After each run, a summary is saved to the Key-Value Store under the key summary:

{
  "totalLinksChecked": 847,
  "brokenLinks": 12,
  "blockedLinks": 5,
  "redirects": 45,
  "warnings": 8,
  "pagesProcessed": 100,
  "scanCompletedAt": "2025-12-13T10:45:00.000Z"
}

Log Output (v1.1.0+)

The scanner now provides detailed log output at the end of each run:

═══════════════════════════════════════════
SCAN COMPLETE
═══════════════════════════════════════════
Pages crawled: 7
Links checked: 26
❌ Broken links: 0
🚫 Blocked: 2 (likely bot protection)
↪️ Redirects: 11
⚠️ Warnings: 1
═══════════════════════════════════════════
📋 WARNING DETAILS:
───────────────────────────────────────────
• https://yoursite.com/page
Warning: Slow response: 3500ms
Found on: https://yoursite.com/
───────────────────────────────────────────
🔍 BLOCKED LINKS (verify manually):
───────────────────────────────────────────
• https://www.viator.com/tours
Status: 403 | Found on: https://yoursite.com/travel
• https://www.britishmuseum.org/collection
Status: 403 | Found on: https://yoursite.com/museums
───────────────────────────────────────────

How to Use the Results

1. Export to CSV

Download results as CSV for spreadsheet analysis or client reports.

2. Filter by Status

Use Apify's dataset filters to show only broken links or only redirects.

3. Use the isBroken Field

Filter by isBroken: true to get only definitely broken links, ignoring bot-blocked external sites.
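For a fix list, it is often more useful to group the broken links by the page that contains them. A minimal sketch, assuming records shaped like the Example Output section; `groupBrokenByPage` is a hypothetical helper and the sample records are illustrative.

```javascript
// Hypothetical helper: groups definitely-broken links by the page they were found on.
function groupBrokenByPage(items) {
  const byPage = new Map();
  for (const item of items) {
    if (item.isBroken !== true) continue; // skips OK, REDIRECT and bot-blocked externals
    const links = byPage.get(item.foundOnPage) ?? [];
    links.push(item.url);
    byPage.set(item.foundOnPage, links);
  }
  return byPage;
}

// Sample records shaped like the Example Output section.
const sampleItems = [
  { url: 'https://example.com/old-page', status: 'BROKEN', isBroken: true, foundOnPage: 'https://example.com/blog' },
  { url: 'https://www.viator.com/tours', status: 'BLOCKED', isBroken: false, foundOnPage: 'https://yoursite.com/travel' },
];

for (const [page, links] of groupBrokenByPage(sampleItems)) {
  console.log(`${page}: ${links.length} broken link(s)`);
}
```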

4. Use Screenshots

Each broken page screenshot is stored in the Key-Value Store. URLs are included in the output for easy access.

5. Automate with Schedules

Set up scheduled runs to monitor site health over time.

6. Integrate via API

const { ApifyClient } = require('apify-client');

// Replace YOUR_TOKEN with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// await is not allowed at the top level of a CommonJS script, so wrap the calls.
async function main() {
  // Start the actor and wait for the run to finish.
  const run = await client.actor('YOUR_USERNAME/site-health-scanner').call({
    startUrls: [{ url: 'https://example.com' }],
    maxPages: 500,
  });

  // Fetch the result records from the run's default dataset.
  const { items } = await client.dataset(run.defaultDatasetId).listItems();

  // Get only definitely broken links (ignores bot-blocked externals)
  const brokenLinks = items.filter((item) => item.isBroken === true);
  console.log(`Found ${brokenLinks.length} broken links`);

  // Get blocked links that need manual verification
  const blockedLinks = items.filter((item) => item.status === 'BLOCKED');
  console.log(`${blockedLinks.length} links blocked (verify manually)`);
}

main().catch(console.error);

Understanding Status Codes

| Code | Status | Meaning | Action Needed |
|---|---|---|---|
| 200 | OK | Working | None |
| 301 | REDIRECT | Permanent redirect | Update link to final URL |
| 302 | REDIRECT | Temporary redirect | Usually OK, monitor |
| 400 | CLIENT_ERROR | Bad request | Fix malformed URL |
| 401 | BLOCKED | Unauthorized | Check if link needs auth |
| 403 | BLOCKED | Forbidden | External: likely bot protection, verify manually. Internal: check permissions |
| 404 | BROKEN | Not found | Remove or fix link |
| 410 | BROKEN | Gone | Remove link |
| 500 | SERVER_ERROR | Server error | Contact site owner |
| 503 | SERVER_ERROR | Service unavailable | Retry later |
| 0 | ERROR | Connection failed | DNS or network issue |

Comparison with Other Tools

| Feature | Site Health Scanner | Screaming Frog | Ahrefs |
|---|---|---|---|
| Cloud-based | ✅ | ❌ (Desktop) | ✅ |
| Screenshots of broken pages | ✅ | ❌ | ❌ |
| Smart bot-block detection | ✅ | ❌ | ❌ |
| Confidence scoring | ✅ | ❌ | ❌ |
| Pay-per-use pricing | ✅ | License fee | Subscription |
| API access | ✅ | Limited | ✅ |
| Scheduled runs | ✅ | Manual | ✅ |
| External link checking | ✅ | ✅ | ✅ |
| Redirect chain tracking | ✅ | ✅ | ✅ |

Limitations

  • JavaScript-rendered content is supported, but very complex SPAs may not extract all links
  • Some servers block automated requests; try adjusting the userAgent setting or enabling useProxy
  • Screenshot capture adds processing time (disable if not needed for faster scans)
  • External links are checked but not crawled (only status code verified)
  • Maximum 10,000 pages per run

Use Cases

SEO Audit

Run before and after site migrations to catch broken links that could hurt rankings.

Client Reporting

Use screenshots to show clients exactly what's broken, with no technical explanation needed.

Continuous Monitoring

Schedule weekly runs to catch new broken links before they impact SEO or user experience.

Pre-Launch QA

Verify all links work before going live with a new site or major update.

Troubleshooting

| Problem | Solution |
|---|---|
| Timeout errors | Increase the timeout setting |
| Many 403 "blocked" on external sites | This is expected: big sites block bots. Verify manually if needed. |
| Missing pages | Increase maxDepth or maxPages |
| Slow scans | Disable checkExternalLinks or screenshotBrokenPages |
| Some links not found | Complex JavaScript navigation may hide links |

Changelog

v1.1.0 (December 2025)

  • IMPROVED: Better status classification
    • BLOCKED status for 401/403 (separate from BROKEN)
    • External 403s marked as isBroken: false (likely bot protection)
    • Internal 403s still marked as isBroken: true
  • NEW: confidence field (high/medium/low)
  • NEW: isBroken field for definitive broken detection
  • NEW: Verbose log output showing warning details, blocked links, and broken links
  • NEW: blockedLinks count in summary

v1.0.0 (December 2025)

  • Initial release
  • Broken link detection with screenshot proof
  • Redirect chain tracking
  • Mixed content warnings
  • External link checking
  • Response time monitoring

Find broken links before your users do.