Site Health Scanner
Pricing
Pay per usage
Crawl a website to detect broken and problematic links, identify redirects and blocked URLs, capture screenshots, and return structured site health data for audits, automation, and monitoring.
Developer
Quadruped
14 days ago
Last modified
Site Health Scanner
Find broken links before Google does. Get screenshots of every broken page as client-ready proof.
Bot Protection Notice: Some websites use aggressive bot detection that blocks all automated requests. Enabling the residential proxy setting (`useProxy`) may bypass basic protection, but heavily protected sites (Cloudflare, advanced WAFs) may still block scans. Links on these sites are marked as "BLOCKED" rather than broken; verify them manually in a browser.
What does it do?
Site Health Scanner crawls your website and checks every link: internal pages, external URLs, images, scripts, and stylesheets. When it finds problems, it does more than report them: it takes screenshots so you have visual proof for clients or stakeholders.
Perfect for:
- SEO professionals auditing client sites
- Web agencies delivering health reports
- Site owners maintaining link integrity
- QA teams catching issues before launch
Features
| Feature | Description |
|---|---|
| Broken Link Detection | Finds 4xx and 5xx errors across your entire site |
| Smart Bot-Block Detection | Distinguishes truly broken links from bot-blocked external sites |
| Screenshot Proof | Captures screenshots of broken pages automatically |
| Redirect Chain Tracking | Maps full redirect paths, catches redirect loops |
| Mixed Content Warnings | Identifies HTTP resources on HTTPS sites |
| Response Time Monitoring | Flags slow-loading pages and resources |
| External Link Checking | Optionally validates outbound links |
Cost Estimate
| Site Size | Pages | Est. Time | Est. Cost |
|---|---|---|---|
| Small blog | 50 | 2-3 min | $0.02-0.05 |
| Business site | 200 | 8-12 min | $0.10-0.20 |
| E-commerce | 1,000 | 30-45 min | $0.50-1.00 |
| Large portal | 5,000 | 2-3 hours | $2.00-4.00 |
Based on Apify platform pricing. Actual costs vary by page complexity and settings.
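For quick budgeting, the table can be turned into a small lookup. This is only a sketch: the tiers are copied from the table above, while `COST_TIERS` and `estimateTier` are hypothetical names and the "smallest tier that covers the page count" rule is an assumption, not how Apify bills.

```javascript
// Tiers copied from the Cost Estimate table; the lookup rule is illustrative only.
const COST_TIERS = [
  { label: 'Small blog', pages: 50, time: '2-3 min', costUsd: [0.02, 0.05] },
  { label: 'Business site', pages: 200, time: '8-12 min', costUsd: [0.10, 0.20] },
  { label: 'E-commerce', pages: 1000, time: '30-45 min', costUsd: [0.50, 1.00] },
  { label: 'Large portal', pages: 5000, time: '2-3 hours', costUsd: [2.00, 4.00] },
];

// Return the smallest tier whose page count covers the site,
// or null when the site is larger than the biggest tier in the table.
function estimateTier(pageCount) {
  return COST_TIERS.find((tier) => pageCount <= tier.pages) ?? null;
}

console.log(estimateTier(150).label); // prints: Business site
```

Treat the result as a rough range; actual costs still depend on page complexity and settings.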
Input
| Field | Type | Description | Default |
|---|---|---|---|
| `startUrls` | array | URLs to start crawling (homepage, sitemap, or specific pages) | Required |
| `maxDepth` | integer | How many clicks deep to crawl (0-10) | 3 |
| `maxPages` | integer | Maximum pages to crawl (1-10,000) | 100 |
| `checkExternalLinks` | boolean | Also check links to other domains | true |
| `screenshotBrokenPages` | boolean | Take screenshots of 4xx/5xx pages | true |
| `followRedirects` | boolean | Track full redirect chains | true |
| `timeout` | integer | Request timeout in seconds (5-120) | 30 |
| `includeWarnings` | boolean | Report mixed content, slow pages, etc. | true |
| `userAgent` | string | Custom user agent (leave empty for default) | "" |
| `useProxy` | boolean | Use residential proxy to bypass bot protection (extra cost) | false |
| `requestDelay` | integer | Delay between requests in ms (0-10000). 0 = no delay. External links use 3x this value. | 0 |
About Bot Protection (403 Errors)
Many external websites block automated requests and return 403 Forbidden errors. This doesn't mean the link is broken; it just means the site is blocking bots.
Without proxy (default): You may see 403 "BLOCKED" status on external links. These are marked with `isBroken: false` and `confidence: low`, since the link likely works for real users. Verify manually if needed.
With residential proxy: Enable `useProxy` to route requests through residential IPs, which are less likely to be blocked. This adds ~$0.02-0.60 per scan depending on size (user pays for proxy traffic).
About Rate Limiting (429 Errors)
Some sites aggressively rate-limit requests. If you see 429 "Too Many Requests" errors, increase the requestDelay setting:
- 0 (default): No delay - fastest but may trigger rate limits
- 500-1000: Light throttling - good for most sites
- 1500-2000: Heavy throttling - for aggressive sites
- External links: Automatically use 3x the configured delay
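The delay rules above can be sketched in a few lines. Both function names are hypothetical, and the "step up to the next band" rule in `suggestRequestDelay` is an assumption made for illustration; the Actor itself only takes the `requestDelay` value you give it.

```javascript
// External links wait 3x the configured delay, as described above.
const EXTERNAL_DELAY_MULTIPLIER = 3;

// Delay (in ms) that would apply to a link of the given type.
function effectiveDelayMs(requestDelay, linkType) {
  if (!Number.isInteger(requestDelay) || requestDelay < 0 || requestDelay > 10000) {
    throw new RangeError('requestDelay must be an integer between 0 and 10000');
  }
  return linkType === 'external' ? requestDelay * EXTERNAL_DELAY_MULTIPLIER : requestDelay;
}

// Hypothetical helper: if a finished run contains 429 responses, propose the
// next throttling band from the list above for the following run.
function suggestRequestDelay(items, currentDelay) {
  const rateLimited = items.some((item) => item.statusCode === 429);
  if (!rateLimited) return currentDelay; // no 429s seen, keep the setting
  if (currentDelay < 500) return 500;    // move to light throttling
  if (currentDelay < 1500) return 1500;  // move to heavy throttling
  return currentDelay;                   // already in the heaviest listed band
}
```

For example, a run at the default delay of 0 that returns any 429 records would be retried with `requestDelay: 500`, which means external links wait 1500 ms between requests.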
Example Input
{
  "startUrls": [{ "url": "https://example.com" }],
  "maxDepth": 3,
  "maxPages": 500,
  "checkExternalLinks": true,
  "screenshotBrokenPages": true
}
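If you build the input programmatically, it can help to check it against the documented ranges before starting a run. The sketch below takes defaults and ranges from the Input table; `buildInput` is a hypothetical helper, not part of the Actor or of `apify-client`.

```javascript
// Defaults and ranges copied from the Input table above.
const DEFAULTS = {
  maxDepth: 3,
  maxPages: 100,
  checkExternalLinks: true,
  screenshotBrokenPages: true,
  followRedirects: true,
  timeout: 30,
  includeWarnings: true,
  userAgent: '',
  useProxy: false,
  requestDelay: 0,
};
const RANGES = {
  maxDepth: [0, 10],
  maxPages: [1, 10000],
  timeout: [5, 120],
  requestDelay: [0, 10000],
};

// Merge overrides into the defaults and throw if anything is out of range.
function buildInput(overrides) {
  const input = { ...DEFAULTS, ...overrides };
  const errors = [];
  if (!Array.isArray(input.startUrls) || input.startUrls.length === 0) {
    errors.push('startUrls is required');
  }
  for (const [field, [min, max]] of Object.entries(RANGES)) {
    const value = input[field];
    if (!Number.isInteger(value) || value < min || value > max) {
      errors.push(`${field} must be an integer between ${min} and ${max}`);
    }
  }
  if (errors.length > 0) throw new Error(errors.join('; '));
  return input;
}
```

Calling `buildInput({ startUrls: [{ url: 'https://example.com' }], maxPages: 500 })` returns the full input with the remaining fields set to their documented defaults.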
Output
Each checked link produces a record with:
| Field | Description |
|---|---|
| `url` | The URL that was checked |
| `statusCode` | HTTP status code (200, 404, 500, etc.) |
| `status` | Category: OK, BROKEN, BLOCKED, REDIRECT, TIMEOUT, ERROR, SERVER_ERROR, CLIENT_ERROR |
| `confidence` | How confident we are in the status: high, medium, low |
| `isBroken` | Definitive broken flag (true only for actually broken links) |
| `type` | Link type: internal, external |
| `foundOnPage` | Which page contained this link |
| `anchorText` | The link's anchor text |
| `responseTime` | Response time in milliseconds |
| `redirectChain` | Full redirect path if redirected |
| `screenshotUrl` | Link to screenshot (for broken pages) |
| `error` | Error message if request failed |
| `warning` | Warnings (mixed content, slow, bot protection notice, etc.) |
| `checkedAt` | When this URL was checked |
Status Categories (v1.1.0+)
| Status | Meaning | Is Broken? |
|---|---|---|
| OK | Link works (200-299) | No |
| REDIRECT | Link redirects (300-399) | No |
| BROKEN | Link is dead (404, 410) | Yes |
| BLOCKED | Access denied (401, 403), often bot protection | No (external) / Yes (internal) |
| TIMEOUT | Request timed out | Yes |
| ERROR | Connection/DNS failed | Yes |
| SERVER_ERROR | Server error (500-599) | Yes |
| CLIENT_ERROR | Other 4xx errors | Yes |
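The table can be read as a decision function over the status code and link type. The sketch below mirrors it row by row; `classifyLink` is a hypothetical name and this is not the Actor's actual code, which may use more signals than the status code alone. TIMEOUT is left out because it comes from a failed request rather than a status code, and any code outside the table (including 0) falls back to ERROR.

```javascript
// Map an HTTP status code and link type ('internal' | 'external')
// to the status category and isBroken flag described in the table above.
function classifyLink(statusCode, type) {
  if (statusCode >= 200 && statusCode <= 299) return { status: 'OK', isBroken: false };
  if (statusCode >= 300 && statusCode <= 399) return { status: 'REDIRECT', isBroken: false };
  if (statusCode === 404 || statusCode === 410) return { status: 'BROKEN', isBroken: true };
  if (statusCode === 401 || statusCode === 403) {
    // Blocked external links are likely bot protection; blocked internal links count as broken.
    return { status: 'BLOCKED', isBroken: type === 'internal' };
  }
  if (statusCode >= 400 && statusCode <= 499) return { status: 'CLIENT_ERROR', isBroken: true };
  if (statusCode >= 500 && statusCode <= 599) return { status: 'SERVER_ERROR', isBroken: true };
  return { status: 'ERROR', isBroken: true }; // assumption: everything else is a connection-level failure
}
```

The order of the checks matters: 404, 410, 401 and 403 are tested before the generic 4xx range so that only the remaining client errors end up as CLIENT_ERROR.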
Confidence Levels
| Confidence | Meaning |
|---|---|
| `high` | Status is definitive (200, 404, etc.) |
| `medium` | Status may vary (for example, a timeout could be temporary) |
| `low` | Status is uncertain: external 403s often block bots but work in browsers |
Example Output
{
  "url": "https://example.com/old-page",
  "statusCode": 404,
  "status": "BROKEN",
  "confidence": "high",
  "isBroken": true,
  "type": "internal",
  "foundOnPage": "https://example.com/blog",
  "anchorText": "Read our old article",
  "responseTime": 245,
  "redirectChain": null,
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/abc123/records/screenshot-xyz",
  "error": null,
  "warning": null,
  "checkedAt": "2025-12-13T10:30:00.000Z"
}
External Blocked Link Example
{
  "url": "https://www.viator.com/tours",
  "statusCode": 403,
  "status": "BLOCKED",
  "confidence": "low",
  "isBroken": false,
  "type": "external",
  "foundOnPage": "https://yoursite.com/travel",
  "warning": "External site may be blocking automated requests. Verify manually.",
  "checkedAt": "2025-12-13T10:30:00.000Z"
}
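Records like this one are the ones worth checking by hand. A minimal sketch, assuming the record shape shown in the examples, that collects them into a de-duplicated review list (`needsManualCheck` and `manualCheckList` are hypothetical names):

```javascript
// A record needs a manual look when it was blocked, is not flagged as broken,
// and the scanner itself reports low confidence.
function needsManualCheck(item) {
  return item.status === 'BLOCKED' && item.isBroken === false && item.confidence === 'low';
}

// Build one entry per blocked URL, listing every page that links to it.
function manualCheckList(items) {
  const pagesByUrl = new Map();
  for (const item of items.filter(needsManualCheck)) {
    const pages = pagesByUrl.get(item.url) ?? [];
    if (!pages.includes(item.foundOnPage)) pages.push(item.foundOnPage);
    pagesByUrl.set(item.url, pages);
  }
  return [...pagesByUrl].map(([url, foundOn]) => ({ url, foundOn }));
}
```

Grouping by URL keeps the list short when the same external link appears on many pages.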
Redirect Chain Example
{
  "url": "https://example.com/page",
  "statusCode": 200,
  "status": "REDIRECT",
  "confidence": "high",
  "isBroken": false,
  "redirectChain": "https://example.com/page → https://example.com/new-page → https://example.com/final-page",
  "warning": "Long redirect chain: 3 hops"
}
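To work with a chain in code, it can be split back into individual URLs. This sketch assumes `redirectChain` is a single string of URLs joined by an arrow separator, as in the example; `parseRedirectChain` is a hypothetical helper and the separator is a parameter in case your output uses a different one. It counts transitions between URLs, which may differ from the hop count in the Actor's warning text.

```javascript
// Split a redirectChain string into its URLs and derive a few useful facts.
function parseRedirectChain(redirectChain, separator = ' → ') {
  if (!redirectChain) {
    return { urls: [], hops: 0, finalUrl: null, hasLoop: false };
  }
  const urls = redirectChain
    .split(separator)
    .map((url) => url.trim())
    .filter(Boolean);
  return {
    urls,
    hops: Math.max(urls.length - 1, 0),           // transitions between URLs
    finalUrl: urls[urls.length - 1] ?? null,      // where the chain ends
    hasLoop: new Set(urls).size !== urls.length,  // a repeated URL indicates a loop
  };
}
```

The `finalUrl` value is the one to use when updating a link that points at a permanent redirect.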
Summary Statistics
After each run, a summary is saved to the Key-Value Store under the key summary:
{
  "totalLinksChecked": 847,
  "brokenLinks": 12,
  "blockedLinks": 5,
  "redirects": 45,
  "warnings": 8,
  "pagesProcessed": 100,
  "scanCompletedAt": "2025-12-13T10:45:00.000Z"
}
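The summary is handy as a pass/fail gate in monitoring or CI. A minimal sketch, assuming the summary object has already been loaded from the Key-Value Store; `evaluateSummary` and the default limits are hypothetical and should be tuned to your site.

```javascript
// Compare a run summary against limits and report which ones were exceeded.
function evaluateSummary(summary, limits = { maxBrokenLinks: 0, maxBrokenRatio: 0.01 }) {
  const brokenRatio = summary.totalLinksChecked > 0
    ? summary.brokenLinks / summary.totalLinksChecked
    : 0;
  const failures = [];
  if (summary.brokenLinks > limits.maxBrokenLinks) {
    failures.push(`${summary.brokenLinks} broken links (limit ${limits.maxBrokenLinks})`);
  }
  if (brokenRatio > limits.maxBrokenRatio) {
    failures.push(`broken ratio ${(brokenRatio * 100).toFixed(2)}% exceeds limit`);
  }
  return { ok: failures.length === 0, brokenRatio, failures };
}
```

With the example summary above (12 broken out of 847 checked, about 1.42%), both default limits are exceeded, so `ok` is `false`. Blocked links are deliberately not counted, since they usually need manual verification rather than a fix.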
Log Output (v1.1.0+)
The scanner now provides detailed log output at the end of each run:
═══════════════════════════════════════════
SCAN COMPLETE
═══════════════════════════════════════════
Pages crawled: 7
Links checked: 26
Broken links: 0
Blocked: 2 (likely bot protection)
Redirects: 11
Warnings: 1
═══════════════════════════════════════════
WARNING DETAILS:
───────────────────────────────────────────
• https://yoursite.com/page
  Warning: Slow response: 3500ms
  Found on: https://yoursite.com/
═══════════════════════════════════════════
BLOCKED LINKS (verify manually):
───────────────────────────────────────────
• https://www.viator.com/tours
  Status: 403 | Found on: https://yoursite.com/travel
• https://www.britishmuseum.org/collection
  Status: 403 | Found on: https://yoursite.com/museums
═══════════════════════════════════════════
How to Use the Results
1. Export to CSV
Download results as CSV for spreadsheet analysis or client reports.
2. Filter by Status
Use Apify's dataset filters to show only broken links or only redirects.
3. Use the isBroken Field
Filter by isBroken: true to get only definitely broken links, ignoring bot-blocked external sites.
4. Use Screenshots
Each broken page screenshot is stored in the Key-Value Store. URLs are included in the output for easy access.
5. Automate with Schedules
Set up scheduled runs to monitor site health over time.
6. Integrate via API
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

// Wrapped in an async function because top-level await is not available in CommonJS modules
async function main() {
  // Start the Actor and wait for the run to finish
  const run = await client.actor('YOUR_USERNAME/site-health-scanner').call({
    startUrls: [{ url: 'https://example.com' }],
    maxPages: 500,
  });

  // Fetch the results from the run's default dataset
  const { items } = await client.dataset(run.defaultDatasetId).listItems();

  // Get only definitely broken links (ignores bot-blocked externals)
  const brokenLinks = items.filter((item) => item.isBroken === true);
  console.log(`Found ${brokenLinks.length} broken links`);

  // Get blocked links that need manual verification
  const blockedLinks = items.filter((item) => item.status === 'BLOCKED');
  console.log(`${blockedLinks.length} links blocked (verify manually)`);
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
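Once the items are loaded, a per-page view is often easier to act on than a flat list. The following sketch groups broken links by the page they were found on, assuming the record fields described in the Output section; `groupBrokenByPage` is a hypothetical helper that runs on plain objects and needs no Apify dependency.

```javascript
// Group definitely broken links by the page that contains them,
// with the pages holding the most broken links listed first.
function groupBrokenByPage(items) {
  const byPage = new Map();
  for (const item of items) {
    if (item.isBroken !== true) continue; // skip working and bot-blocked links
    const links = byPage.get(item.foundOnPage) ?? [];
    links.push({
      url: item.url,
      statusCode: item.statusCode,
      anchorText: item.anchorText ?? '',
      screenshotUrl: item.screenshotUrl ?? null,
    });
    byPage.set(item.foundOnPage, links);
  }
  return [...byPage.entries()]
    .sort((a, b) => b[1].length - a[1].length)
    .map(([page, links]) => ({ page, links }));
}
```

Each entry tells an editor which page to open and which links on it to fix, with the screenshot URL attached when one was captured.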
Understanding Status Codes
| Code | Status | Meaning | Action Needed |
|---|---|---|---|
| 200 | OK | Working | None |
| 301 | REDIRECT | Permanent redirect | Update link to final URL |
| 302 | REDIRECT | Temporary redirect | Usually OK, monitor |
| 400 | CLIENT_ERROR | Bad request | Fix malformed URL |
| 401 | BLOCKED | Unauthorized | Check if link needs auth |
| 403 | BLOCKED | Forbidden | External: likely bot protection, verify manually. Internal: check permissions |
| 404 | BROKEN | Not found | Remove or fix link |
| 410 | BROKEN | Gone | Remove link |
| 500 | SERVER_ERROR | Server error | Contact site owner |
| 503 | SERVER_ERROR | Service unavailable | Retry later |
| 0 | ERROR | Connection failed | DNS or network issue |
Comparison with Other Tools
| Feature | Site Health Scanner | Screaming Frog | Ahrefs |
|---|---|---|---|
| Cloud-based | β | β (Desktop) | β |
| Screenshots of broken pages | β | β | β |
| Smart bot-block detection | β | β | β |
| Confidence scoring | β | β | β |
| Pay-per-use pricing | β | License fee | Subscription |
| API access | β | Limited | β |
| Scheduled runs | β | Manual | β |
| External link checking | β | β | β |
| Redirect chain tracking | β | β | β |
Limitations
- JavaScript-rendered content is supported, but very complex SPAs may not extract all links
- Some servers block automated requests; try adjusting the `userAgent` setting
- Screenshot capture adds processing time (disable if not needed for faster scans)
- External links are checked but not crawled (only status code verified)
- Maximum 10,000 pages per run
Use Cases
SEO Audit
Run before and after site migrations to catch broken links that could hurt rankings.
Client Reporting
Use screenshots to show clients exactly what's broken, with no technical explanation needed.
Continuous Monitoring
Schedule weekly runs to catch new broken links before they impact SEO or user experience.
Pre-Launch QA
Verify all links work before going live with a new site or major update.
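For migrations and ongoing monitoring, the useful signal is usually what changed between two runs. A minimal sketch, assuming you have the dataset items from an earlier and a later run and that links are matched by `url`; `compareRuns` is a hypothetical helper.

```javascript
// Collect the URLs flagged as definitely broken in one run.
function brokenUrls(items) {
  return new Set(items.filter((item) => item.isBroken === true).map((item) => item.url));
}

// Compare two runs and report what broke, what was fixed, and what is still broken.
function compareRuns(beforeItems, afterItems) {
  const before = brokenUrls(beforeItems);
  const after = brokenUrls(afterItems);
  return {
    newlyBroken: [...after].filter((url) => !before.has(url)).sort(),
    fixed: [...before].filter((url) => !after.has(url)).sort(),
    stillBroken: [...after].filter((url) => before.has(url)).sort(),
  };
}
```

After a migration, `newlyBroken` is the list to review first. Note that a URL also shows up as `fixed` if the later run no longer encounters it at all, for example because the linking page was removed.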
Troubleshooting
| Problem | Solution |
|---|---|
| Timeout errors | Increase the timeout setting |
| Many 403 "blocked" on external sites | This is expected: large sites often block bots. Verify manually if needed. |
| Missing pages | Increase maxDepth or maxPages |
| Slow scans | Disable checkExternalLinks or screenshotBrokenPages |
| Some links not found | Complex JavaScript navigation may hide links |
Changelog
v1.1.0 (December 2025)
- IMPROVED: Better status classification
  - `BLOCKED` status for 401/403 (separate from `BROKEN`)
  - External 403s marked as `isBroken: false` (likely bot protection)
  - Internal 403s still marked as `isBroken: true`
- NEW: `confidence` field (high/medium/low)
- NEW: `isBroken` field for definitive broken detection
- NEW: Verbose log output showing warning details, blocked links, and broken links
- NEW: `blockedLinks` count in summary
v1.0.0 (December 2025)
- Initial release
- Broken link detection with screenshot proof
- Redirect chain tracking
- Mixed content warnings
- External link checking
- Response time monitoring
Find broken links before your users do.