Bulk URL Status Checker – Broken Link & Redirect Audit

Pricing: from $1.00 / 1,000 results

Bulk URL checker for HTTP status codes (200/301/302/404/410/500) with final URL, redirect chain and response time. Fast parallel link audit for SEO, site migrations, monitoring and QA. Export clean results to CSV/JSON.

Developer: Logiover (maintained by Community)

Check the HTTP status of thousands of URLs in minutes. The actor detects broken links (404, 500), traces full redirect chains (301, 302), and measures response times in one scalable run. It is built for SEO audits, link monitoring, content migrations, and QA pipelines.


What Is This Actor?

Every website accumulates broken links, outdated redirects, and slow pages over time. Checking them manually — or even with browser-based tools — doesn't scale. This actor takes any list of URLs and checks each one in parallel via direct HTTP requests, returning a structured report with status codes, redirect chains, final destination URLs, and response times.

Built for:

  • 🔍 SEO auditors — identify 404s and redirect chains hurting crawl budget
  • 🛠️ Web developers — validate links after a site migration or CMS change
  • 📋 QA engineers — run automated link health checks in CI/CD pipelines
  • 📣 Content managers — monitor internal and external link health across articles
  • 📊 Data analysts — enrich URL datasets with live status metadata
  • 🔗 Backlink managers — verify that inbound links still resolve correctly

Features

  • Bulk checking — audit thousands of URLs in a single run
  • Full HTTP status detection — 200 OK, 301/302 redirects, 403 Forbidden, 404 Not Found, 410 Gone, 500 Server Error, and more
  • Redirect chain tracing — captures every hop in a redirect sequence, not just the final destination
  • Final URL reporting — always shows where a URL ultimately resolves to
  • Response time measurement — millisecond-precision timing per URL
  • Network error handling — DNS failures, timeouts, and connection errors are recorded as status 0 (not silently dropped)
  • Automatic retries — transient network errors are retried up to 2 times before marking as failed
  • High concurrency — up to 100 parallel checks configurable; default 20
  • Proxy support — built-in Apify Proxy integration to avoid rate limiting at scale
  • No browser overhead — pure HTTP via got-scraping; blazing fast and low cost
  • Export-ready — JSON, CSV, and Excel output via Apify Dataset

Output Data

Each record in the dataset corresponds to one checked URL.

| Field | Type | Description |
| --- | --- | --- |
| url | string | The original URL as provided in the input |
| statusCode | integer | HTTP response status code. 0 means a network/DNS error |
| statusMessage | string | Human-readable status description (e.g. "Not Found", "OK") |
| isBroken | boolean | true if statusCode is 400 or above, or 0 (network error) |
| isRedirect | boolean | true if the URL was redirected at least once before resolving |
| redirectChain | array | Ordered list of URLs in the redirect chain, one per hop (in the sample records below it ends with the final URL) |
| finalUrl | string or null | The last URL after all redirects. null on network error |
| responseTime | integer | Total time in milliseconds from request to final response |
| checkedAt | string | ISO 8601 timestamp of when this URL was checked |
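
For typed downstream code, the record shape above can be modelled as a small dataclass. This is a minimal sketch, not part of the actor: the `UrlCheck` class and `parse_record` helper are hypothetical names, and only the field names and types come from the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UrlCheck:
    """One dataset record; field names copied from the table above."""
    url: str
    statusCode: int
    statusMessage: str
    isBroken: bool
    isRedirect: bool
    redirectChain: List[str]
    finalUrl: Optional[str]  # None when the request failed at the network level
    responseTime: int        # milliseconds
    checkedAt: str           # ISO 8601 timestamp


def parse_record(raw: dict) -> UrlCheck:
    # Raises TypeError if a documented field is missing or an extra one appears.
    return UrlCheck(**raw)


record = parse_record({
    "url": "https://apify.com/non-existent-page",
    "statusCode": 404,
    "statusMessage": "Not Found",
    "isBroken": True,
    "isRedirect": False,
    "redirectChain": [],
    "finalUrl": "https://apify.com/non-existent-page",
    "responseTime": 210,
    "checkedAt": "2025-05-15T10:22:07.000Z",
})
print(record.statusCode, record.isBroken)
```

Failing loudly on a missing field is a design choice for this sketch; a more lenient parser could supply defaults instead.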

Status Code Reference

| Code | Message | isBroken | Meaning |
| --- | --- | --- | --- |
| 200 | OK | ❌ No | Page loaded successfully |
| 201 | Created | ❌ No | Resource created |
| 301 | Moved Permanently | ❌ No | Permanent redirect (followed) |
| 302 | Found | ❌ No | Temporary redirect (followed) |
| 307 | Temporary Redirect | ❌ No | Temporary redirect (followed) |
| 308 | Permanent Redirect | ❌ No | Permanent redirect (followed) |
| 400 | Bad Request | ✅ Yes | Malformed request |
| 401 | Unauthorized | ✅ Yes | Authentication required |
| 403 | Forbidden | ✅ Yes | Access denied |
| 404 | Not Found | ✅ Yes | Page does not exist |
| 410 | Gone | ✅ Yes | Page permanently removed |
| 429 | Too Many Requests | ✅ Yes | Rate limited |
| 500 | Internal Server Error | ✅ Yes | Server-side error |
| 502 | Bad Gateway | ✅ Yes | Upstream server error |
| 503 | Service Unavailable | ✅ Yes | Server temporarily down |
| 504 | Gateway Timeout | ✅ Yes | Server timed out |
| 0 | Network Error | ✅ Yes | DNS failure, connection refused, timeout |
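
The isBroken column follows a simple rule that can be reproduced when post-processing results. The sketch below restates that rule in Python; the `bucket` helper and its group names are illustrative additions, not fields the actor produces.

```python
def is_broken(status_code: int) -> bool:
    """Documented rule: 0 (network error) or any code of 400 and above."""
    return status_code == 0 or status_code >= 400


def bucket(status_code: int) -> str:
    """Coarse grouping for reports; the bucket names are illustrative."""
    if status_code == 0:
        return "network_error"
    if status_code >= 500:
        return "server_error"
    if status_code >= 400:
        return "client_error"
    if status_code >= 300:
        return "redirect"
    return "ok"


for code in (200, 301, 404, 503, 0):
    print(code, is_broken(code), bucket(code))
```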

Sample Output Records

Healthy URL:

{
  "url": "https://apify.com/blog",
  "statusCode": 200,
  "statusMessage": "OK",
  "isBroken": false,
  "isRedirect": false,
  "redirectChain": [],
  "finalUrl": "https://apify.com/blog",
  "responseTime": 312,
  "checkedAt": "2025-05-15T10:22:05.000Z"
}

Redirect chain:

{
  "url": "http://oldsite.com/page",
  "statusCode": 200,
  "statusMessage": "OK",
  "isBroken": false,
  "isRedirect": true,
  "redirectChain": [
    "https://oldsite.com/page",
    "https://newsite.com/page"
  ],
  "finalUrl": "https://newsite.com/page",
  "responseTime": 580,
  "checkedAt": "2025-05-15T10:22:06.000Z"
}

Broken link:

{
  "url": "https://apify.com/non-existent-page",
  "statusCode": 404,
  "statusMessage": "Not Found",
  "isBroken": true,
  "isRedirect": false,
  "redirectChain": [],
  "finalUrl": "https://apify.com/non-existent-page",
  "responseTime": 210,
  "checkedAt": "2025-05-15T10:22:07.000Z"
}

Network error:

{
  "url": "https://this-domain-does-not-exist.io/page",
  "statusCode": 0,
  "statusMessage": "getaddrinfo ENOTFOUND this-domain-does-not-exist.io",
  "isBroken": true,
  "isRedirect": false,
  "redirectChain": [],
  "finalUrl": null,
  "responseTime": 5021,
  "checkedAt": "2025-05-15T10:22:12.000Z"
}
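
Records like the four samples above are easy to roll up into a short report. The following sketch uses abridged copies of those samples; the `summarize` function and its output keys are hypothetical and exist only for illustration.

```python
from collections import Counter


def summarize(records):
    """Count records per status code, list broken URLs, average the timings."""
    by_status = Counter(r["statusCode"] for r in records)
    broken = [r["url"] for r in records if r["isBroken"]]
    # Leave network errors out of the average: their time is mostly the timeout.
    timed = [r["responseTime"] for r in records if r["statusCode"] != 0]
    avg_ms = sum(timed) / len(timed) if timed else None
    return {"byStatus": dict(by_status), "brokenUrls": broken, "avgResponseMs": avg_ms}


samples = [
    {"url": "https://apify.com/blog", "statusCode": 200, "isBroken": False, "responseTime": 312},
    {"url": "http://oldsite.com/page", "statusCode": 200, "isBroken": False, "responseTime": 580},
    {"url": "https://apify.com/non-existent-page", "statusCode": 404, "isBroken": True, "responseTime": 210},
    {"url": "https://this-domain-does-not-exist.io/page", "statusCode": 0, "isBroken": True, "responseTime": 5021},
]
report = summarize(samples)
print(report["byStatus"], report["brokenUrls"])
```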

Input Configuration

startUrls · array · required

The list of URLs to check. Supports the full Apify requestListSources format.

[
  { "url": "https://example.com/page-1" },
  { "url": "https://example.com/page-2" },
  { "url": "https://oldsite.com/legacy-path" }
]

You can paste URLs directly in the Apify Console, import from a text list, or pass them programmatically via the Apify API. There is no hard limit on the number of URLs — the actor processes them all.
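
When the URLs start out as a plain text list, a few lines of Python can turn them into the startUrls array. This is a sketch under stated assumptions: the `build_start_urls` helper is hypothetical, and dropping blank lines and duplicates is a convenience added here, not documented actor behavior.

```python
def build_start_urls(text: str) -> list:
    """Turn one-URL-per-line text into [{"url": ...}, ...], preserving order."""
    seen = set()
    sources = []
    for line in text.splitlines():
        url = line.strip()
        if not url or url in seen:
            continue  # skip blank lines and repeated URLs
        seen.add(url)
        sources.append({"url": url})
    return sources


url_list = """
https://example.com/page-1
https://example.com/page-2
https://example.com/page-1
"""
print(build_start_urls(url_list))
```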


maxConcurrency · integer · default: 20 · min: 1 · max: 100

How many URLs to check simultaneously.

| Value | Use Case |
| --- | --- |
| 5–10 | Small lists, conservative proxy usage |
| 20 (default) | Balanced speed and reliability for most use cases |
| 50–100 | Maximum speed for large audits; requires sufficient proxy pool |

Higher concurrency means faster runs but also more simultaneous outbound requests. For very high concurrency, using Apify Proxy is strongly recommended to avoid triggering rate limits on target servers.


proxyConfiguration · object · default: Apify Proxy enabled

Proxy configuration for all HTTP requests.

{ "useApifyProxy": true }

Using a proxy is recommended for:

  • Large URL lists (thousands of URLs to the same domain)
  • Checking URLs that rate-limit by IP (e.g. 429 Too Many Requests)
  • Avoiding your actor's IP being blocked mid-run

For small, diverse URL lists against different domains, a proxy may not be necessary.
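
The three input fields can be assembled and checked before a run is started. The sketch below validates maxConcurrency against the documented 1 to 100 range and uses the documented defaults; the `build_input` function itself is a hypothetical helper.

```python
import json


def build_input(urls, max_concurrency=20, use_apify_proxy=True):
    """Assemble the actor input, enforcing the documented limits."""
    if not urls:
        raise ValueError("startUrls is required and must not be empty")
    if not 1 <= max_concurrency <= 100:
        raise ValueError("maxConcurrency must be between 1 and 100")
    return {
        "startUrls": [{"url": u} for u in urls],
        "maxConcurrency": max_concurrency,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }


actor_input = build_input(["https://example.com", "https://example.com/contact"], max_concurrency=5)
print(json.dumps(actor_input, indent=2))
```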


Usage Examples

Example 1 — Quick check of a handful of URLs

{
  "startUrls": [
    { "url": "https://example.com" },
    { "url": "https://example.com/contact" },
    { "url": "https://example.com/old-page" }
  ],
  "maxConcurrency": 5,
  "proxyConfiguration": { "useApifyProxy": false }
}

Example 2 — Post-migration audit of 10,000 URLs

{
  "startUrls": [
    { "url": "https://oldsite.com/page-1" },
    { "url": "https://oldsite.com/page-2" }
  ],
  "maxConcurrency": 50,
  "proxyConfiguration": { "useApifyProxy": true }
}

Only two URLs are shown here; import the full list via CSV or the API.

Example 3 – Backlink and brand-mention check

{
  "startUrls": [
    { "url": "https://partner-site.com/our-mention" },
    { "url": "https://news-site.com/article/brand-coverage" }
  ],
  "maxConcurrency": 20,
  "proxyConfiguration": { "useApifyProxy": true }
}

Example 4 — Sitemap-driven full-site audit

Combine this actor with the Sitemap to URL Crawler actor:

  1. Run Sitemap to URL Crawler on your domain → get all URLs
  2. Export that dataset as JSON
  3. Feed the URL list into this actor as startUrls
  4. Get a complete HTTP status report for every page on your site
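
Step 3 above amounts to reshaping one dataset export into another actor's input. Here is a minimal sketch of that conversion. It assumes each exported item carries its address in a field named "url"; that field name is an assumption about the Sitemap to URL Crawler output and should be checked against a real export.

```python
import json


def start_urls_from_export(export_json: str, url_field: str = "url") -> list:
    """Convert an exported JSON array of items into the startUrls format."""
    items = json.loads(export_json)
    # Items without the expected field are skipped rather than failing the run.
    return [{"url": item[url_field]} for item in items if item.get(url_field)]


export = '[{"url": "https://example.com/"}, {"url": "https://example.com/about"}, {"title": "no url here"}]'
print(start_urls_from_export(export))
```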

How It Works

The actor uses BasicCrawler with got-scraping for pure HTTP requests — no browser, no JavaScript rendering, no unnecessary overhead.

For each URL in the input:

Step 1 — Send HTTP GET request
A full GET request is made (not just HEAD) for maximum compatibility. Some servers return different status codes for HEAD vs GET. The response body is not stored — only headers and metadata are used.

Step 2 — Follow redirects
Redirects are followed automatically (up to 10 hops). Every intermediate URL in the redirect chain is recorded in redirectChain.

Step 3 — Record result
Status code, message, redirect chain, final URL, and response time are written to the dataset immediately.

Step 4 — Handle errors
If the request fails (DNS failure, connection refused, timeout), the error is caught and recorded as statusCode: 0 with the raw error message in statusMessage. The URL is never silently skipped.

Step 5 — Retry on transient failures
Network-level failures (not 4xx/5xx HTTP errors) are automatically retried up to 2 times before recording a final failure.
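
Steps 4 and 5 can be illustrated with a small retry loop. This is a sketch of the described behavior, not the actor's source: the `fetch` callable is a hypothetical stand-in that returns a status code or raises `OSError` on a network failure, and the `attempts` field is added only to make the retries visible.

```python
def check_with_retries(url, fetch, max_retries=2):
    """Retry network failures only; any HTTP response, even 4xx/5xx, is final."""
    last_error = None
    for attempt in range(1 + max_retries):
        try:
            status = fetch(url)
        except OSError as exc:  # DNS failure, refused connection, timeout
            last_error = exc
            continue
        return {"url": url, "statusCode": status, "attempts": attempt + 1}
    # All attempts failed: record the URL with status 0 instead of dropping it.
    return {
        "url": url,
        "statusCode": 0,
        "statusMessage": str(last_error),
        "attempts": 1 + max_retries,
    }


def always_down(url):
    raise OSError("getaddrinfo ENOTFOUND")


print(check_with_retries("https://this-domain-does-not-exist.io/page", always_down))
```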

          Input URL List
┌─────────────────────────────────┐
│ GET request via got-scraping    │
│ - followRedirect: true (max 10) │
│ - throwHttpErrors: false        │
│ - timeout: 15s                  │
└────────────────┬────────────────┘
          ┌──────┴──────┐
          │             │
       Success    Network Error
          │             │
    Parse status  statusCode = 0
    + redirect    error.message →
      chain       statusMessage
          │             │
          └──────┬──────┘
          Push to Dataset
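
The redirect-tracing step can be sketched independently of any HTTP library. In the example below, `fetch` is a hypothetical callable returning a `(status_code, location)` pair for a single request without following redirects. The chain shape mirrors the sample redirect record, and raising an error past the hop limit is a choice made for this sketch, since the text does not say how the actor records aborted chains.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(url, fetch, max_hops=10):
    """Follow Location headers hop by hop and record each URL reached."""
    chain = []
    current = url
    for _ in range(max_hops + 1):
        status, location = fetch(current)
        if status in REDIRECT_CODES and location:
            current = urljoin(current, location)  # resolves relative Location values
            chain.append(current)
            continue
        return {
            "url": url,
            "statusCode": status,
            "isRedirect": bool(chain),
            "redirectChain": chain,
            "finalUrl": current,
        }
    raise RuntimeError(f"more than {max_hops} redirects for {url}")


# Canned responses standing in for real servers.
responses = {
    "http://oldsite.com/page": (301, "https://oldsite.com/page"),
    "https://oldsite.com/page": (301, "https://newsite.com/page"),
    "https://newsite.com/page": (200, None),
}
result = trace_redirects("http://oldsite.com/page", responses.__getitem__)
print(result["redirectChain"], result["finalUrl"])
```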

Performance

| URL Count | Concurrency | Est. Time | Notes |
| --- | --- | --- | --- |
| 100 | 20 | < 30 sec | Small audit |
| 1,000 | 20 | ~3–5 min | Standard blog/site audit |
| 10,000 | 50 | ~10–20 min | Post-migration check |
| 100,000 | 100 | ~1–2 hours | Enterprise-scale audit |

Response time per URL depends heavily on target server speed. The actor itself adds minimal overhead — it's a direct HTTP check.

Cost: This actor uses BasicCrawler with pure HTTP requests — no browser, no Playwright. Compute cost is negligible. For 100,000 URLs at concurrency 50, expect < $0.50 in Apify compute units.


Export & Analysis

Download your results from the Apify Dataset in:

  • JSON — full structured output with arrays for redirectChain
  • CSV — flat table; redirectChain is serialized as a comma-joined string
  • Excel (.xlsx) — native spreadsheet for sharing with non-technical stakeholders
  • JSONL — one record per line for streaming into data pipelines

Once exported to CSV, use a simple filter:

  • Column isBroken = TRUE → all broken URLs (4xx, 5xx, network errors)
  • Column isRedirect = TRUE → all URLs that redirect
  • Column statusCode = 404 → specifically missing pages
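  • Column statusCode = 301 → permanent redirects (high SEO importance). Because redirects are followed, statusCode may hold the final response's code, as in the sample redirect record, so combine this with isRedirect = TRUE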
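
The same filters can be applied to an exported CSV in code. This sketch assumes the CSV column headers match the field names in the Output Data table and that booleans are written as text; both are assumptions about the export, so the comparison is case-insensitive.

```python
import csv
import io


def broken_rows(csv_text: str) -> list:
    """Return the rows whose isBroken column reads true (any letter case)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["isBroken"].strip().lower() == "true"]


csv_text = (
    "url,statusCode,isBroken,isRedirect\n"
    "https://apify.com/blog,200,false,false\n"
    "https://apify.com/non-existent-page,404,true,false\n"
)
for row in broken_rows(csv_text):
    print(row["url"], row["statusCode"])
```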

Filtering via Apify API

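To retrieve only broken URLs, query the dataset items endpoint. Confirm the filter parameter shown below against the current Apify API reference; if it is not supported, fetch all items and filter on isBroken client-side: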

GET /v2/datasets/{datasetId}/items?filter=isBroken%3Dtrue
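
A client-side alternative is to download the items and filter them locally, which works whether or not the server applies the filter. The sketch below uses only the standard library; the `items_url` and `fetch_broken` helpers are hypothetical, and the dataset ID and token are placeholders to fill in.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.apify.com/v2"


def items_url(dataset_id: str, token: str = None) -> str:
    """Build the dataset items URL, requesting JSON output."""
    params = {"format": "json"}
    if token:
        params["token"] = token
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"


def only_broken(items: list) -> list:
    """Keep the records flagged as broken by the actor."""
    return [item for item in items if item.get("isBroken")]


def fetch_broken(dataset_id: str, token: str = None) -> list:
    # Performs a real network request; call it with your own dataset ID.
    with urlopen(items_url(dataset_id, token)) as response:
        return only_broken(json.load(response))


print(items_url("YOUR_DATASET_ID"))
```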

Common Use Cases In Detail

Post-Migration Redirect Audit

After moving a website to a new domain or restructuring URLs, every old URL should redirect (301) to its new equivalent. This actor lets you:

  1. Feed your old sitemap URLs as input
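  2. Check that every URL redirects (isRedirect is true). Each hop should be a 301 or 308, although statusCode in the output may reflect the final response, as in the sample redirect record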
  3. Verify finalUrl points to the correct new page
  4. Flag any 404s where a redirect is missing
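
The checklist above can be automated against the actor's output. The following sketch relies on isBroken, isRedirect, and finalUrl rather than on a 301 status, because the sample redirect record reports the final response's code. The `expected` mapping from old URL to intended new URL is a hypothetical input you would supply yourself.

```python
def audit_migration(records, expected):
    """Return (url, problem) pairs for old URLs that do not land where intended."""
    problems = []
    for r in records:
        want = expected.get(r["url"])
        if r["isBroken"]:
            problems.append((r["url"], f"broken ({r['statusCode']})"))
        elif not r["isRedirect"]:
            problems.append((r["url"], "no redirect"))
        elif want is not None and r["finalUrl"] != want:
            problems.append((r["url"], f"lands on {r['finalUrl']}"))
    return problems


records = [
    {"url": "https://oldsite.com/a", "statusCode": 200, "isBroken": False,
     "isRedirect": True, "finalUrl": "https://newsite.com/a"},
    {"url": "https://oldsite.com/b", "statusCode": 404, "isBroken": True,
     "isRedirect": False, "finalUrl": "https://oldsite.com/b"},
    {"url": "https://oldsite.com/c", "statusCode": 200, "isBroken": False,
     "isRedirect": True, "finalUrl": "https://newsite.com/"},
]
expected = {
    "https://oldsite.com/a": "https://newsite.com/a",
    "https://oldsite.com/b": "https://newsite.com/b",
    "https://oldsite.com/c": "https://newsite.com/c",
}
for url, problem in audit_migration(records, expected):
    print(url, "->", problem)
```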

Broken Link Audit for SEO

Broken internal and external links can hurt search performance and user experience. Export your full site URL list from a sitemap or crawl, run this actor, filter isBroken = true, and prioritize fixes by page importance.

Redirect Chain Optimization

Long redirect chains (A → B → C → D) waste crawl budget and add latency. Use the redirectChain field to identify multi-hop chains and collapse them to direct redirects. Flag any chain with redirectChain.length > 1.
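
Finding the chains worth collapsing is a one-line filter on redirectChain. The sketch below applies the `redirectChain.length > 1` rule and sorts the longest chains first; the `multi_hop_chains` name is illustrative.

```python
def multi_hop_chains(records):
    """List (url, hops, finalUrl) for chains longer than one hop, longest first."""
    hits = [
        (r["url"], len(r["redirectChain"]), r["finalUrl"])
        for r in records
        if len(r["redirectChain"]) > 1
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


records = [
    {"url": "https://apify.com/blog", "redirectChain": [], "finalUrl": "https://apify.com/blog"},
    {"url": "http://oldsite.com/page",
     "redirectChain": ["https://oldsite.com/page", "https://newsite.com/page"],
     "finalUrl": "https://newsite.com/page"},
]
print(multi_hop_chains(records))
```

Each hit can then be fixed by pointing the original URL straight at its finalUrl.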

Backlink Monitoring

If you've earned backlinks pointing to specific pages, use this actor on a schedule to verify those URLs still resolve to 200 OK. A 404 on a linked page means lost link equity.

API & Webhook URL Validation

Before deploying an integration, run all API endpoint URLs through this actor to confirm they return 200 or 201 rather than unexpected 4xx or 5xx responses.


Limitations

  • No JavaScript rendering. The actor makes raw HTTP requests. Pages that require JavaScript to load (SPAs, React apps) still return the correct HTTP status code, but the final resolved URL may differ from what a browser would show after JS-based routing.
  • Authentication not supported. URLs behind login walls return 401 or 403 as expected, but the actor cannot authenticate to check protected content.
  • HEAD vs GET. The actor uses GET (not HEAD) for better compatibility. This means it downloads the response body, but discards it immediately — a small amount of extra bandwidth is used per URL.
  • Max 10 redirects per URL. Redirect chains longer than 10 hops are aborted. Chains this long almost always indicate a redirect loop and would be flagged as broken in practice.
  • Rate limiting. If a target server rate-limits your requests (429), the result is recorded as isBroken: true with statusCode: 429. Use Apify Proxy and/or lower maxConcurrency to reduce the rate of requests per server.
  • Timeout at 15 seconds. URLs that don't respond within 15 seconds are recorded as network errors (statusCode: 0). Raising this threshold for known slow endpoints requires modifying the actor's code.

Frequently Asked Questions

Q: What's the difference between statusCode: 0 and statusCode: 404?
statusCode: 404 means the server responded and explicitly said the page doesn't exist. statusCode: 0 means the actor never received a response at all — DNS resolution failed, the server refused the connection, or the request timed out.

Q: Does it follow redirects?
Yes, automatically, up to 10 hops. redirectChain records every intermediate URL, and finalUrl shows the ultimate destination.

Q: Can I check HTTP (non-HTTPS) URLs?
Yes. Both HTTP and HTTPS URLs are supported. Certificate validation stays on (ignoreHTTPSErrors is not enabled by default), so URLs with invalid SSL certificates are recorded as failures rather than skipped.

Q: How do I check 100,000 URLs efficiently?
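Set maxConcurrency to 50–100 and enable Apify Proxy. At concurrency 100 with an average server response time of 500 ms, the theoretical ceiling is about 200 URLs/second (100 ÷ 0.5 s). Real runs are usually slower, as the estimates in the Performance table show.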
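
The throughput figure comes from dividing concurrency by average response time. The sketch below works through that arithmetic as a best-case estimate; it ignores retries, timeouts, and proxy overhead, so treat the result as a lower bound on run time.

```python
def estimate_seconds(url_count: int, concurrency: int, avg_response_s: float) -> float:
    """Ideal run time: every slot stays busy and every request takes the average."""
    throughput = concurrency / avg_response_s  # URLs completed per second
    return url_count / throughput


seconds = estimate_seconds(100_000, concurrency=100, avg_response_s=0.5)
print(f"{seconds:.0f} s, about {seconds / 60:.1f} min in the ideal case")
```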

Q: Can I use this in a scheduled run for link monitoring?
Yes — use the Apify Scheduler to trigger runs daily, weekly, or on any interval. Combine with the Apify API or webhook notifications to alert you when new broken links appear.
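
To alert only on new breakage, compare the current run's records with the previous run's. This is a minimal sketch of that comparison; how the two record lists are loaded and how the alert is delivered are left open, and the `newly_broken` name is hypothetical.

```python
def newly_broken(previous, current):
    """URLs broken in the current run that were not broken in the previous one."""
    was_broken = {r["url"] for r in previous if r["isBroken"]}
    return [r["url"] for r in current if r["isBroken"] and r["url"] not in was_broken]


last_week = [
    {"url": "https://example.com/a", "isBroken": False},
    {"url": "https://example.com/b", "isBroken": True},
]
today = [
    {"url": "https://example.com/a", "isBroken": True},
    {"url": "https://example.com/b", "isBroken": True},
]
print(newly_broken(last_week, today))
```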

Q: What happens if a URL in my list is malformed?
Malformed URLs that can't be parsed as valid HTTP URLs will result in a network error (statusCode: 0) with a descriptive error message.
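
Malformed entries can also be caught before the run so they never reach the dataset. The sketch below is a conservative pre-filter written with Python's standard library; the actor's own parsing may accept or reject slightly different inputs, and `is_checkable` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def is_checkable(url: str) -> bool:
    """True for strings that parse as http(s) URLs with a host part."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:  # raised for inputs such as an unclosed IPv6 bracket
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


candidates = ["https://example.com/page", "example.com/page", "ftp://example.com/file"]
print([u for u in candidates if is_checkable(u)])
```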

Q: Can I pipe the output of the Sitemap Crawler directly into this actor?
Yes. Export the sitemap crawler's dataset as JSON, then use that as input to this actor — or connect them via the Apify API for a fully automated audit pipeline.

Q: Is responseTime the time to first byte or total download time?
It is the total wall-clock time from when the request is sent to when the final response is received, including all redirect hops. Because the body is discarded immediately, this closely approximates the cumulative time to first byte across the redirect chain.


Technical Details

| Property | Value |
| --- | --- |
| Runtime | Node.js (ES Modules) |
| Framework | Apify SDK v3 + Crawlee BasicCrawler |
| HTTP client | got-scraping (browser-like headers, proxy support) |
| Request method | GET with responseType: 'text' (body discarded) |
| Redirect handling | Automatic, max 10 hops |
| Request timeout | 15,000 ms |
| Handler timeout | 30,000 ms |
| Max retries | 2 (transient network errors only) |
| Default concurrency | 20 |
| Max concurrency | 100 |
| Error handling | All failures recorded; nothing silently dropped |

Changelog

  • 2026-06-01 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
  • 2026-05-25 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.

v1.0

  • Initial release
  • Full HTTP GET status checking with got-scraping
  • Automatic redirect chain tracing (up to 10 hops)
  • Network error capture as statusCode: 0
  • isBroken and isRedirect boolean flags for easy filtering
  • Response time measurement per URL
  • Automatic retry on transient network failures (2 retries)
  • Configurable concurrency (1–100)
  • Apify Proxy integration
  • JSON, CSV, and Excel export

Support

If you encounter unexpected results — wrong status codes, proxy issues, or timeouts — please open a support ticket via the Apify Console. Include the affected URLs, your input configuration, and the run ID to help diagnose the issue.


Changelog

  • 2026-05-20 — Maintenance pass: reviewed the input schema and default values for a smooth one-click start, and rebuilt the Actor on the latest base image.

Last reviewed: 2026-06-01.