Bulk URL Status Checker
Pricing
Pay per event
Bulk check URLs for status codes, redirects, broken links, response times, canonical tags, robots meta, headers, and final destinations.
Developer
Stas Persiianenko
Maintained by Community
Check bulk URLs for HTTP status codes, redirect chains, broken links, response timing, canonical URLs, robots meta tags, content type, and final destination URLs.
Use this actor when you need a repeatable API-friendly URL audit for SEO, website migrations, QA, campaign launches, and content operations.
What does Bulk URL Status Checker do?
Bulk URL Status Checker takes URLs from pasted lists, text blocks, hosted URL lists, or XML sitemaps.
It checks each URL over HTTP and returns a structured dataset row for every URL.
The actor reports status code, status text, final URL, redirect count, redirect chain, broken-link flag, response time, content type, content length, canonical URL, robots meta, and error metadata.
It is designed for operational checks where the status itself is the data.
If a target page returns 403, 404, 500, timeout, or another failure, the actor records that response instead of treating the whole run as failed.
Who is it for?
- SEO agencies auditing migrations and technical SEO fixes.
- Website migration teams validating old-to-new URL maps.
- QA teams checking landing pages before releases.
- Content operations teams finding removed or redirected articles.
- Growth teams checking campaign URLs before launch.
- Developers building status-check APIs into internal dashboards.
- Analysts who need CSV, JSON, Excel, or API exports from URL checks.
Why use this URL status checker?
A simple browser test is not enough when you have hundreds or thousands of URLs.
This actor gives you a repeatable Apify run, dataset exports, API access, webhooks, and scheduling.
You can run it after deployments, before ad campaigns, during SEO migrations, and as part of weekly site health checks.
It is HTTP-only and lightweight, so it is cheaper than browser crawlers for status-code workflows.
Key features
- Bulk HTTP status checks.
- HEAD with GET fallback for speed and compatibility.
- GET-only mode for servers that reject HEAD.
- Redirect following with redirect-chain details.
- Broken-link classification.
- Response-time measurement.
- Canonical URL extraction from HTML.
- Robots meta extraction from HTML.
- Content type and content length.
- Optional raw response headers.
- Sitemap URL ingestion.
- Hosted plain-text or CSV-like URL list ingestion.
- Configurable concurrency, timeout, and user-agent.
How much does it cost to check bulk URL status codes?
This actor uses pay-per-event pricing.
There is a small start fee and a per-URL checked fee.
The default starting price in the actor package is:
- Start event: $0.005 per run.
- URL checked event: $0.000069405 at the BRONZE tier, with volume discounts for higher tiers.
Pricing was calculated from cloud runs using the standard 70% target NET margin formula.
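As a rough illustration of the arithmetic, the sketch below estimates the cost of a single run from the two prices listed above. It assumes one start event per run and one URL-checked event per unique URL, and it uses the BRONZE-tier price only; discounts at higher tiers are not modeled. The function name is hypothetical.

```python
# Prices taken from the pricing list above (BRONZE tier).
START_FEE_USD = 0.005
PER_URL_FEE_USD = 0.000069405


def estimate_run_cost(url_count: int) -> float:
    """Estimate the cost of one run in USD.

    Assumption: one start event per run plus one event per checked URL.
    """
    if url_count < 0:
        raise ValueError("url_count must be non-negative")
    return START_FEE_USD + url_count * PER_URL_FEE_USD


# 0.005 + 10,000 * 0.000069405
print(f"{estimate_run_cost(10_000):.5f}")  # 0.69905
```

Under these assumptions, a run over 10,000 URLs costs about $0.70.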
Input sources
You can provide URLs in four ways.
- urls: direct list of URLs.
- urlsText: pasted text containing URLs separated by newlines, spaces, commas, tabs, or semicolons.
- sitemapUrl: XML sitemap URL; the actor extracts <loc> entries.
- listUrl: hosted text or CSV-like file containing URLs.
At least one source is required.
Duplicates are removed after normalization.
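To make the splitting and deduplication concrete, here is a minimal sketch of how a pasted urlsText block could be tokenized, normalized, and deduplicated. It is not the actor's implementation. The default https scheme, the trailing-slash handling for empty paths, and the function names are assumptions made for the example.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(raw: str) -> Optional[str]:
    """Add a scheme if missing and drop the #fragment; None if unusable."""
    candidate = raw.strip()
    if not candidate:
        return None
    if "://" not in candidate:
        candidate = "https://" + candidate  # assumption: default to https
    parts = urlsplit(candidate)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None
    # Assumption: an empty path is treated as "/".
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", parts.query, ""))


def collect_urls(urls_text: str, max_urls: int = 100) -> list:
    """Split on whitespace, commas, and semicolons, then dedupe in order."""
    unique = {}
    for token in re.split(r"[\s,;]+", urls_text):
        normalized = normalize_url(token)
        if normalized is not None:
            unique.setdefault(normalized, None)
    return list(unique)[:max_urls]


pasted = "https://example.com/#top, example.com\thttps://example.com/a;https://example.com/a#x"
print(collect_urls(pasted))  # ['https://example.com/', 'https://example.com/a']
```

Deduplicating after normalization is what collapses `https://example.com/#top` and `example.com` into a single check.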
Input options
| Field | Type | Description |
|---|---|---|
| urls | array | URLs to check directly. |
| urlsText | string | Pasted URL block. |
| sitemapUrl | string | XML sitemap URL to parse. |
| listUrl | string | Hosted text or CSV-like URL list. |
| maxUrls | integer | Maximum unique URLs to check. |
| maxConcurrency | integer | Parallel URL checks. |
| timeoutSecs | integer | Request timeout per URL, in seconds. |
| followRedirects | boolean | Follow redirects and report the final URL. |
| method | string | head-get-fallback or get. |
| includeHtmlSignals | boolean | Extract canonical and robots meta. |
| includeHeaders | boolean | Include raw response headers. |
| userAgent | string | Optional custom User-Agent. |
Example input
    {
      "urls": [
        "https://example.com/",
        "https://www.iana.org/domains/example",
        "https://httpstat.us/404"
      ],
      "maxUrls": 100,
      "maxConcurrency": 20,
      "timeoutSecs": 15,
      "followRedirects": true,
      "method": "head-get-fallback",
      "includeHtmlSignals": true,
      "includeHeaders": false
    }
Sitemap audit example
    {
      "sitemapUrl": "https://www.iana.org/sitemap.xml",
      "maxUrls": 500,
      "maxConcurrency": 10,
      "includeHtmlSignals": true
    }
Use this mode to audit indexed URLs, migration sitemaps, or generated sitemap files.
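For readers who want to see what extracting <loc> entries involves, the following sketch parses a sitemap document with Python's standard library. It illustrates the idea only and is not the actor's parser. The function name and sample XML are hypothetical, and the sketch does not distinguish a sitemap index from a regular sitemap.

```python
import xml.etree.ElementTree as ET


def extract_loc_entries(sitemap_xml: str) -> list:
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for element in root.iter():
        # Namespaced tags look like "{http://www.sitemaps.org/...}loc".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "loc" and element.text and element.text.strip():
            urls.append(element.text.strip())
    return urls


SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc> https://example.com/about </loc></url>
</urlset>"""

print(extract_loc_entries(SAMPLE_SITEMAP))
# ['https://example.com/', 'https://example.com/about']
```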
Output data
Each dataset item represents one URL check.
| Field | Description |
|---|---|
| inputUrl | Original URL supplied by the user. |
| normalizedUrl | URL after scheme normalization and hash removal. |
| statusCode | Final HTTP status code, or null on request error. |
| statusText | Human-readable status text when known. |
| finalUrl | Final URL after redirects. |
| redirectChain | Array of redirect hops with URL, status, and location. |
| redirectCount | Number of redirect hops. |
| isBroken | True for request errors or HTTP status 400+. |
| isRedirect | True when at least one redirect was followed. |
| responseTimeMs | Request duration in milliseconds. |
| contentType | Response Content-Type header. |
| contentLength | Response Content-Length header when available. |
| canonicalUrl | Canonical URL extracted from HTML, when requested. |
| robotsMeta | Robots meta content extracted from HTML, when requested. |
| errorType | Request error code or error name. |
| errorMessage | Request error message. |
| checkedAt | ISO timestamp for the check. |
| headers | Optional raw response headers. |
Example output
    {
      "inputUrl": "https://example.com/",
      "normalizedUrl": "https://example.com/",
      "statusCode": 200,
      "statusText": "OK",
      "finalUrl": "https://example.com/",
      "redirectChain": [],
      "redirectCount": 0,
      "isBroken": false,
      "isRedirect": false,
      "responseTimeMs": 184,
      "contentType": "text/html",
      "contentLength": 1256,
      "canonicalUrl": null,
      "robotsMeta": null,
      "errorType": null,
      "errorMessage": null,
      "checkedAt": "2026-06-22T00:00:00.000Z"
    }
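Once items like the one above are downloaded, a common first step is to count results per status. The sketch below groups items by statusCode and falls back to errorType when no status was received. The sample items and the `ENOTFOUND` error code are invented for illustration; actual errorType values depend on the actor's HTTP client.

```python
from collections import Counter


def status_bucket(item: dict) -> str:
    """Label an item by status code, or by error type when there is none."""
    if item.get("statusCode") is not None:
        return str(item["statusCode"])
    return item.get("errorType") or "unknown-error"


def summarize_by_status(items: list) -> dict:
    """Count dataset items per status bucket, in first-seen order."""
    return dict(Counter(status_bucket(item) for item in items))


# Hypothetical items; only the fields used by the summary are included.
sample_items = [
    {"inputUrl": "https://example.com/", "statusCode": 200, "errorType": None},
    {"inputUrl": "https://example.com/a", "statusCode": 200, "errorType": None},
    {"inputUrl": "https://httpstat.us/404", "statusCode": 404, "errorType": None},
    {"inputUrl": "https://unreachable.invalid/", "statusCode": None, "errorType": "ENOTFOUND"},
]

print(summarize_by_status(sample_items))
# {'200': 2, '404': 1, 'ENOTFOUND': 1}
```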
Redirect chain checks
When followRedirects is enabled, the actor follows redirects up to the maximum number of hops allowed by its HTTP client.
The final dataset row still represents the original URL.
The redirectChain field stores each hop with the source URL, status code, and Location header.
Use this for migration maps, HTTP-to-HTTPS checks, trailing-slash cleanup, and canonical destination validation.
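For the migration-map use case, one possible post-processing step is to compare each item's finalUrl with the destination you expected. The sketch below assumes you already hold the dataset items as dictionaries and a hypothetical mapping of old URL to expected new URL; the helper name and sample data are illustrative.

```python
def find_migration_mismatches(items: list, expected: dict) -> list:
    """Report old URLs that were not checked or landed somewhere unexpected.

    `expected` maps each old URL to the final URL it should redirect to.
    """
    by_input = {item["inputUrl"]: item for item in items}
    mismatches = []
    for old_url, expected_final in expected.items():
        item = by_input.get(old_url)
        if item is None:
            mismatches.append({"inputUrl": old_url, "problem": "not checked"})
        elif item.get("finalUrl") != expected_final:
            mismatches.append({
                "inputUrl": old_url,
                "problem": "unexpected destination",
                "expected": expected_final,
                "finalUrl": item.get("finalUrl"),
                "redirectCount": item.get("redirectCount"),
            })
    return mismatches


# Hypothetical data for illustration.
items = [
    {"inputUrl": "http://old.example/a", "finalUrl": "https://new.example/a", "redirectCount": 1},
    {"inputUrl": "http://old.example/b", "finalUrl": "https://new.example/", "redirectCount": 2},
]
expected = {
    "http://old.example/a": "https://new.example/a",
    "http://old.example/b": "https://new.example/b",
    "http://old.example/c": "https://new.example/c",
}
for row in find_migration_mismatches(items, expected):
    print(row["inputUrl"], "->", row["problem"])
```

The comparison is an exact string match, so trailing slashes and letter case in the expected URLs need to agree with what the server actually returns.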
Broken link checks
isBroken is true when the final status code is 400 or higher.
It is also true for invalid URLs, timeouts, DNS errors, TLS errors, and connection failures.
Blocked URLs such as 401, 403, or 429 are preserved as HTTP status results.
That makes the actor useful for reporting what happened instead of hiding protected URLs as run failures.
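The classification rule described above is small enough to express directly. This is a sketch of the stated rule, not the actor's source code; the function name and the `ETIMEDOUT` value are assumptions for the example.

```python
from typing import Optional


def is_broken(status_code: Optional[int], error_type: Optional[str] = None) -> bool:
    """True for request errors (no usable status) or HTTP status 400+."""
    if error_type is not None or status_code is None:
        return True
    return status_code >= 400


print(is_broken(200))                # False
print(is_broken(301))                # False
print(is_broken(403))                # True
print(is_broken(None, "ETIMEDOUT"))  # True
```

Note that 401, 403, and 429 still count as broken under this rule; they are simply recorded with their HTTP status rather than as request errors.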
Canonical and robots meta checks
When includeHtmlSignals is true, the actor parses HTML pages for:
- canonical link: <link rel="canonical" href="...">
- robots meta: <meta name="robots" content="...">
This is useful for SEO QA after site migrations and template changes.
The actor only attempts these checks for HTML responses.
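To show what these two signals look like in practice, here is a minimal extraction sketch built on Python's standard-library HTML parser. The actor's own parser may behave differently, for example with multiple canonical tags or bot-specific meta names. The class and function names are hypothetical, and the sketch keeps only the first match of each tag.

```python
from html.parser import HTMLParser
from typing import Optional


class HtmlSignalParser(HTMLParser):
    """Collect the first canonical link and the first robots meta tag."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical_url: Optional[str] = None
        self.robots_meta: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "link" and self.canonical_url is None:
            rel_tokens = attributes.get("rel", "").lower().split()
            if "canonical" in rel_tokens and attributes.get("href"):
                self.canonical_url = attributes["href"]
        elif tag == "meta" and self.robots_meta is None:
            if attributes.get("name", "").lower() == "robots":
                self.robots_meta = attributes.get("content") or None


def extract_html_signals(html: str) -> dict:
    parser = HtmlSignalParser()
    parser.feed(html)
    return {"canonicalUrl": parser.canonical_url, "robotsMeta": parser.robots_meta}


page = (
    '<html><head><link rel="canonical" href="https://example.com/">'
    '<meta name="robots" content="noindex, follow"></head><body></body></html>'
)
print(extract_html_signals(page))
# {'canonicalUrl': 'https://example.com/', 'robotsMeta': 'noindex, follow'}
```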
Performance tips
Start with maxConcurrency 10-20 for general websites.
Use lower concurrency for small sites, fragile servers, or URLs behind rate limits.
Use head-get-fallback for most runs because HEAD is fast and GET fallback handles servers that reject HEAD.
Use get when you know target servers return inaccurate HEAD responses.
Keep includeHeaders disabled unless you need raw headers in exports.
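The difference between the two method values can be sketched as follows. The exact conditions under which the actor falls back from HEAD to GET are not documented here, so the set of fallback statuses (405 and 501) is an assumption, and `send` is a hypothetical transport hook that takes an HTTP method and a URL and returns a status code.

```python
from typing import Callable

# Assumption: statuses that suggest the server does not handle HEAD.
HEAD_FALLBACK_STATUSES = {405, 501}


def check_status(url: str, send: Callable[[str, str], int],
                 method: str = "head-get-fallback") -> dict:
    """Check one URL, trying HEAD first unless method is 'get'."""
    if method == "get":
        return {"url": url, "methodUsed": "GET", "statusCode": send("GET", url)}
    status = send("HEAD", url)
    if status in HEAD_FALLBACK_STATUSES:
        # HEAD was rejected, so repeat the check with GET.
        return {"url": url, "methodUsed": "GET", "statusCode": send("GET", url)}
    return {"url": url, "methodUsed": "HEAD", "statusCode": status}


def fake_send(http_method: str, url: str) -> int:
    """Stand-in transport: this pretend server rejects HEAD with 405."""
    return 405 if http_method == "HEAD" else 200


print(check_status("https://example.com/", fake_send))
# {'url': 'https://example.com/', 'methodUsed': 'GET', 'statusCode': 200}
```

Passing the transport in as a function keeps the sketch runnable without network access; in real code it would wrap an HTTP client call.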
Integrations
You can integrate this actor into many workflows:
- Schedule a weekly sitemap status audit.
- Trigger a run after a deployment.
- Send broken-link results to Slack through Apify webhooks.
- Export redirect chains to Google Sheets.
- Pull dataset items into a BI tool.
- Use API results in internal QA dashboards.
- Compare old and new migration URL maps in a data warehouse.
API usage with Node.js
    import { ApifyClient } from 'apify-client';

    // Read the API token from the environment rather than hard-coding it.
    const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

    // Start the actor and wait for the run to finish.
    const run = await client.actor('automation-lab/bulk-url-status-checker').call({
        urls: ['https://example.com/', 'https://httpstat.us/404'],
        followRedirects: true,
    });

    // Fetch one dataset item per checked URL.
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(items);
API usage with Python
    from apify_client import ApifyClient

    client = ApifyClient('YOUR_APIFY_TOKEN')

    # Start the actor and wait for the run to finish.
    run = client.actor('automation-lab/bulk-url-status-checker').call(run_input={
        'urls': ['https://example.com/', 'https://httpstat.us/404'],
        'followRedirects': True,
    })

    # Fetch one dataset item per checked URL.
    items = client.dataset(run['defaultDatasetId']).list_items().items
    print(items)
API usage with cURL
    curl -X POST 'https://api.apify.com/v2/acts/automation-lab~bulk-url-status-checker/runs?token=YOUR_APIFY_TOKEN' \
      -H 'Content-Type: application/json' \
      -d '{"urls":["https://example.com/","https://httpstat.us/404"],"followRedirects":true}'
MCP integration
Use Apify MCP to run this actor from Claude Desktop, Claude Code, or other MCP clients.
MCP URL:
https://mcp.apify.com/?tools=automation-lab/bulk-url-status-checker
Add it in Claude Code:
    claude mcp add apify-bulk-url-status-checker 'https://mcp.apify.com/?tools=automation-lab/bulk-url-status-checker'
Claude Desktop JSON configuration:
    {
      "mcpServers": {
        "apify-bulk-url-status-checker": {
          "url": "https://mcp.apify.com/?tools=automation-lab/bulk-url-status-checker"
        }
      }
    }
Example prompts:
- "Check these 50 URLs and summarize the broken links."
- "Run a sitemap status audit and group results by status code."
- "Find redirects in this migration URL list and export final URLs."
Data quality notes
HTTP status checks depend on target server behavior.
Some servers treat HEAD and GET differently.
Some servers block datacenter traffic, unknown user agents, or high concurrency.
For those cases, use GET mode, lower concurrency, and a custom user agent that identifies your crawler policy.
The actor reports the observed result rather than attempting to bypass access controls.
Troubleshooting
Why do I see 403 or 429?
The target server is refusing or rate limiting requests.
Lower concurrency, use a custom user agent, or check whether your organization allows automated checks against that domain.
Why is contentLength null?
Many servers use chunked transfer encoding or omit Content-Length.
The actor reports null when the header is missing.
Why is canonicalUrl null?
The page may not be HTML, canonical extraction may be disabled, or the page may not contain a canonical tag.
Legality and ethical use
Only check URLs you are allowed to audit.
Respect robots policies, rate limits, and site terms.
Do not use the actor to overload third-party servers.
The actor is intended for diagnostics, QA, SEO operations, and link-health monitoring.
Related scrapers and tools
- https://apify.com/automation-lab/http-status-checker
- https://apify.com/automation-lab/website-contact-finder
- https://apify.com/automation-lab/domain-to-linkedin-url-resolver
Use the simpler HTTP Status Checker for small one-off status checks.
Use Bulk URL Status Checker when you need sitemap/list ingestion, canonical hints, robots meta, and richer redirect/broken-link audit fields.
FAQ
Can I check thousands of URLs?
Yes. Increase maxUrls and choose a concurrency that is safe for the target domains.
Does it use a browser?
No. It is an HTTP-only actor for status and header checks.
Does it scrape page content?
No. It only fetches enough page HTML to extract canonical and robots meta when that option is enabled.
Can I schedule it?
Yes. Use Apify schedules to run it daily, weekly, or after deployments.
Can I export to CSV?
Yes. Apify datasets can be exported as JSON, CSV, Excel, XML, RSS, or HTML.
Changelog
- 0.1.0: Initial build with URL list, text, sitemap, and hosted list ingestion; status checks; redirect chain output; canonical and robots meta extraction.