🔗 Sitemap & Broken Link Checker — Find Dead Links & 404s
Pricing
from $5.00 / 1,000 results
Crawl any website or its sitemap and check every link for broken 404s, redirects, and errors — with the source page for each. Perfect for SEO audits and site QA.
Developer
Is Koren
Maintained by Community
Find broken links, dead pages, and 404 errors on any website, fast. This sitemap & broken link checker discovers your site's URLs from its sitemap.xml, sitemap index files, and robots.txt, and/or by crawling internal links, then checks the HTTP status of every link. You get a clean report of broken links (4xx/5xx), redirects (3xx), and OK pages, plus the exact page each link was found on, so SEO and QA teams can fix dead links before users (and search engines) find them.
✨ What this actor does
- 🗺️ Sitemap discovery: reads `/sitemap.xml`, follows sitemap index files, and parses `Sitemap:` lines in `robots.txt`.
- 🕷️ Internal crawl: BFS-crawls internal pages from your start URLs and collects every `<a href>` link, remembering where it was found.
- ✅ Status checking: verifies each link with a HEAD→GET strategy (`allow_redirects=True`), recording the final HTTP status.
- 🌐 Internal & external: optionally checks off-domain (external) links too.
- 🧯 Crash-proof: a single bad URL never stops the run; failures are recorded as `error`.
Perfect for SEO audits, website QA, content migrations, and routine broken link monitoring.
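To make the sitemap discovery step concrete, here is a minimal sketch of how one sitemap document could be split into page URLs and child sitemaps. It is an illustration under assumptions, not the actor's actual code: the function name `parse_sitemap` is hypothetical, and only the standard sitemap layout (`<urlset>` or `<sitemapindex>` with `<loc>` entries) is handled.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace, e.g. '{http://...}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text):
    """Return (page_urls, child_sitemap_urls) for one sitemap document.

    Hypothetical helper: a sitemap index yields child sitemaps to fetch next,
    a plain urlset yields page URLs.
    """
    root = ET.fromstring(xml_text)
    locs = []
    for entry in root:              # <url> or <sitemap> elements
        for child in entry:         # direct children only, so nested image/video <loc> tags are skipped
            if _local(child.tag) == "loc" and child.text:
                locs.append(child.text.strip())
    if _local(root.tag) == "sitemapindex":
        return [], locs
    return locs, []
```

A caller would keep fetching the returned child sitemaps until only page URLs remain.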
🚀 Quick start
Paste this input and run:
{"startUrls": ["https://apify.com"],"useSitemap": true,"crawlLinks": true,"checkExternalLinks": true,"maxUrls": 100,"proxyConfiguration": { "useApifyProxy": true }}
The actor discovers links from apify.com, checks each one, and pushes a row per link to the dataset.
⚙️ Input
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrls` | array | `["https://apify.com"]` | Website(s) to check. URLs are discovered from each site's sitemap and/or by crawling. |
| `useSitemap` | boolean | `true` | Discover URLs from `/sitemap.xml`, sitemap index files, and `robots.txt`. |
| `crawlLinks` | boolean | `true` | Crawl internal pages (same domain, BFS) to discover links and record their source page. |
| `checkExternalLinks` | boolean | `true` | Also check links pointing to other domains. When off, only same-domain links are checked. |
| `maxUrls` | integer | `100` | Maximum total number of links to check across all start URLs (hard cap, allowed range 1–2000). Keeps runs cheap and finite. |
| `proxyConfiguration` | object | `{ "useApifyProxy": true }` | Proxy settings. Apify Proxy is used by default to avoid IP blocks. |
📤 Output
One record per checked link:
{"url": "https://apify.com/store","statusCode": 200,"ok": true,"category": "ok","redirectedTo": null,"linkType": "internal","foundOn": "https://apify.com","error": null}
A broken link looks like this:
{"url": "https://apify.com/this-page-does-not-exist","statusCode": 404,"ok": false,"category": "broken","redirectedTo": null,"linkType": "internal","foundOn": "https://apify.com/blog","error": null}
Output fields
| Field | Description |
|---|---|
| `url` | The link that was checked. |
| `statusCode` | Final HTTP status code (after redirects), or `null` if the request failed. |
| `ok` | `true` when the status is below 400. |
| `category` | One of `ok`, `broken` (4xx/5xx), `redirect` (a 3xx hop occurred), or `error` (request failed). |
| `redirectedTo` | Final URL if the link redirected, otherwise `null`. |
| `linkType` | `internal` (same domain) or `external` (off-domain). |
| `foundOn` | The page where the link was discovered. |
| `error` | Error message if the request failed, otherwise `null`. |
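The relationship between `statusCode`, `ok`, and `category` can be sketched as a small function. This is an illustration only: the function name `classify` is hypothetical, and the precedence used here (error first, then broken, then redirect) is an assumption, since the field descriptions do not say which category wins when a redirect ends on a 4xx/5xx page.

```python
def classify(status_code, redirected=False, error=None):
    """Derive the ok/category pair for one checked link (hypothetical helper)."""
    if error is not None or status_code is None:
        # The request itself failed, so there is no status to judge.
        return {"ok": False, "category": "error"}
    if status_code >= 400:
        # Assumption: a broken final status outranks any redirect hop.
        return {"ok": False, "category": "broken"}
    if redirected:
        return {"ok": True, "category": "redirect"}
    return {"ok": True, "category": "ok"}
```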
💡 Tip: Filter the dataset by category = "broken" to get just the dead links, or by ok = false to also include failed requests (category = "error"). Group by foundOn to see which pages need fixing.
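As a rough sketch of that filter-and-group step on exported dataset items, the snippet below collects broken URLs per source page. The helper name `broken_by_page` and the sample rows are illustrative, not real run output; only the field names come from the output schema.

```python
from collections import defaultdict


def broken_by_page(records):
    """Map each source page (foundOn) to the broken URLs discovered on it."""
    pages = defaultdict(list)
    for rec in records:
        if rec["category"] == "broken":
            pages[rec["foundOn"]].append(rec["url"])
    return dict(pages)


# Illustrative rows shaped like the actor's output records.
records = [
    {"url": "https://apify.com/store", "category": "ok", "foundOn": "https://apify.com"},
    {"url": "https://apify.com/this-page-does-not-exist", "category": "broken",
     "foundOn": "https://apify.com/blog"},
]
report = broken_by_page(records)
```

Here `report` has one key, `https://apify.com/blog`, listing the single broken URL found on that page.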
❓ FAQ
How do I find only broken links?
Filter the output where category is broken or error, or where ok is false. The foundOn field tells you which page to fix.
Does it follow redirects?
Yes. Links are checked with allow_redirects=True. If any redirect hop occurred, category is redirect and redirectedTo holds the final URL.
What if the site has no sitemap?
Leave crawlLinks enabled. The actor crawls internal pages from your start URLs to discover links even when no sitemap exists. You can also rely on the Sitemap: line in robots.txt, which is checked automatically.
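For reference, pulling `Sitemap:` lines out of a robots.txt body takes only a few lines. This is a minimal sketch with a hypothetical function name, not the actor's implementation; it assumes `#` starts a comment and that the directive name is case-insensitive.

```python
def sitemaps_from_robots(robots_txt):
    """Return the URLs declared on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()      # drop trailing comments
        key, sep, value = line.partition(":")     # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

Python's standard library offers a comparable lookup through `urllib.robotparser.RobotFileParser.site_maps()` (Python 3.8+).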
Will it check external (off-site) links?
Yes, when checkExternalLinks is true (the default). Set it to false to limit checks to your own domain.
How do I keep runs cheap?
Use maxUrls to cap the total number of links checked. The crawl and sitemap discovery both respect this limit.
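One way a breadth-first crawl can honor such a cap is sketched below. Everything here is an assumption for illustration: the function `discover_links` and the `get_links` callback (which would fetch a page and return its `href` values) are hypothetical, and "same domain" is approximated by comparing host names.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def discover_links(start_url, get_links, max_urls):
    """BFS from start_url; returns up to max_urls (url, link_type, found_on) tuples."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    found = []
    while queue and len(found) < max_urls:
        page = queue.popleft()
        for href in get_links(page):
            url = urljoin(page, href)             # resolve relative links
            if url in seen:
                continue
            seen.add(url)
            internal = urlparse(url).netloc == site
            found.append((url, "internal" if internal else "external", page))
            if len(found) >= max_urls:            # hard cap reached, stop early
                break
            if internal:
                queue.append(url)                 # only same-domain pages are crawled further
    return found
```

Each tuple keeps the page the link was first seen on, which is what makes a `foundOn`-style report possible.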
Why are some links reported as error?
Some servers reject automated requests, time out, or have DNS/TLS issues. These are recorded as error with a message in the error field instead of crashing the run.
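The idea of recording a failure instead of raising can be sketched as follows. This is not the actor's code: `check_link` is a hypothetical name, `fetch(method, url)` stands in for whatever HTTP client is used, and falling back to GET whenever HEAD returns 400 or above is an assumption about how a HEAD→GET strategy might work.

```python
def check_link(url, fetch):
    """Check one URL without ever raising.

    `fetch(method, url)` is a caller-supplied function returning an HTTP status code.
    """
    try:
        status = fetch("HEAD", url)
        if status >= 400:
            # Assumption: some servers mishandle HEAD, so confirm with a full GET.
            status = fetch("GET", url)
        return {"url": url, "statusCode": status, "error": None}
    except Exception as exc:  # timeouts, DNS/TLS failures, refused connections
        return {"url": url, "statusCode": None, "error": f"{type(exc).__name__}: {exc}"}
```

Because every exception is converted into a record, one unreachable host cannot abort the loop that processes the remaining links.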
🧪 Tips for best results
- 🎯 Start with a small `maxUrls` (e.g. 30–50) to preview results, then raise it for a full audit.
- 🔁 Schedule the actor to run weekly and monitor the `broken` count over time.
- 🧹 Disable `checkExternalLinks` for faster internal-only QA runs.
- 🌍 Keep Apify Proxy on to reduce the chance of rate limiting or IP blocks.
- 📊 Export to CSV/Excel/JSON from the dataset for sharing with your SEO or content team.