🔗 Sitemap & Broken Link Checker — Find Dead Links & 404s

Pricing: from $5.00 / 1,000 results

Crawl any website or its sitemap and check every link for 404s, redirects, and errors, with the source page reported for each. Perfect for SEO audits and site QA.

Developer: Is Koren (maintained by Community)

Find broken links, dead pages, and 404 errors on any website. This sitemap and broken link checker discovers your site's URLs from its sitemap.xml, sitemap index files, and robots.txt, by crawling internal links, or both, then checks the HTTP status of every link. You get a report of broken links (4xx/5xx), redirects (3xx), and OK pages, plus the exact page each link was found on, so SEO and QA teams can fix dead links before users (and search engines) find them.

✨ What this actor does

  • 🗺️ Sitemap discovery: reads /sitemap.xml, follows sitemap index files, and parses Sitemap: lines in robots.txt.
  • 🕷️ Internal crawl: crawls internal pages breadth-first (BFS) from your start URLs and collects every <a href> link, remembering the page it was found on.
  • ✅ Status checking: checks each link with a HEAD request, with GET as the fallback (HEAD→GET, allow_redirects=True), and records the final HTTP status.
  • 🌐 Internal & external: optionally checks off-domain (external) links too.
  • 🧯 Crash-proof: a single bad URL never stops the run; failures are recorded with category error.

Perfect for SEO audits, website QA, content migrations, and routine broken link monitoring.
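
The internal crawl described above can be pictured with a small breadth-first sketch. This is not the actor's source code, only an illustration under stated assumptions: `fetch_html` is a hypothetical callable you supply (URL in, HTML string or None out), and "internal" is taken to mean "same host as the start URL".

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def crawl(start_url, fetch_html, max_urls=100):
    """BFS over same-host pages; returns {link: page it was first found on}.

    fetch_html is a hypothetical callable: url -> HTML string, or None on failure.
    """
    host = urlparse(start_url).netloc
    found_on = {}
    visited = {start_url}
    queue = deque([start_url])
    while queue and len(found_on) < max_urls:
        page = queue.popleft()
        html = fetch_html(page)
        if html is None:
            continue  # a page that cannot be fetched must not stop the crawl
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.hrefs:
            if len(found_on) >= max_urls:
                break  # hard cap on the number of collected links
            link, _fragment = urldefrag(urljoin(page, href))
            if urlparse(link).scheme not in ("http", "https"):
                continue  # skip mailto:, tel:, javascript: and similar
            found_on.setdefault(link, page)  # keep the first source page
            if urlparse(link).netloc == host and link not in visited:
                visited.add(link)
                queue.append(link)  # only same-host pages are crawled further
    return found_on
```

External links are collected (so they can be checked) but never crawled, which keeps the traversal on the site being audited.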

🚀 Quick start

Paste this input and run:

{
  "startUrls": ["https://apify.com"],
  "useSitemap": true,
  "crawlLinks": true,
  "checkExternalLinks": true,
  "maxUrls": 100,
  "proxyConfiguration": { "useApifyProxy": true }
}

The actor discovers links from apify.com, checks each one, and pushes a row per link to the dataset.
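
To start the same run from code rather than the Apify Console, something like the following may work. It is a sketch that assumes the official `apify-client` Python package is installed; the helper names `build_run_input` and `run_checker` are made up for this example, and the actor ID must be copied from the Store page because it is not stated here.

```python
def build_run_input(start_url, max_urls=100, check_external=True):
    """Builds the input shown above, clamping maxUrls to the documented 1-2000 range."""
    return {
        "startUrls": [start_url],
        "useSitemap": True,
        "crawlLinks": True,
        "checkExternalLinks": check_external,
        "maxUrls": max(1, min(2000, max_urls)),
        "proxyConfiguration": {"useApifyProxy": True},
    }


def run_checker(token, actor_id, run_input):
    """Runs the actor and returns its dataset rows as a list of dicts."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call would then look like `run_checker("<APIFY_TOKEN>", "<username>/<actor-name>", build_run_input("https://apify.com"))`, with both placeholders replaced by your own values.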

⚙️ Input

| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | array | ["https://apify.com"] | Website(s) to check. URLs are discovered from each site's sitemap and/or by crawling. |
| useSitemap | boolean | true | Discover URLs from /sitemap.xml, sitemap index files, and robots.txt. |
| crawlLinks | boolean | true | Crawl internal pages (same domain, BFS) to discover links and record their source page. |
| checkExternalLinks | boolean | true | Also check links pointing to other domains. When off, only same-domain links are checked. |
| maxUrls | integer | 100 | Total links to check across all start URLs (hard cap, 1–2000). Keeps runs cheap and finite. |
| proxyConfiguration | object | { "useApifyProxy": true } | Proxy settings. Apify Proxy is used by default to avoid IP blocks. |

📤 Output

One record per checked link:

{
  "url": "https://apify.com/store",
  "statusCode": 200,
  "ok": true,
  "category": "ok",
  "redirectedTo": null,
  "linkType": "internal",
  "foundOn": "https://apify.com",
  "error": null
}

A broken link looks like this:

{
  "url": "https://apify.com/this-page-does-not-exist",
  "statusCode": 404,
  "ok": false,
  "category": "broken",
  "redirectedTo": null,
  "linkType": "internal",
  "foundOn": "https://apify.com/blog",
  "error": null
}

Output fields

| Field | Description |
|---|---|
| url | The link that was checked. |
| statusCode | Final HTTP status code (after redirects), or null if the request failed. |
| ok | true when the status is below 400. |
| category | One of ok, broken (4xx/5xx), redirect (3xx hop occurred), or error (request failed). |
| redirectedTo | Final URL if the link redirected, otherwise null. |
| linkType | internal (same domain) or external (off-domain). |
| foundOn | The page where the link was discovered. |
| error | Error message if the request failed, otherwise null. |
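
One way to read the `ok` and `category` rules is as a small decision function. The sketch below is an interpretation of the field descriptions, not the actor's code, and it makes two assumptions the page does not confirm: a redirect that ends in a 4xx/5xx is reported as broken rather than redirect, and failed requests get `ok` = false.

```python
def categorize(status_code, redirected=False, error=None):
    """Maps a check result to the ok/category pair described in the table.

    Assumptions (not confirmed by the actor's documentation):
    - a failed request has ok=False;
    - "broken" takes precedence over "redirect" when both apply.
    """
    if error is not None or status_code is None:
        return {"ok": False, "category": "error"}
    if status_code >= 400:
        return {"ok": False, "category": "broken"}
    if redirected:
        return {"ok": True, "category": "redirect"}
    return {"ok": True, "category": "ok"}
```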

💡 Tip: Filter the dataset by category = "broken" (or ok = false) to get just the dead links, and group by foundOn to see which pages need fixing.
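
The filter-and-group step from the tip can be done in a few lines once the dataset is exported as JSON. This is a minimal sketch; `broken_by_page` is a name chosen for this example, and `rows` is assumed to be a list of records shaped like the output above.

```python
from collections import defaultdict


def broken_by_page(rows):
    """Groups broken link URLs by the page they were found on."""
    pages = defaultdict(list)
    for row in rows:
        if row.get("category") == "broken":
            pages[row.get("foundOn")].append(row["url"])
    return dict(pages)
```

Each key of the result is a page that needs editing, and each value lists the dead links on it.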

❓ FAQ

How do I find only broken links? Filter the output where category is broken or error, or where ok is false. The foundOn field tells you which page to fix.

Does it follow redirects? Yes. Links are checked with allow_redirects=True. If any redirect hop occurred, category is redirect and redirectedTo holds the final URL.
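
For readers curious what a HEAD→GET check with redirect following can look like, here is a hedged sketch. It assumes `session` behaves like a `requests.Session` (responses expose `status_code`, `url`, and `history`), and it assumes the GET fallback is triggered by a 4xx/5xx answer to HEAD; the actor's exact fallback rule is not documented on this page.

```python
def check_link(url, session, timeout=10):
    """Checks one link with HEAD, retrying with GET if HEAD reports an error status.

    session is assumed to behave like requests.Session.
    """
    try:
        resp = session.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code >= 400:
            # Some servers reject or mishandle HEAD, so confirm with GET.
            resp = session.get(url, allow_redirects=True, timeout=timeout)
    except Exception as exc:  # with requests, requests.RequestException is narrower
        return {"url": url, "statusCode": None, "redirectedTo": None, "error": str(exc)}
    redirected = bool(resp.history)  # history holds one response per redirect hop
    return {
        "url": url,
        "statusCode": resp.status_code,
        "redirectedTo": resp.url if redirected else None,
        "error": None,
    }
```

Catching the exception and returning a record instead of raising mirrors the "a single bad URL never stops the run" behavior described earlier.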

What if the site has no sitemap? Leave crawlLinks enabled. The actor crawls internal pages from your start URLs to discover links even when no sitemap exists. You can also rely on the Sitemap: line in robots.txt, which is checked automatically.
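
As an illustration of the discovery step, the sketch below pulls `Sitemap:` lines out of a robots.txt body and separates page URLs from child sitemaps in a sitemap file. It uses only the Python standard library and is an assumption about how such parsing can be done, not a description of the actor's internals; the function names are invented for this example.

```python
import xml.etree.ElementTree as ET


def sitemaps_from_robots(robots_txt):
    """Returns the URLs listed on Sitemap: lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def _local(tag):
    """Strips the XML namespace from a tag name, e.g. '{ns}loc' -> 'loc'."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(xml_text):
    """Returns (page_urls, child_sitemap_urls) for a sitemap or sitemap index."""
    root = ET.fromstring(xml_text)
    # Only <loc> elements directly under each <url> or <sitemap> entry are read,
    # so nested extension tags such as image locations are ignored.
    locs = [
        el.text.strip()
        for entry in root
        for el in entry
        if _local(el.tag) == "loc" and el.text
    ]
    if _local(root.tag) == "sitemapindex":
        return [], locs
    return locs, []
```

A sitemap index yields child sitemap URLs to fetch next, while a regular sitemap yields the page URLs themselves.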

Will it check external (off-site) links? Yes, when checkExternalLinks is true (the default). Set it to false to limit checks to your own domain.
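
The internal/external split can be approximated by comparing hostnames. The page does not say exactly how the actor defines "same domain", so the sketch below makes its own assumption: hosts are compared case-insensitively with a leading `www.` ignored, and any other subdomain counts as external.

```python
from urllib.parse import urlparse


def link_type(link, start_url):
    """Returns 'internal' or 'external' by comparing hostnames.

    Assumption: a leading 'www.' is ignored; other subdomains count as external.
    """

    def host(url):
        name = (urlparse(url).hostname or "").lower()
        return name[4:] if name.startswith("www.") else name

    return "internal" if host(link) == host(start_url) else "external"
```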

How do I keep runs cheap? Use maxUrls to cap the total number of links checked. The crawl and sitemap discovery both respect this limit.

Why are some links reported as error? Some servers reject automated requests, time out, or have DNS/TLS issues. These are recorded as error with a message in the error field instead of crashing the run.

🧪 Tips for best results

  • 🎯 Start with a small maxUrls (e.g. 30–50) to preview results, then raise it for a full audit.
  • 🔁 Schedule the actor to run weekly and monitor the broken count over time.
  • 🧹 Disable checkExternalLinks for faster internal-only QA runs.
  • 🌍 Keep Apify Proxy on to reduce the chance of rate limiting or IP blocks.
  • 📊 Export to CSV/Excel/JSON from the dataset for sharing with your SEO or content team.
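
For the weekly monitoring tip, comparing two exported runs shows which links broke since the last check. A minimal sketch, assuming both arguments are lists of records shaped like the output above; `newly_broken` is a name chosen for this example.

```python
def newly_broken(previous_rows, current_rows):
    """Returns the sorted URLs that are broken now but were not broken before."""
    before = {r["url"] for r in previous_rows if r.get("category") == "broken"}
    now = {r["url"] for r in current_rows if r.get("category") == "broken"}
    return sorted(now - before)
```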