Pricing

from $5.00 / 1,000 results

🔗 Sitemap & Broken Link Checker — Find Dead Links & 404s

Crawl any website or its sitemap and check every link for 404s and other broken responses, redirects, and request errors, reporting the source page where each link was found. Perfect for SEO audits and site QA.

Developer

Is Koren

Last modified: a month ago

✨ What this actor does

🗺️ Sitemap discovery: reads /sitemap.xml, follows sitemap index files, and parses Sitemap: lines in robots.txt.
🕷️ Internal crawl: crawls internal pages breadth-first (BFS) from your start URLs and collects every <a href> link, recording the page where it was found.
✅ Status checking: verifies each link with a HEAD request, falling back to GET, and follows redirects (allow_redirects=True) to record the final HTTP status.
🌐 Internal & external: optionally checks off-domain (external) links too.
🧯 Crash-proof: a single bad URL never stops the run; failures are recorded with category error.
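The sitemap discovery step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the actor's source: `discover`, `parse_sitemap`, and `sitemaps_from_robots` are hypothetical names, and `fetch` is an assumed callable that returns a document's text, or `None` when it is missing.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional
from urllib.parse import urljoin

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # standard sitemap namespace


def sitemaps_from_robots(robots_txt: str, base_url: str) -> list[str]:
    """Collect URLs from 'Sitemap:' lines (field name matched case-insensitively)."""
    urls = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(urljoin(base_url, value.strip()))
    return urls


def parse_sitemap(xml_text: str) -> tuple[list[str], list[str]]:
    """Return (page URLs, child sitemap URLs) from a <urlset> or <sitemapindex>."""
    root = ET.fromstring(xml_text)
    locs = [el.text.strip() for el in root.iter(NS + "loc") if el.text]
    if root.tag == NS + "sitemapindex":
        return [], locs
    return locs, []


def discover(start_url: str, fetch: Callable[[str], Optional[str]], limit: int) -> list[str]:
    """Hypothetical helper: gather up to `limit` page URLs from a site's sitemaps."""
    base = urljoin(start_url, "/")
    queue = [urljoin(base, "sitemap.xml")]
    robots = fetch(urljoin(base, "robots.txt"))
    if robots:
        queue += sitemaps_from_robots(robots, base)
    seen_sitemaps: set[str] = set()
    pages: list[str] = []
    while queue and len(pages) < limit:
        sitemap_url = queue.pop(0)
        if sitemap_url in seen_sitemaps:
            continue
        seen_sitemaps.add(sitemap_url)
        text = fetch(sitemap_url)
        if not text:
            continue  # missing sitemap: nothing to read here
        try:
            urls, children = parse_sitemap(text)
        except ET.ParseError:
            continue  # malformed XML is skipped instead of crashing the run
        queue.extend(children)  # follow sitemap index files
        for url in urls:
            if len(pages) >= limit:
                break
            if url not in pages:
                pages.append(url)
    return pages
```

Sitemap index entries go back onto the queue, so nested indexes are followed; the `limit` check plays the role of `maxUrls`.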

Perfect for SEO audits, website QA, content migrations, and routine broken link monitoring.

🚀 Quick start

Paste this input and run:

{
  "startUrls": ["https://apify.com"],
  "useSitemap": true,
  "crawlLinks": true,
  "checkExternalLinks": true,
  "maxUrls": 100,
  "proxyConfiguration": { "useApifyProxy": true }
}

The actor discovers links on apify.com through its sitemap and an internal crawl, checks up to 100 of them (maxUrls), and pushes one row per checked link to the dataset.

⚙️ Input

Field	Type	Default	Description
`startUrls`	array	`["https://apify.com"]`	Website(s) to check. URLs are discovered from each site's sitemap and/or by crawling.
`useSitemap`	boolean	`true`	Discover URLs from `/sitemap.xml`, sitemap index files, and `robots.txt`.
`crawlLinks`	boolean	`true`	Crawl internal pages (same domain, BFS) to discover links and record their source page.
`checkExternalLinks`	boolean	`true`	Also check links pointing to other domains. When off, only same-domain links are checked.
`maxUrls`	integer	`100`	Total links to check across all start URLs (hard cap, 1–2000). Keeps runs cheap and finite.
`proxyConfiguration`	object	`{ "useApifyProxy": true }`	Proxy settings. Apify Proxy is used by default to avoid IP blocks.
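To make the defaults and the `maxUrls` bounds concrete, here is a hypothetical Python helper that fills in omitted fields from the input table above and rejects out-of-range values. How the actor itself reports invalid input isn't documented here, so raising `ValueError` is an assumption.

```python
import copy

# Defaults copied from the input table
DEFAULTS = {
    "startUrls": ["https://apify.com"],
    "useSitemap": True,
    "crawlLinks": True,
    "checkExternalLinks": True,
    "maxUrls": 100,
    "proxyConfiguration": {"useApifyProxy": True},
}


def normalize_input(raw: dict) -> dict:
    """Hypothetical: merge user input over the defaults and validate it."""
    cfg = copy.deepcopy(DEFAULTS)  # deep copy so callers can't mutate DEFAULTS
    cfg.update(raw)
    max_urls = cfg["maxUrls"]
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(max_urls, bool) or not isinstance(max_urls, int) or not 1 <= max_urls <= 2000:
        raise ValueError("maxUrls must be an integer from 1 to 2000")
    if not cfg["startUrls"]:
        raise ValueError("startUrls needs at least one URL")
    return cfg
```

For example, `normalize_input({"checkExternalLinks": False})` yields an internal-only configuration with every other field at its default.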

📤 Output

One record per checked link:

{
  "url": "https://apify.com/store",
  "statusCode": 200,
  "ok": true,
  "category": "ok",
  "redirectedTo": null,
  "linkType": "internal",
  "foundOn": "https://apify.com",
  "error": null
}

A broken link looks like this:

{
  "url": "https://apify.com/this-page-does-not-exist",
  "statusCode": 404,
  "ok": false,
  "category": "broken",
  "redirectedTo": null,
  "linkType": "internal",
  "foundOn": "https://apify.com/blog",
  "error": null
}

Output fields

Field	Description
`url`	The link that was checked.
`statusCode`	Final HTTP status code (after redirects), or `null` if the request failed.
`ok`	`true` when the final status code is below 400, otherwise `false` (including failed requests).
`category`	One of `ok`, `broken` (4xx/5xx), `redirect` (3xx hop occurred), or `error` (request failed).
`redirectedTo`	Final URL if the link redirected, otherwise `null`.
`linkType`	`internal` (same domain) or `external` (off-domain).
`foundOn`	The page where the link was discovered.
`error`	Error message if the request failed, otherwise `null`.
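The `ok` and `category` fields follow from the other fields in the table above. Here is a small Python sketch of that mapping, under one assumption the table leaves open: a redirect that ends on a 4xx/5xx page is reported as `broken` rather than `redirect`. `classify` is a hypothetical name.

```python
from typing import Optional


def classify(status_code: Optional[int], redirected_to: Optional[str],
             error: Optional[str]) -> tuple[bool, str]:
    """Hypothetical: derive (ok, category) from statusCode, redirectedTo and error."""
    if error is not None or status_code is None:
        return False, "error"      # request failed, so there is no status to judge
    if status_code >= 400:
        return False, "broken"     # 4xx/5xx, even after a redirect (assumption)
    if redirected_to is not None:
        return True, "redirect"    # at least one redirect hop was followed
    return True, "ok"
```

Applied to the broken example above, `classify(404, None, None)` gives `(False, "broken")`.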

💡 Tip: Filter the dataset by category = "broken" to get just the dead links (or by ok = false to include failed requests too), and group by foundOn to see which pages need fixing.
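If you export the dataset as JSON, the same filtering and grouping takes a few lines of Python. This sketch assumes the export is a JSON array of records with the output fields above; `broken_by_page` and `load_dataset` are hypothetical helpers, and the filter keys on `ok` being `false` so failed requests appear alongside `broken` links.

```python
import json
from collections import defaultdict


def load_dataset(path: str) -> list[dict]:
    """Read a dataset exported as a JSON array (the path is up to you)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def broken_by_page(records: list[dict]) -> dict[str, list[str]]:
    """Group dead links by the page they were found on, worst pages first."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for rec in records:
        if rec.get("ok") is False:  # covers both "broken" and "error" rows
            grouped[rec.get("foundOn")].append(rec["url"])
    return dict(sorted(grouped.items(), key=lambda kv: len(kv[1]), reverse=True))
```

Typical use: `broken_by_page(load_dataset("dataset.json"))`, where the file name is whatever you saved the export as.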

❓ FAQ

How do I find only broken links? Filter the output where category is broken or error, or where ok is false. The foundOn field tells you which page to fix.

Does it follow redirects? Yes. Links are checked with allow_redirects=True. If any redirect hop occurred, category is redirect and redirectedTo holds the final URL.
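The HEAD→GET check can be sketched in Python with the transport abstracted away, so the logic is visible without network access. Assumptions: `fetch(method, url)` is a hypothetical callable that follows redirects (the way `allow_redirects=True` does in requests) and raises on network failure, and GET is retried when HEAD fails outright or returns 400 or above. The actor's exact fallback rule isn't documented here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    status_code: int
    url: str        # final URL after redirects were followed
    redirects: int  # number of redirect hops


# Hypothetical transport: follows redirects and raises on network failure
Fetch = Callable[[str, str], Response]


def probe(url: str, fetch: Fetch) -> dict:
    """HEAD first, GET as a fallback; any failure becomes an error row."""
    try:
        try:
            resp: Optional[Response] = fetch("HEAD", url)
        except Exception:
            resp = None  # some servers drop or mishandle HEAD requests
        # Assumed fallback rule: retry with GET if HEAD failed or returned >= 400
        if resp is None or resp.status_code >= 400:
            resp = fetch("GET", url)
    except Exception as exc:  # timeouts, DNS/TLS errors, etc. never stop the run
        return {"url": url, "statusCode": None, "redirectedTo": None,
                "error": f"{type(exc).__name__}: {exc}"}
    return {
        "url": url,
        "statusCode": resp.status_code,  # final status, after redirects
        "redirectedTo": resp.url if resp.redirects else None,
        "error": None,
    }
```

With the requests library, a real `fetch` could wrap `requests.request(method, url, allow_redirects=True, timeout=10)` and pass `len(response.history)` as the hop count.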

What if the site has no sitemap? Leave crawlLinks enabled: the actor crawls internal pages from your start URLs to discover links even when no sitemap exists. A sitemap at a non-standard path is still found if robots.txt lists it on a Sitemap: line, which the actor checks automatically.
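A breadth-first crawl that records where each link was found could look like this Python sketch. `get_html` is a hypothetical callable that returns a page's HTML or `None`; only same-host pages are expanded, fragments are stripped, and non-HTTP links such as `mailto:` are skipped. Those filtering rules are assumptions, not documented behavior.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url: str, get_html: Callable[[str], Optional[str]],
          max_urls: int) -> dict[str, str]:
    """Hypothetical BFS crawl returning {link: first page it was found on}."""
    host = urlparse(start_url).netloc
    found_on: dict[str, str] = {}
    visited = {start_url}
    queue = deque([start_url])
    while queue and len(found_on) < max_urls:
        page = queue.popleft()
        html = get_html(page)
        if html is None:
            continue  # unreachable page: its links can't be collected
        parser = LinkParser()
        parser.feed(html)
        parser.close()
        for href in parser.hrefs:
            link = urldefrag(urljoin(page, href)).url  # absolute, without #fragment
            if urlparse(link).scheme not in ("http", "https"):
                continue  # skip mailto:, tel:, javascript: and similar
            if link not in found_on:
                if len(found_on) >= max_urls:
                    break  # maxUrls reached
                found_on[link] = page
            # Only same-host pages are crawled further (breadth-first)
            if urlparse(link).netloc == host and link not in visited:
                visited.add(link)
                queue.append(link)
    return found_on
```

Keeping only the first page a link appears on keeps the sketch small; a fuller version might collect every source page per link.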

Will it check external (off-site) links? Yes, when checkExternalLinks is true (the default). Set it to false to limit checks to your own domain.
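Deciding whether a link is internal comes down to comparing hosts. In this hypothetical Python sketch, a leading `www.` is ignored and every other subdomain (for example docs.apify.com when starting from apify.com) counts as external; the actor's own definition of "same domain" may differ.

```python
from urllib.parse import urlparse


def _host(url: str) -> str:
    host = urlparse(url).hostname or ""  # hostname is already lower-cased
    return host[4:] if host.startswith("www.") else host


def link_type(link: str, start_url: str) -> str:
    """Hypothetical: 'internal' when hosts match (ignoring www.), else 'external'."""
    return "internal" if _host(link) == _host(start_url) else "external"


def should_check(link: str, start_url: str, check_external_links: bool) -> bool:
    """Mirror of checkExternalLinks: external links are skipped when it is False."""
    return check_external_links or link_type(link, start_url) == "internal"
```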

How do I keep runs cheap? Use maxUrls to cap the total number of links checked. The crawl and sitemap discovery both respect this limit.

Why are some links reported as error? Some servers reject automated requests, time out, or have DNS/TLS issues. These are recorded as error with a message in the error field instead of crashing the run.

🧪 Tips for best results

🎯 Start with a small maxUrls (e.g. 30–50) to preview results, then raise it for a full audit.
🔁 Schedule the actor to run weekly and monitor the broken count over time.
🧹 Disable checkExternalLinks for faster internal-only QA runs.
🌍 Keep Apify Proxy on to reduce the chance of rate limiting or IP blocks.
📊 Export to CSV/Excel/JSON from the dataset for sharing with your SEO or content team.
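For scheduled runs, comparing two exports shows which broken links are new and which were fixed. A minimal Python sketch, assuming each run's dataset is loaded as a list of records; `diff_broken` is a hypothetical name, and a link is identified by its `url` plus its `foundOn` page.

```python
def diff_broken(previous: list[dict], current: list[dict]) -> dict:
    """Compare two runs: count, new and fixed broken links (ok == false)."""
    def broken(records: list[dict]) -> set[tuple[str, str]]:
        return {(r["url"], r["foundOn"]) for r in records if r.get("ok") is False}

    before, after = broken(previous), broken(current)
    return {
        "count": len(after),              # the number to chart over time
        "new": sorted(after - before),    # appeared since the last run
        "fixed": sorted(before - after),  # no longer reported
    }
```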

Broken Link Checker - Find Dead 404 Links

logiover/broken-link-checker

Site-wide broken link checker: crawl any website, find 404 and dead links, export the link audit to CSV or JSON with source page and status code.

Logiover

Broken Link Checker — Find All 404s & Dead Links on Any Website

q_services/broken-link-checker

Crawl any website and find every broken link (404, 500, dead domains) with status code, source page and anchor text. SEO audit ready.

Q Services

Broken Link Checker - Find 404s and Dead Links

santamaria-automations/broken-link-checker

Crawl any website and find broken links, 404 errors, redirect chains, timeouts, and SSL failures. Essential for SEO audits, QA, and content maintenance. Export data, run via API, schedule and monitor runs, or integrate with other tools.

NanoScrape

Broken Link Checker - 404 & Redirect Finder

ninhothedev/broken-link-checker

$0.5/1K 🔥 Broken link checker! Crawl a page's links & flag 404s, redirects & errors. No key. JSON, CSV, Excel or API in seconds. Fix link rot & boost SEO ⚡

ninhothedev

Broken Link Checker & Scraper - 404 Audit API

pink_comic/broken-link-checker

Scan pages for broken links, dead URLs, 404s, redirects, TLS failures, timeouts, and resource errors. Bulk link checker/scraper for SEO audits, content QA, migrations, and link-rot monitoring with source evidence, safe URL handling, and bounded paid output.

Ava Torres

Broken Link Checker

blazing_stake/broken-link-checker

Crawl a website and find every broken link (404/500/timeouts) with the source page where each was found. Internal + external. Great for SEO & QA audits.

Mehmet Kut

Broken Link Checker - Website Link Validator & 404 Finder

scrappy_garden/broken-link-checker

Crawl a website (or list of pages) and detect broken links (404/500), unreachable URLs, and invalid asset references. Generates a structured report for SEO audits, QA testing, and website maintenance.

Bikram Adhikari

Broken Link Crawler

pattonholdings/broken-link-crawler

Crawl a website and find every broken link (404s, dead ends). HEAD-checks internal + external links, reports HTTP status and referrer pages. Use for site QA, SEO audits, migration verification. Input: url + maxPages. Output: JSON summary + per-broken-link rows with referrers.