Broken Image Checker

Pricing: from $10.00 per 1,000 URLs checked

Detect broken or missing images on any public webpage and get a clean, actionable report. Perfect for SEO professionals, webmasters, QA testers, and UX teams.

Developer: Mark Peterson

Maintained by Community

Broken Image Checker - Find Missing & Broken Images Across Multiple Pages

Detect broken or missing images across multiple webpages in a single run and get clean, actionable reports. Perfect for SEO professionals, webmasters, QA testers, and UX teams running site-wide audits.

Key Benefits:

  • Batch processing - Check hundreds of pages in one run
  • Fast detection using HEAD requests, which skip downloading image bodies and are typically much faster than full GET requests
  • Sitemap integration - Works seamlessly with Sitemap Fetcher output
  • Accurate status codes and error messages
  • Aggregated reports with per-page breakdowns
  • Proxy support for geo-restricted content

Why use this actor?

Broken images hurt your SEO rankings, user experience, and brand credibility. Manual checking is time-consuming and error-prone, especially on large websites with hundreds of pages and thousands of images.

Problems this solves:

  • Site-wide SEO audits - Scan your entire website to identify broken images that harm search rankings and waste crawl budget
  • Batch QA testing - Check hundreds of pages before production deployment to catch missing images from CMS migrations
  • Site health monitoring - Monitor many pages in a single run to detect CDN failures and broken external image links
  • UX optimization at scale - Find loading errors across your entire site that frustrate users and hurt conversions

Features

  • Batch processing - Check images across multiple pages in a single run (up to 10,000 pages)
  • Sitemap integration - Connect directly to Sitemap Fetcher output via dataset
  • Fast parallel checking - All images on each page checked simultaneously
  • HEAD requests first - Avoids downloading image bodies, with automatic GET fallback
  • Sequential page processing - Prevents memory overload on large batches
  • Accurate error detection - HTTP status codes and detailed error messages
  • Aggregated reporting - Per-page breakdowns plus summary statistics
  • Graceful error handling - Continues processing even if individual pages fail
  • Handles relative and absolute URLs - Relative image URLs are automatically resolved to absolute URLs
  • Proxy support - Works with Apify Proxy or custom proxies
  • Progress tracking - Real-time logging of page processing status
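To illustrate the relative-to-absolute conversion, here is a minimal Python sketch that collects every <img> src from a page's HTML and resolves it against the page URL with the standard library's urljoin. ImgSrcParser and extract_image_urls are hypothetical names; the actor's actual parser is not documented, and this sketch ignores srcset, <picture> sources, CSS background images, and any <base href> tag.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcParser(HTMLParser):
    """Collects the raw src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.srcs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        # Self-closing <img ... /> tags also arrive here via handle_startendtag.
        if tag != "img":
            return
        for name, value in attrs:
            if name == "src" and value and value.strip():
                self.srcs.append(value.strip())


def extract_image_urls(html: str, page_url: str) -> list[str]:
    """Return absolute image URLs, resolving relative src values against page_url."""
    parser = ImgSrcParser()
    parser.feed(html)
    parser.close()
    # urljoin returns absolute URLs unchanged and resolves relative ones against the page.
    return [urljoin(page_url, src) for src in parser.srcs]
```

For example, extract_image_urls('<img src="/logo.png">', "https://example.com/products/") returns ["https://example.com/logo.png"], while an <img> without a src attribute is skipped.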

How it works

This actor processes your batch of URLs in the following steps:

  1. Load URLs: Reads your list of URLs (manual input, file upload, or dataset from Sitemap Fetcher)
  2. Process sequentially: Checks each page one at a time to avoid memory issues (respects maxPages limit)
  3. Fetch HTML: Downloads the webpage HTML from each URL
  4. Parse images: Extracts all <img> tags and converts relative URLs to absolute
  5. Check availability: Tests each image with a HEAD request (faster) and falls back to GET if needed
  6. Detect errors: Identifies broken images by HTTP status codes (404, 500, etc.) or request failures
  7. Aggregate results: Combines per-page results with summary statistics (total broken images, pages affected)
  8. Output report: Returns a comprehensive JSON report with per-page breakdowns and totals
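Steps 5 and 6 can be sketched as follows, assuming a simple rule: an image counts as broken if HEAD fails or returns a status of 400 or higher and a GET retry also fails or returns 400 or higher. check_image, urllib_send, and the injectable send parameter are hypothetical names; the actor's real HTTP client and exact fallback conditions may differ. The returned record mirrors the brokenImages[] shape from the Output section.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable, Optional, TypedDict

# Hypothetical transport signature: (method, url, timeout_seconds) -> HTTP status code.
# It should raise an exception on network-level failures such as timeouts.
Sender = Callable[[str, str, float], int]


class BrokenImage(TypedDict):
    src: str
    status: Optional[int]
    error: Optional[str]


def urllib_send(method: str, url: str, timeout_s: float) -> int:
    """Default transport using the standard library (an assumption, not the actor's client)."""
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout_s) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses; the status code is what we need.
        return exc.code


def check_image(src: str, timeout_ms: int = 8000, send: Sender = urllib_send) -> Optional[BrokenImage]:
    """Return None if the image is reachable, otherwise a brokenImages[]-style record."""
    timeout_s = timeout_ms / 1000
    try:
        if send("HEAD", src, timeout_s) < 400:
            return None  # HEAD succeeded; no body was downloaded.
    except Exception:
        pass  # HEAD failed at the network level; fall back to GET below.
    try:
        status = send("GET", src, timeout_s)
    except Exception as exc:
        # Network failure on GET too: no status code, only an error message.
        return {"src": src, "status": None, "error": str(exc) or type(exc).__name__}
    return {"src": src, "status": status, "error": None} if status >= 400 else None
```

Passing a fake send function makes the logic testable without network access. In practice you would run one check_image call per image concurrently (for example with a thread pool) to match the per-page parallel checking described above.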

Input

{
  "startUrls": [
    { "url": "https://example.com" },
    { "url": "https://example.com/products" },
    { "url": "https://example.com/about" }
  ],
  "maxPages": 100,
  "timeoutMs": 8000,
  "debugLog": false
}

Input parameters

| Field | Type | Description | Required | Default |
|---|---|---|---|---|
| startUrls | array | List of webpage URLs to scan for broken images. Supports manual entry, file upload, or dataset integration | Yes | - |
| maxPages | integer | Maximum number of pages to check (1-10,000). Controls cost and runtime | No | 100 |
| timeoutMs | integer | Timeout for HTTP requests in milliseconds (1,000-30,000) | No | 8000 |
| proxyConfiguration | object | Proxy settings for requests (Apify Proxy or custom) | No | - |
| debugLog | boolean | Enable detailed logging for troubleshooting | No | false |
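The ranges and defaults in the table above can be enforced with a small validation step. This Python sketch is hypothetical: normalize_input is an invented name, and both rejecting out-of-range values (rather than clamping them) and truncating startUrls to maxPages are assumptions about how the actor behaves.

```python
from __future__ import annotations

from typing import Any

# (minimum, maximum, default) for each integer field, taken from the input table.
INT_LIMITS = {"maxPages": (1, 10_000, 100), "timeoutMs": (1_000, 30_000, 8_000)}


def normalize_input(raw: dict[str, Any]) -> dict[str, Any]:
    """Validate a run input and fill in documented defaults (hypothetical helper)."""
    start_urls = raw.get("startUrls") or []
    if not start_urls:
        raise ValueError("startUrls is required and must contain at least one URL")
    config: dict[str, Any] = {
        "proxyConfiguration": raw.get("proxyConfiguration"),  # no default
        "debugLog": bool(raw.get("debugLog", False)),
    }
    for key, (low, high, default) in INT_LIMITS.items():
        value = raw.get(key, default)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{key} must be an integer from {low} to {high}, got {value!r}")
        config[key] = value
    # Assumed: only the first maxPages URLs are processed.
    config["startUrls"] = [entry["url"] for entry in start_urls][: config["maxPages"]]
    return config
```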

Connecting to Sitemap Fetcher

You can pipe URLs directly from the Sitemap Fetcher actor:

  1. Run the Sitemap Fetcher actor to extract URLs from your sitemap
  2. In this actor's input, open the startUrls field, which uses the requestListSources editor
  3. Connect the Sitemap Fetcher's dataset as the source
  4. The actor will automatically extract URLs from the dataset
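If you prefer to move URLs between the two actors yourself, the dataset items only need to be reshaped into startUrls entries. This sketch assumes each Sitemap Fetcher item stores its URL in a field named url; check the actual dataset before relying on that. dataset_items_to_start_urls is a hypothetical helper.

```python
from __future__ import annotations

from typing import Any, Iterable


def dataset_items_to_start_urls(items: Iterable[dict[str, Any]], field: str = "url") -> list[dict[str, str]]:
    """Turn dataset items into startUrls entries, skipping blanks, non-HTTP values, and duplicates.

    The default field name "url" is an assumption about Sitemap Fetcher's output.
    """
    seen: set[str] = set()
    start_urls: list[dict[str, str]] = []
    for item in items:
        url = str(item.get(field) or "").strip()
        if url.startswith(("http://", "https://")) and url not in seen:
            seen.add(url)
            start_urls.append({"url": url})
    return start_urls
```

The items themselves could be fetched with the apify-client package, for example via client.dataset(dataset_id).iterate_items(), and the result passed as the startUrls input.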

Output

The actor stores aggregated results in the default dataset:

{
  "pagesChecked": 3,
  "totalImages": 87,
  "brokenImagesByPage": [
    {
      "pageUrl": "https://example.com",
      "imageCount": 25,
      "brokenImages": [],
      "checkedAt": "2025-12-13T10:30:00.000Z",
      "error": null
    },
    {
      "pageUrl": "https://example.com/products",
      "imageCount": 42,
      "brokenImages": [
        {
          "src": "https://example.com/missing.jpg",
          "status": 404,
          "error": null
        },
        {
          "src": "https://cdn.example.com/timeout.png",
          "status": null,
          "error": "Request timeout"
        }
      ],
      "checkedAt": "2025-12-13T10:30:15.000Z",
      "error": null
    },
    {
      "pageUrl": "https://example.com/about",
      "imageCount": 20,
      "brokenImages": [],
      "checkedAt": "2025-12-13T10:30:22.000Z",
      "error": null
    }
  ],
  "summary": {
    "totalBrokenImages": 2,
    "pagesWithBrokenImages": 1
  },
  "checkedAt": "2025-12-13T10:30:22.000Z"
}

Output fields

| Field | Type | Description |
|---|---|---|
| pagesChecked | integer | Total number of pages successfully checked |
| totalImages | integer | Total images found across all pages |
| brokenImagesByPage | array | Per-page results with broken images |
| brokenImagesByPage[].pageUrl | string | The webpage URL that was scanned |
| brokenImagesByPage[].imageCount | integer | Number of images found on this page |
| brokenImagesByPage[].brokenImages | array | List of broken images on this page |
| brokenImagesByPage[].brokenImages[].src | string | URL of the broken image |
| brokenImagesByPage[].brokenImages[].status | integer or null | HTTP status code (404, 500, etc.), or null if the request failed |
| brokenImagesByPage[].brokenImages[].error | string or null | Error message if the request failed |
| brokenImagesByPage[].checkedAt | string | ISO 8601 timestamp when this page was checked |
| brokenImagesByPage[].error | string or null | Error message if the page failed to load |
| summary | object | Aggregated statistics across all pages |
| summary.totalBrokenImages | integer | Total number of broken images found |
| summary.pagesWithBrokenImages | integer | Number of pages that have at least one broken image |
| checkedAt | string | ISO 8601 timestamp when the run completed |
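To make the relationship between the per-page entries and the top-level fields concrete, here is a hypothetical Python sketch that derives pagesChecked, totalImages, and summary from brokenImagesByPage. build_report is an invented name, and counting only pages whose error is null toward pagesChecked is an assumption based on the field description.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any


def build_report(pages: list[dict[str, Any]]) -> dict[str, Any]:
    """Assemble the top-level report from per-page results (sketch of the documented shape)."""
    ok_pages = [p for p in pages if p.get("error") is None]  # assumed meaning of "successfully checked"
    broken_counts = [len(p.get("brokenImages", [])) for p in pages]
    return {
        "pagesChecked": len(ok_pages),
        "totalImages": sum(p.get("imageCount", 0) for p in pages),
        "brokenImagesByPage": pages,
        "summary": {
            "totalBrokenImages": sum(broken_counts),
            "pagesWithBrokenImages": sum(1 for n in broken_counts if n > 0),
        },
        # ISO 8601 in UTC with millisecond precision and a trailing "Z", as in the sample.
        "checkedAt": datetime.now(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z"),
    }
```

Applied to the three pages in the sample output, it yields pagesChecked: 3, totalImages: 87, totalBrokenImages: 2, and pagesWithBrokenImages: 1, matching the example.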

Use cases

This actor is perfect for:

  • Site-wide SEO audits: Combine with Sitemap Fetcher to scan your entire website (hundreds or thousands of pages) to find broken images that hurt search rankings and waste crawl budget. Get a complete report showing which pages have issues.
  • Pre-launch QA testing: Batch check all staging environment pages before deployment to catch missing images from CMS migrations, broken CDN links, or incorrect image paths across your entire site.
  • Site health monitoring: Set up scheduled runs with your sitemap to monitor hundreds of pages on a recurring basis, detecting CDN failures, expired external image links, or accidental deletions soon after they happen.
  • CRO/UX optimization at scale: Identify image loading errors across your entire site that frustrate users, increase bounce rates, and hurt conversion rates. Get summary statistics to prioritize fixes.
  • Client reporting: Generate comprehensive reports showing broken images across multiple client websites or sections, with aggregated statistics perfect for client presentations.

Proxy configuration

This actor supports both Apify Proxy and custom HTTP/HTTPS/SOCKS proxies.

Using Apify Proxy

{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"],
    "apifyProxyCountry": "US"
  }
}

Using custom proxies

{
  "proxyConfiguration": {
    "proxyUrls": [
      "http://proxy.example.com:8000"
    ]
  }
}

Proxies are useful when checking geo-restricted content or avoiding rate limits on high-traffic sites.
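For custom proxies, a round-robin rotation over proxyUrls is one simple strategy. The Python sketch below builds urllib openers for HTTP(S) proxies only (urllib has no built-in SOCKS support). proxy_openers is a hypothetical name, and the actor's actual rotation strategy is not documented.

```python
from __future__ import annotations

import itertools
import urllib.request
from typing import Iterator


def proxy_openers(proxy_urls: list[str]) -> Iterator[urllib.request.OpenerDirector]:
    """Yield urllib openers that cycle through proxyUrls in round-robin order (assumed strategy).

    Being a generator, the empty-list check runs on the first next() call.
    """
    if not proxy_urls:
        raise ValueError("proxyUrls must contain at least one proxy")
    for proxy_url in itertools.cycle(proxy_urls):
        # Route both plain HTTP and HTTPS (via CONNECT tunnelling) through the same proxy.
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        yield urllib.request.build_opener(handler)
```

Each request would take next(openers) and call its open(url, timeout=...) method, so consecutive requests spread across the configured proxies.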

⚙️ Performance

  • Typical runtime: 5–10 seconds per page for pages with ~50 images
  • Batch processing: Checks pages sequentially to avoid memory issues
  • Scales to hundreds or thousands of pages per run (up to the 10,000-page maxPages limit)
  • Actual performance varies based on:
    • Number of pages in your batch (controlled by maxPages)
    • Number of images per page
    • Image server/CDN response times
    • Network latency to target servers
    • Timeout settings
    • Proxy configuration used

Tip: Start with maxPages: 10 to test your URLs, then scale to 100, 1000, or more as needed.
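Because pages are processed one after another, the typical runtime figure translates into a rough planning estimate. This tiny sketch simply multiplies the page count by the 5 to 10 second range; estimate_runtime_minutes is a hypothetical helper, and real runs vary with the factors listed above.

```python
from __future__ import annotations


def estimate_runtime_minutes(pages: int, low_s: float = 5.0, high_s: float = 10.0) -> tuple[float, float]:
    """Return a (low, high) runtime estimate in minutes for a sequential run.

    Uses the 5-10 seconds per page figure (pages with ~50 images); retries,
    slow CDNs, and proxies can push real runs outside this range.
    """
    return pages * low_s / 60, pages * high_s / 60
```

For maxPages: 1000, this gives roughly 83 to 167 minutes.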

Error handling

This actor includes robust error handling:

  • Page-level resilience: If one page fails to load, the actor continues processing remaining pages
  • Automatic retries: Failed requests are retried with exponential backoff
  • HEAD/GET fallback: If HEAD requests fail, the actor automatically tries GET requests
  • Detailed logging: All errors are logged with context, including page progress ("Processing page 5 of 100")
  • Error reporting: Failed pages are included in output with error details for troubleshooting
  • Graceful failure: Successfully processed pages are reported even if some pages fail
  • Timeout handling: Configurable timeouts prevent hanging on slow servers
  • URL validation: Invalid URLs are logged and skipped rather than crashing the run
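Retries with exponential backoff can be sketched like this. with_retries is a hypothetical helper, and the attempt count and base delay are assumptions, since the actor does not document its retry settings; with the defaults shown, a request is tried up to three times with waits of 0.5 s and then 1 s between attempts.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(fn: Callable[[], T], attempts: int = 3, base_delay_s: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying on any exception with delays of base, 2*base, 4*base, ..."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay_s * 2 ** attempt)
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the backoff schedule can be checked without actually waiting.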

Limitations

  • Only checks publicly accessible webpages (no authentication support)
  • Maximum 10,000 pages per run (controlled by maxPages)
  • Maximum timeout of 30 seconds per image request
  • Sequential page processing (not parallel) to avoid memory issues
  • JavaScript-rendered images are not detected: the actor parses the HTML returned by the server, so images injected client-side are missed (consider a browser-based tool such as Playwright for dynamic sites)
  • Does not follow pagination or crawl dynamically - provide all URLs via startUrls or connect a dataset

Tips for best results

  1. Start small, then scale: Test with maxPages: 10 first to verify your URLs work, then increase to 100, 1000, etc.
  2. Combine with Sitemap Fetcher: Use the Sitemap Fetcher actor first to get all your site URLs, then connect its dataset to this actor for comprehensive coverage
  3. Use appropriate timeouts: Increase timeoutMs to 15000+ for slow CDNs or international servers
  4. Enable debug logging: Set debugLog: true when troubleshooting to see detailed per-page progress
  5. Use proxies for geo-content: Some CDNs serve different images based on location - use residential proxies to test from specific countries
  6. Monitor summary statistics: Check summary.totalBrokenImages and summary.pagesWithBrokenImages for quick insights before diving into per-page details
  7. Schedule regular runs: Set up scheduled runs to monitor site health continuously across all your important pages
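Following tip 6, once the summary shows problems, a flat list is often easier to triage than nested JSON. This hypothetical Python sketch turns a report into CSV, one row per broken image plus one row per page that failed to load, with the pages that have the most broken images first. The function name and column layout are assumptions that simply mirror the documented output fields.

```python
from __future__ import annotations

import csv
import io
from typing import Any


def broken_images_csv(report: dict[str, Any]) -> str:
    """Flatten a report into CSV text, worst pages first (hypothetical helper)."""
    pages = sorted(report["brokenImagesByPage"], key=lambda p: len(p["brokenImages"]), reverse=True)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["pageUrl", "src", "status", "error"])
    for page in pages:
        if page.get("error"):
            # The page itself failed to load; report it on its own row.
            writer.writerow([page["pageUrl"], "", "", page["error"]])
        for img in page["brokenImages"]:
            status = img["status"] if img["status"] is not None else ""
            writer.writerow([page["pageUrl"], img["src"], status, img["error"] or ""])
    return buf.getvalue()
```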

Check out these related actors for comprehensive site auditing:

  • URL Canonicalizer + Redirect Resolver: Check for redirect chains and canonical URL issues
  • Sitemap Fetcher + Page Title Extractor: Analyze your sitemap and page metadata
  • URL Metadata Extractor: Extract Open Graph images and metadata from multiple pages

Support & feedback

Need help or have suggestions?

  • Issues: Create an issue in the GitHub repository
  • Messages: Contact the developer through Apify platform messaging

Changelog

Version 1.0.9 (2025-12-13)

  • Batch processing support - Check multiple pages in a single run (up to 10,000 pages)
  • Sitemap integration - requestListSources editor supports dataset connections
  • Aggregated reporting - Per-page results with summary statistics
  • Graceful error handling - Continues processing even if individual pages fail
  • Progress logging - Real-time page processing status
  • Added startUrls array input (replaces single url)
  • Added maxPages limit for cost control

Version 1.0.0 (2025-12-12)

  • Initial release
  • Fast parallel image checking with HEAD/GET fallback
  • Support for relative and absolute URLs
  • Proxy configuration support
  • Comprehensive error reporting

Made with care for the web development community

Part of the Apify Actor Portfolio collection