Website Image Scraper
Pricing
from $1.00 / 1,000 results
Rating: 5.0 (20)
Developer: Crawler Bros
Actor stats: 20 bookmarked · 4 total users · 1 monthly active user
Last modified: 8 days ago
Extract every image URL from a website. Crawls the start page (and optionally internal links up to a configurable depth), then parses `<img>` tags, `<picture>`/`<source>`, `srcset` candidates, `<link rel="icon">`, and CSS `background-image` declarations. HTTP-only — no browser, no proxy, no API key.
What it does
- Pull every image URL referenced on a page — `<img src>`, lazy-loaded `data-src`, `srcset` candidates, `<picture>` sources, favicons, and inline `style="background-image: url(...)"`.
- Crawl deeper — follow internal links up to `maxCrawlDepth` (same host only) to grab images from linked pages too.
- Filter by format — restrict to specific extensions (e.g. only SVG, only WebP/AVIF).
- Bounded — `maxImagesPerPage` and `maxTotalImages` keep runs cost-predictable on large galleries.
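The extraction steps above can be sketched with Python's standard `html.parser`. This is a minimal illustration of the technique, not the actor's actual implementation, and it only covers the cases listed:

```python
from html.parser import HTMLParser

class ImageURLExtractor(HTMLParser):
    """Collect image URL candidates from <img>, <source>, srcset, and inline styles."""
    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img":
            for key in ("src", "data-src"):  # plain and lazy-loaded sources
                if a.get(key):
                    self.urls.append(a[key])
        if tag in ("img", "source") and a.get("srcset"):
            # Each srcset candidate is "URL [descriptor]", comma-separated
            for candidate in a["srcset"].split(","):
                parts = candidate.strip().split()
                if parts:
                    self.urls.append(parts[0])
        if tag == "link" and "icon" in (a.get("rel") or "") and a.get("href"):
            self.urls.append(a["href"])
        style = a.get("style") or ""
        if "background-image" in style and "url(" in style:
            # Crude pull of url(...) from inline CSS; real CSS parsing is messier
            start = style.find("url(") + 4
            end = style.find(")", start)
            self.urls.append(style[start:end].strip("'\""))

parser = ImageURLExtractor()
parser.feed('<img src="/hero.jpg" srcset="/hero-2x.jpg 2x">'
            '<div style="background-image: url(/bg.png)"></div>')
print(parser.urls)  # ['/hero.jpg', '/hero-2x.jpg', '/bg.png']
```

A real crawler would additionally resolve these candidates against the page URL to get absolute URLs before emitting them.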
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrl` | string (required) | `https://apify.com` | Page to start crawling. Must be `http://` or `https://`. |
| `maxCrawlDepth` | integer | 1 (0–5) | 0 = only the start URL; 1–5 = follow internal links down to that depth (same host only). |
| `maxImagesPerPage` | integer | 200 (1–5000) | Cap per page — keeps pathological galleries bounded. |
| `maxTotalImages` | integer | 1000 (1–50000) | Hard cap on total images emitted across the whole run. |
| `imageExtensions` | array | `["jpg", "jpeg", "png", "gif", "webp", "svg", "avif", "bmp", "ico"]` | Only URLs whose path ends in one of these are kept. |
| `includeBackgroundImages` | boolean | `true` | Also extract from inline `style="background-image: url(...)"`. |
| `userAgent` | string | (Chrome 131) | Optional UA override. |
Example input
```json
{
  "startUrl": "https://apify.com",
  "maxCrawlDepth": 1,
  "maxImagesPerPage": 200,
  "maxTotalImages": 500,
  "imageExtensions": ["jpg", "png", "webp", "svg"],
  "includeBackgroundImages": true
}
```
Output
One record per unique image URL. Empty fields are omitted (no nulls).
```json
{
  "url": "https://apify.com/static/hero.jpg",
  "sourcePage": "https://apify.com/",
  "pageTitle": "Apify · The full-stack web-scraping & automation platform",
  "alt": "Apify hero image",
  "hasAltText": true,
  "title": "Apify",
  "width": 1200,
  "height": 600,
  "extension": "jpg",
  "discoveredVia": "img-tag",
  "mimeTypeHint": "image/jpeg",
  "crawlDepth": 0,
  "scrapedAt": "2024-12-16T14:23:11+00:00"
}
```
Output fields
- `url` — absolute URL of the image (`data:` URIs and `javascript:` pseudo-URLs are filtered out).
- `sourcePage` — the page where the image was discovered.
- `pageTitle` — `<title>` of the page where the image was found (handy for grouping the dataset by page name).
- `alt` — `alt` attribute of the `<img>` tag (when present).
- `hasAltText` — derived boolean: `true` when `alt` is present and non-empty. Lets you filter accessibility issues without testing for field presence.
- `title` — `title` attribute (when present).
- `width` / `height` — explicit pixel dimensions from the tag (only emitted when numeric).
- `extension` — lowercase file extension parsed from the URL path (e.g. `"jpg"`, `"svg"`, `"webp"`). Useful for format-bucket aggregations.
- `discoveredVia` — one of `img-tag`, `srcset`, `picture-source`, `link-icon`, `css-background`.
- `mimeTypeHint` — derived from the file extension (e.g. `image/png`, `image/svg+xml`).
- `crawlDepth` — depth at which the page was crawled (0 = `startUrl`).
- `scrapedAt` — ISO-8601 timestamp.
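The `extension` and `mimeTypeHint` derivation described above can be reproduced with Python's standard `mimetypes` module. This is an illustrative sketch: the actor's own extension-to-MIME mapping may differ.

```python
import posixpath
import mimetypes
from urllib.parse import urlparse

def mime_hint(url: str):
    """Derive the extension and a MIME-type hint from the URL path alone."""
    path = urlparse(url).path                      # ignores query strings like ?v=2
    ext = posixpath.splitext(path)[1].lstrip(".").lower()
    mime, _ = mimetypes.guess_type(path)           # hint only; no bytes are fetched
    return ext, mime

print(mime_hint("https://apify.com/static/hero.jpg?v=2"))  # ('jpg', 'image/jpeg')
print(mime_hint("https://apify.com/logo.svg"))             # ('svg', 'image/svg+xml')
```

Because the hint comes from the path, a server that lies about content types (or serves extensionless image URLs) will not be reflected here.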
Use cases
- Content audits — see every image a website serves up, broken down by source (img tag vs CSS background).
- Asset inventory — pull all logos, hero images, and icons from a competitor or brand site.
- Format migration — find every JPEG/PNG to convert to WebP/AVIF, or locate raster icons that could be replaced with SVG.
- SEO / accessibility — list images with `hasAltText: false` to flag missing alt text at a glance.
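Working from the output schema, a downstream audit over the dataset might look like this. The records here are hypothetical, included only to show the filtering and bucketing:

```python
from collections import Counter

# Hypothetical dataset records in the actor's output shape
records = [
    {"url": "https://example.com/a.jpg", "extension": "jpg", "hasAltText": True},
    {"url": "https://example.com/b.png", "extension": "png", "hasAltText": False},
    {"url": "https://example.com/c.png", "extension": "png", "hasAltText": False},
]

# Accessibility: images missing alt text
missing_alt = [r["url"] for r in records if not r.get("hasAltText")]

# Format migration: how many images of each format are in use
by_format = Counter(r["extension"] for r in records)

print(missing_alt)       # ['https://example.com/b.png', 'https://example.com/c.png']
print(by_format["png"])  # 2
```

Using `r.get("hasAltText")` rather than `r["hasAltText"]` also handles records where the field was omitted, since empty fields are not emitted as nulls.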
FAQ
Does it download the image binaries? No. The actor only collects URLs and metadata. Combine with a separate downloader (or pipe URLs into Apify's standard "URL list" actor) if you need the bytes.
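If you do need the bytes, a minimal downloader over the dataset's `url` field could look like the sketch below. The function names are hypothetical, and this is illustrative only, not part of the actor:

```python
import os
import urllib.request
from urllib.parse import urlparse

def filename_for(url: str) -> str:
    """Name the local file after the last segment of the URL path."""
    return os.path.basename(urlparse(url).path) or "index"

def download_images(urls, out_dir="images"):
    """Fetch each image URL and save the bytes under out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        dest = os.path.join(out_dir, filename_for(url))
        urllib.request.urlretrieve(url, dest)  # blocking network call
        saved.append(dest)
    return saved
```

A production downloader would also handle name collisions, retries, and content-type checks; this sketch skips all three.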
Does it work on JavaScript-rendered pages? Mostly no. This scraper is HTTP-only — it sees the server-rendered HTML, not the DOM after JavaScript runs. If a site lazy-loads images via React/Vue, you may only see fallback or placeholder images. For SPA-rendered content, use a Playwright-based actor instead.
Can I limit it to a single page?
Set `maxCrawlDepth: 0`. Only the start URL is fetched.
Does it follow external links?
No. Internal-link crawling only follows links to the same host as `startUrl` to keep cost and scope bounded.
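The same-host rule described here can be expressed as a plain hostname comparison, sketched below. This is an assumption about the check, not the actor's exact logic (for example, it treats `www.` and bare hosts as different):

```python
from urllib.parse import urlparse

def is_internal(link: str, start_url: str) -> bool:
    """A link counts as internal only when its host matches the start URL's host."""
    return urlparse(link).hostname == urlparse(start_url).hostname

print(is_internal("https://apify.com/store", "https://apify.com"))  # True
print(is_internal("https://example.com/img", "https://apify.com"))  # False
```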
What if the site has no images at all?
You get a single sentinel record `{"type": "website_image_scraper_error", "reason": "no_images_found"}` so the dataset is non-empty. The run still completes successfully.
How does it deduplicate?
By absolute URL. The same image referenced from multiple pages produces one record (the first-seen page is recorded as `sourcePage`).
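The first-seen-wins behavior can be sketched as a dictionary keyed by the resolved absolute URL. This is illustrative; the actor's internals may differ:

```python
from urllib.parse import urljoin

seen = {}  # absolute URL -> record for the first page that referenced it

def record_image(page_url: str, raw_src: str):
    """Resolve the src against its page and keep only the first sighting."""
    absolute = urljoin(page_url, raw_src)
    if absolute not in seen:
        seen[absolute] = {"url": absolute, "sourcePage": page_url}
    return seen[absolute]

record_image("https://apify.com/", "/static/hero.jpg")
record_image("https://apify.com/store", "/static/hero.jpg")  # duplicate reference

print(len(seen))  # 1
print(seen["https://apify.com/static/hero.jpg"]["sourcePage"])  # https://apify.com/
```

Note that `urljoin` resolution means the same image reached via a relative and an absolute `src` still collapses to one record.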
