Website Image Scraper
Pricing: Pay per event
Developer: BowTiedRaccoon (maintained by Community)
Extract every image URL from any website. Give it a URL, get back a list of images with alt text, dimensions, srcset candidates, and CSS background-image URLs. Works on both static and JavaScript-rendered pages.
What it does
Points a Playwright browser at your URL, lets the page fully render, then pulls every image it can find — <img> tags, <picture>/<source> elements, lazy-load data-src attributes, and background-image CSS rules. Returns one record per image with the metadata that's actually useful.
Optionally follows internal links up to a configurable depth, so you can audit images across an entire section of a site rather than just one page.
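The extraction step can be illustrated offline. The sketch below is not the actor's code: it uses Python's standard `html.parser` on static HTML instead of a Playwright-rendered DOM, and the `ImageCollector` class is a hypothetical name. Taking the first srcset candidate as the URL of a `<source>` element is also an assumption.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect image records from <img> and <source> tags in static HTML."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.records: list[dict[str, str]] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag not in ("img", "source"):
            return
        a = {name: (value or "") for name, value in attrs}
        # Prefer src, then fall back to the lazy-load attribute data-src.
        src = a.get("src") or a.get("data-src", "")
        if not src:
            # Assumption: <source> often has only srcset, so use its first candidate.
            first = a.get("srcset", "").split(",")[0].split()
            src = first[0] if first else ""
        if not src:
            return
        self.records.append({
            "image_url": urljoin(self.page_url, src),  # make the URL absolute
            "page_url": self.page_url,
            "alt_text": a.get("alt", ""),
            "width": a.get("width", ""),
            "height": a.get("height", ""),
            "loading": a.get("loading", ""),
            "source_tag": tag,
        })


def collect_images(html: str, page_url: str) -> list[dict[str, str]]:
    collector = ImageCollector(page_url)
    collector.feed(html)
    return collector.records
```

A static parser like this misses anything injected by JavaScript, which is the gap a real browser render is meant to close.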
Output
Each record contains:
| Field | Description |
|---|---|
| image_url | Absolute URL of the image |
| page_url | URL of the page where the image was found |
| alt_text | Alt text (empty string if none) |
| width | Width attribute value (empty string if not set in HTML) |
| height | Height attribute value (empty string if not set in HTML) |
| srcset | Raw srcset attribute value |
| srcset_urls | Comma-separated absolute URLs parsed from srcset |
| loading | Loading attribute (lazy, eager, or empty string) |
| source_tag | Source element: img, source, or css-background |
| scraped_at | ISO 8601 timestamp |
Input
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | — | Website URL to extract images from |
| maxItems | integer | 200 | Maximum image records to return. Set to 0 for unlimited |
| crawlLinks | boolean | false | Follow internal links to scrape images from multiple pages |
| maxDepth | integer | 1 | Max depth for internal link crawling (1–3) |
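As a rough illustration of the parameters above, the following sketch builds an input object and checks the documented ranges before a run is started. The `build_input` helper is hypothetical, and the requirement that the URL start with `http://` or `https://` is an assumption rather than something the actor documents.

```python
import json


def build_input(url: str, max_items: int = 200,
                crawl_links: bool = False, max_depth: int = 1) -> dict:
    """Return an input dict using the documented parameter names and defaults."""
    # Assumption: only absolute http(s) URLs are accepted.
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must start with http:// or https://")
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or a positive integer")
    if not 1 <= max_depth <= 3:
        raise ValueError("maxDepth must be between 1 and 3")
    return {
        "url": url,
        "maxItems": max_items,
        "crawlLinks": crawl_links,
        "maxDepth": max_depth,
    }


if __name__ == "__main__":
    # Crawl the start page plus pages linked from it, with no record limit.
    run_input = build_input("https://books.toscrape.com", max_items=0,
                            crawl_links=True, max_depth=1)
    print(json.dumps(run_input, indent=2))
```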
Usage notes
On JavaScript-rendered sites: The actor waits for the page network to idle before extracting, so images added by lazy-loading or by React, Vue, or Angular hydration are captured in most cases. Content that loads only after the network has gone idle may be missed.
On srcset: Multi-resolution images are captured as both the raw srcset string and a parsed comma-separated list of absolute URLs. The image_url field holds the primary src value.
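To show how a raw srcset value relates to the parsed list, here is a minimal sketch. `parse_srcset` is a hypothetical helper, not the actor's parser, and it assumes the output joins URLs with a bare comma. It splits on commas naively, so it would mis-handle URLs that contain commas themselves.

```python
from urllib.parse import urljoin


def parse_srcset(srcset: str, page_url: str) -> str:
    """Turn a raw srcset string into comma-separated absolute URLs."""
    urls = []
    for candidate in srcset.split(","):
        parts = candidate.split()  # e.g. ["/img/a-480.jpg", "480w"]
        if parts:
            # The first token is the URL; the rest is a width or density descriptor.
            urls.append(urljoin(page_url, parts[0]))
    return ",".join(urls)


if __name__ == "__main__":
    raw = "/img/a-480.jpg 480w, /img/a-800.jpg 800w"
    print(parse_srcset(raw, "https://example.com/post/"))
```

The descriptors (`480w`, `2x`) are dropped in the parsed form, which is why the raw srcset string is worth keeping alongside it.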
On CSS backgrounds: The actor walks all visible elements and reads their computed background-image styles. Inline data URIs are filtered out, so the output contains only linkable image URLs.
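The filtering described above can be sketched as a small function over a single background-image value. The regular expression and the `background_urls` name are illustrative assumptions. The actor reads computed styles inside the browser, whereas this example only handles the resulting string.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the target.
_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def background_urls(css_value: str, page_url: str) -> list[str]:
    """Extract linkable image URLs from a background-image value."""
    found = []
    for _quote, raw in _URL_RE.findall(css_value):
        if raw.lower().startswith("data:"):
            continue  # skip inline data URIs, keep only linkable URLs
        found.append(urljoin(page_url, raw))
    return found


if __name__ == "__main__":
    value = 'url("https://example.com/hero.jpg"), url(data:image/png;base64,AAAA)'
    print(background_urls(value, "https://example.com/page"))
```

Values such as `none` or a plain `linear-gradient(...)` contain no `url(...)` token and yield an empty list.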
On depth: With crawlLinks: true and maxDepth: 1, the actor crawls the start page plus any internal pages linked from it. Each page is only visited once. Fan-out is capped at 30 new links per page to avoid runaway crawls.
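The depth and fan-out rules can be modeled with a breadth-first traversal over an in-memory link graph. This is a sketch under assumptions: the actor's real traversal order is not documented, `crawl_order` and `get_links` are hypothetical names, and `get_links` is expected to return internal links only.

```python
from collections import deque
from typing import Callable, Iterable


def crawl_order(start: str, get_links: Callable[[str], Iterable[str]],
                max_depth: int = 1, fan_out: int = 30) -> list[str]:
    """Return the pages visited, with the start page at depth 0."""
    seen = {start}                 # each page is visited at most once
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue               # do not follow links beyond max_depth
        added = 0
        for link in get_links(url):
            if added >= fan_out:
                break              # cap on new links queued per page
            if link in seen:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
            added += 1
    return order


if __name__ == "__main__":
    site = {"/": ["/a", "/b", "/"], "/a": ["/c", "/b"], "/b": [], "/c": ["/d"]}
    print(crawl_order("/", lambda u: site.get(u, []), max_depth=1))
```

With `max_depth=1` the toy graph yields the start page and its two direct links. Raising it to 2 adds `/c` but not `/d`.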
Example output
{
  "image_url": "https://books.toscrape.com/media/cache/fe/72/fe72f0532301ec28892ae79a629a293c.jpg",
  "page_url": "https://books.toscrape.com",
  "alt_text": "A Light in the Attic",
  "width": "",
  "height": "",
  "srcset": "",
  "srcset_urls": "",
  "loading": "",
  "source_tag": "img",
  "scraped_at": "2026-06-10T12:00:00.000Z"
}
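Records in this shape lend themselves to simple audits. The sketch below assumes the results have been exported as a JSON array of such records. The `missing_alt` helper and the second sample record are made up for illustration.

```python
import json


def missing_alt(records: list[dict]) -> list[str]:
    """Return image URLs whose alt_text is empty or whitespace only."""
    return [r["image_url"] for r in records if not r.get("alt_text", "").strip()]


if __name__ == "__main__":
    exported = json.dumps([
        {"image_url": "https://books.toscrape.com/media/cache/fe/72/"
                      "fe72f0532301ec28892ae79a629a293c.jpg",
         "page_url": "https://books.toscrape.com",
         "alt_text": "A Light in the Attic", "source_tag": "img"},
        # Hypothetical record with no alt text.
        {"image_url": "https://example.com/banner.png",
         "page_url": "https://example.com",
         "alt_text": "", "source_tag": "css-background"},
    ])
    for url in missing_alt(json.loads(exported)):
        print("missing alt:", url)
```

CSS background images never carry alt text, so an audit like this may want to skip records where `source_tag` is `css-background`.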
Questions or issues?
Use the feedback fields in the input form or reach out at actor-support@orbtop.com.