Website Image & Media Crawler — Bulk Asset Extractor
Pricing
from $4.50 / 1,000 results
Crawl an entire website and extract every image, video and audio asset — with alt text, dimensions, source page and file type. Perfect for AI training datasets, image SEO audits, asset inventories and migrations. No login, no browser.
Rating
0.0
(0)
Developer
Logiover
Maintained by Community
Actor stats
Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 16 hours ago
Website Image & Media Crawler — Bulk Image & Asset Scraper 🖼️
Extract every image, video and audio file from a website. This image scraper / media extractor crawls an entire site and pulls out all media assets — together with alt text, dimensions, the source page and file type. Point it at one URL and it inventories the media across thousands of pages automatically. No login, no headless browser.
Need to scrape all images from a website, build an image dataset for AI, run an image SEO / alt-text audit, or inventory media before a migration? This actor delivers the full list of asset URLs and metadata.
✨ Key features
- 🕷️ Full-site crawl — start from one URL and follow internal links across the whole domain.
- 🖼️ Every media type: `<img>`, `srcset`, `<picture>`, lazy-loaded `data-src`, CSS background images, `<video>` + posters, `<audio>`, plus `og:image`, `twitter:image` and favicons.
- 🔗 Absolute, de-duplicated URLs: clean asset URLs ready to download or analyze.
- 🏷️ Rich metadata — alt text, title, width/height, loading attribute and where each asset was found.
- ⚡ Fast & cheap — pure HTTP, no browser, high concurrency.
💡 Use cases
- AI / ML training datasets — collect large image sets with their alt-text captions for multimodal models.
- Image SEO audits: find images missing `alt` text at scale and improve accessibility & rankings.
- Asset inventories & migrations: list every media file on a site before a redesign or platform move.
- E-commerce & competitor research — pull product imagery across a whole catalog.
- Bulk image download lists — generate a clean URL list to fetch images in bulk.
📦 What you get
One row per media asset:
| Field | Description |
|---|---|
| `pageUrl` | The page the asset was found on |
| `mediaUrl` | Absolute URL of the asset |
| `mediaType` | `image`, `video`, `audio` or `icon` |
| `foundIn` | Source (`img`, `img-srcset`, `picture-source`, `meta`, `css-background`, `video`, …) |
| `fileExtension` | `jpg`, `png`, `webp`, `mp4`, `svg`, … |
| `alt` / `title` | Image alt and title text |
| `width` / `height` | Declared dimensions |
| `loading` | `lazy` / `eager` |
| `poster` | Video poster image |
| `crawledAt` | ISO 8601 timestamp |
Example output
    {
      "pageUrl": "https://shop.example.com/product/123",
      "mediaUrl": "https://shop.example.com/img/123-main.jpg",
      "mediaType": "image",
      "foundIn": "img",
      "fileExtension": "jpg",
      "alt": "Blue running shoe, side view",
      "width": "800",
      "height": "800",
      "crawledAt": "2026-05-25T14:15:28.001Z"
    }
🚀 How to use it
- Click Try for free / Start.
- Paste one or more website URLs into Start URLs.
- (Optional) Set Max pages to crawl (`0` for the whole site).
- (Optional) Toggle which media to include: images, video, audio, CSS backgrounds.
- Click Save & Start.
- Export the asset list as JSON, CSV, Excel or via API.
⚙️ Input
| Option | Description | Default |
|---|---|---|
| `startUrls` | Websites to crawl | – (required) |
| `maxPagesToCrawl` | Max pages per run (`0` = whole site) | `1000` |
| `includeImages` | `<img>`, `srcset`, `<picture>`, `og:image`, favicons | `true` |
| `includeVideo` | `<video>` sources and posters | `true` |
| `includeAudio` | `<audio>` sources | `true` |
| `includeBackgroundImages` | Inline CSS background images | `true` |
| `maxConcurrency` | Parallel requests | `10` |
Example input
    {
      "startUrls": [{ "url": "https://example.com" }],
      "maxPagesToCrawl": 2000,
      "includeImages": true
    }
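When runs are started programmatically, the input object can be assembled in code. The sketch below only builds the JSON payload. The defaults are copied from the input table, while the `build_input` helper and its validation are assumptions for illustration and not something the actor provides.

```python
import json

# Defaults as listed in the input table.
DEFAULT_INPUT = {
    "maxPagesToCrawl": 1000,
    "includeImages": True,
    "includeVideo": True,
    "includeAudio": True,
    "includeBackgroundImages": True,
    "maxConcurrency": 10,
}


def build_input(start_urls, **overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    run_input = {"startUrls": [{"url": url} for url in start_urls]}
    run_input.update(DEFAULT_INPUT)
    run_input.update(overrides)
    return run_input


# An images-only crawl capped at 2,000 pages.
images_only = build_input(
    ["https://example.com"],
    maxPagesToCrawl=2000,
    includeVideo=False,
    includeAudio=False,
)
print(json.dumps(images_only, indent=2))
```

Rejecting unknown option names early catches typos such as `maxPages` before a run is started.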
🔍 How it works
The crawler follows internal links within the same domain as your Start URLs. On each page it extracts media from `<img>` (including `srcset` and `data-src`), `<picture>`, inline CSS backgrounds, `<video>`/`<audio>` and their `<source>` children, plus `og:image`, `twitter:image` and favicons. All URLs are resolved to absolute form and de-duplicated per page. Pure HTTP: fast and inexpensive.
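To make the per-page step concrete, here is a rough sketch of the same idea using only the Python standard library. It is not the actor's implementation. It covers just `<img>` (`src`, `data-src`, `srcset`) and `og:image`, and the `MediaCollector` class, the sample HTML and the label chosen for `data-src` are assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaCollector(HTMLParser):
    """Collects (absolute_url, found_in) pairs from one page of HTML."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.seen = set()   # absolute URLs already recorded for this page
        self.assets = []    # (media_url, found_in) in document order

    def _add(self, raw_url, found_in):
        if not raw_url or not raw_url.strip():
            return
        absolute = urljoin(self.page_url, raw_url.strip())
        if absolute not in self.seen:  # de-duplicate per page
            self.seen.add(absolute)
            self.assets.append((absolute, found_in))

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "img":
            self._add(attr.get("src"), "img")
            # Assumption: lazy-loaded data-src is reported under the same label.
            self._add(attr.get("data-src"), "img")
            # Naive srcset split; URLs that contain commas need a real parser.
            for candidate in (attr.get("srcset") or "").split(","):
                parts = candidate.split()
                if parts:
                    self._add(parts[0], "img-srcset")
        elif tag == "meta" and attr.get("property") == "og:image":
            self._add(attr.get("content"), "meta")


sample_html = """
<img src="/a.jpg" alt="A">
<img data-src="b.png" srcset="/a.jpg 1x, /a@2x.jpg 2x">
<meta property="og:image" content="https://cdn.example.com/og.png">
"""

collector = MediaCollector("https://example.com/shop/page.html")
collector.feed(sample_html)
for media_url, found_in in collector.assets:
    print(found_in, media_url)
```

Relative paths are resolved against the page URL with `urljoin`, and `/a.jpg` is recorded only once even though it also appears in the `srcset`.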
🧰 Tips & best practices
- Set `maxPagesToCrawl` to `0` to inventory an entire catalog or media library.
- Filter by `mediaType` or `fileExtension` after the run to get exactly the assets you need.
- Use `imagesMissingAlt`-style filtering: rows where `alt` is empty are your image-SEO fixes.
- To download the files, feed the `mediaUrl` list into a bulk downloader.
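The post-run filtering described in these tips might look like the following sketch. The sample rows are made up for illustration, and the helper names are hypothetical. In practice the rows would come from the exported JSON.

```python
# Hypothetical rows, trimmed to the fields used here.
rows = [
    {"mediaUrl": "https://example.com/img/hero.jpg", "mediaType": "image",
     "fileExtension": "jpg", "alt": "Hero banner"},
    {"mediaUrl": "https://example.com/img/logo.webp", "mediaType": "image",
     "fileExtension": "webp", "alt": ""},
    {"mediaUrl": "https://example.com/media/intro.mp4", "mediaType": "video",
     "fileExtension": "mp4"},
    # The same asset found again on another page.
    {"mediaUrl": "https://example.com/img/hero.jpg", "mediaType": "image",
     "fileExtension": "jpg", "alt": "Hero banner"},
]


def images_missing_alt(rows):
    """Image rows whose alt text is absent or blank."""
    return [
        r for r in rows
        if r.get("mediaType") == "image" and not (r.get("alt") or "").strip()
    ]


def by_extension(rows, *extensions):
    """Rows whose fileExtension matches one of the given extensions."""
    wanted = {e.lower() for e in extensions}
    return [r for r in rows if (r.get("fileExtension") or "").lower() in wanted]


alt_fixes = images_missing_alt(rows)

# Unique, sorted URL list suitable for handing to a bulk downloader.
download_list = sorted({r["mediaUrl"] for r in by_extension(rows, "jpg", "webp")})

print(len(alt_fixes), "image(s) missing alt text")
print("\n".join(download_list))
```

Building the download list from a set removes assets that appear on more than one page, since de-duplication in the output is per page.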
❓ FAQ
Does it download the image files? No. It extracts asset URLs and metadata. You can download the files afterwards by feeding the `mediaUrl` list into any bulk downloader.
Does it capture lazy-loaded images? Yes. It reads `data-src`, `srcset` and `<picture>` sources in addition to plain `src`.
Does it render JavaScript? No. It parses server-rendered HTML for speed and low cost.
How do I crawl the whole site? Set `maxPagesToCrawl` to `0`.
What formats can I export? JSON, CSV, Excel and HTML. Results are also available through the REST API.
🔗 Related actors by the same author
- Website to Markdown & Text Crawler — clean text + Markdown for AI / RAG.
- Website SEO Audit Crawler — on-page SEO audit including image alt coverage.
- Broken Link Checker — find dead links across a whole site.
- Sitemap to URL Crawler — extract all URLs from any sitemap.xml.
Changelog
- 2026-05-25 — Maintenance & reliability pass: pulled the latest source and rebuilt the Actor on the current base image; build verified.
Last reviewed: 2026-05-25.