Web Images Scraper

Pricing

from $6.30 / 1,000 processed image URLs

Extract image URLs from public webpages, domains, and direct image links. Get source pages, discovery methods, metadata, and optional saved files or ZIP archives.

Rating

0.0

(0)

Developer

Maxime Dupré

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: a day ago

🖼️ What is Web Images Scraper?

Web Images Scraper extracts image URLs from public webpages, domains, and direct image links. Use it to find webpage images, responsive srcset candidates, Open Graph images, icons, and CSS background images without opening each page by hand, and optionally save the image files.

Start with the prefilled Wikimedia Commons example or paste your own public website URL. Keep downloads off for a quick metadata-only run, then enable image saving or ZIP archives when you need files in Apify key-value storage.

🔎 What can Web Images Scraper do?

  • Extract images from img tags, srcset, page metadata, icons, inline styles, and linked stylesheets.
  • Accept public webpage URLs, bare domains, and direct image URLs.
  • Crawl same-domain links when you raise the crawl depth and page limit.
  • Filter images by file extension, minimum known byte size, and URL text.
  • Save one dataset item per accepted image as soon as it is accepted.
  • Optionally download images and create one ZIP archive per input target.
  • Export results through Apify datasets, API, webhooks, schedules, and integrations.
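
To make the srcset bullet concrete, here is a minimal sketch of how responsive srcset candidates can be split into absolute URLs. It is an illustration only, not the Actor's own code: the function name parse_srcset is made up here, and splitting on every comma is a simplification that would mishandle URLs containing commas.

```python
from urllib.parse import urljoin


def parse_srcset(srcset: str, base_url: str) -> list[dict]:
    """Split a srcset attribute into candidates with absolute URLs.

    Simplified: assumes no candidate URL contains a comma.
    """
    candidates = []
    for part in srcset.split(","):
        tokens = part.split()
        if not tokens:
            continue  # skip empty fragments such as a trailing comma
        candidates.append({
            "url": urljoin(base_url, tokens[0]),
            # Width ("480w") or density ("2x") descriptor, if present.
            "descriptor": tokens[1] if len(tokens) > 1 else None,
        })
    return candidates


candidates = parse_srcset(
    "img/a-480.jpg 480w, img/a-800.jpg 800w",
    "https://example.com/page/",
)
```

Each candidate becomes its own image URL, which is why a single img tag can produce several dataset rows when srcset discovery is enabled.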

📦 What data can it extract?

| Data point | What it means |
| --- | --- |
| Input URL | The webpage, domain, or direct image URL you submitted. |
| Source page URL | The page where the image was discovered. |
| Image URL | The original image URL found on the page. |
| Normalized image URL | A cleaned absolute URL for matching and exports. |
| Filename and extension | File naming details when they can be inferred. |
| Content type and file size | Metadata from the image response when available. |
| Alt and title text | Nearby accessibility and title text from the page. |
| Discovery method | Whether the image came from img, srcset, metadata, CSS, or a direct URL. |
| Crawl details | Page index, crawl depth, source order, and scrape timestamp. |
| Saved file links | Apify storage links when image downloads are enabled. |
| ZIP file links | Archive metadata when ZIP creation is enabled. |

🧭 How do I scrape images from a website?

  1. Add one or more public webpage URLs, domains, or direct image URLs.
  2. Choose how many images to keep per page.
  3. Keep Crawl depth at 0 for only the submitted page, or raise it to follow same-domain links.
  4. Leave discovery options enabled unless you want to exclude srcset, metadata, or CSS background images.
  5. Turn on Save image files to Apify storage if you need downloadable files.
  6. Turn on Create ZIP files only when downloads are enabled and you want one archive per input target.
  7. Run the Actor and export the dataset as JSON, CSV, Excel, XML, or through the Apify API.
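
If you prefer to prepare the run input in code, the steps above could be sketched roughly as follows. Every input key in this example (startUrls, maxImagesPerPage, crawlDepth, saveImages, createZip) is a guess made for illustration; check the Actor's Input tab for the real field names before using it.

```python
def build_run_input(
    urls: list[str],
    max_images_per_page: int = 50,
    crawl_depth: int = 0,
    save_images: bool = False,
    create_zip: bool = False,
) -> dict:
    """Assemble a run input dict. All keys below are hypothetical names."""
    if not urls:
        raise ValueError("Add at least one webpage, domain, or image URL")
    if create_zip and not save_images:
        # Step 6: ZIP archives only make sense when downloads are enabled.
        raise ValueError("Create ZIP files requires image downloads to be enabled")
    return {
        "startUrls": [{"url": u} for u in urls],   # hypothetical key
        "maxImagesPerPage": max_images_per_page,   # hypothetical key
        "crawlDepth": crawl_depth,                 # hypothetical key
        "saveImages": save_images,                 # hypothetical key
        "createZip": create_zip,                   # hypothetical key
    }


run_input = build_run_input(["https://www.python.org/"])
```

The defaults mirror a quick metadata-only run: crawl depth 0 and downloads off.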

⚙️ Input options

The main input is Webpage or image URLs. You can paste values such as:

[
  { "url": "https://www.python.org/" },
  { "url": "https://www.python.org/static/img/python-logo.png" }
]

Use Max images per page to cap how many image rows are saved from each page. Use Max pages per input and Crawl depth together when you want a small same-domain crawl instead of a single-page extraction.

The filter options are useful for bulk image downloader workflows. For example, keep only png and webp extensions, require image URLs to contain /uploads/, or exclude URLs containing logo or icon. Images without a known byte size are kept unless another filter removes them.
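
The filter rules described above can be pictured with a small sketch. It is an illustration of the described behavior, not the Actor's implementation: the function and parameter names are invented, and case-insensitive matching is an assumption.

```python
import posixpath
from urllib.parse import urlparse


def keep_image(url, size_bytes=None, *, extensions=None, min_bytes=0,
               must_contain=(), must_not_contain=()):
    """Return True if an image passes the extension, size, and URL text filters."""
    ext = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
    if extensions is not None and ext not in extensions:
        return False
    # Images without a known byte size are kept by the size filter.
    if size_bytes is not None and size_bytes < min_bytes:
        return False
    lowered = url.lower()
    if must_contain and not any(text.lower() in lowered for text in must_contain):
        return False
    if any(text.lower() in lowered for text in must_not_contain):
        return False
    return True


filters = dict(
    extensions={"png", "webp"},
    min_bytes=1000,
    must_contain=["/uploads/"],
    must_not_contain=["logo", "icon"],
)
kept = keep_image("https://example.com/uploads/photo.png", None, **filters)
dropped = keep_image("https://example.com/uploads/logo.png", 5000, **filters)
```

Here the first URL is kept even though its size is unknown, while the second is removed by the excluded text logo.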

🧾 Output example

Each dataset item is one accepted image:

{
  "inputIndex": 0,
  "inputUrl": "https://www.python.org/",
  "sourcePageUrl": "https://www.python.org/",
  "imageUrl": "https://www.python.org/static/img/python-logo.png",
  "normalizedImageUrl": "https://www.python.org/static/img/python-logo.png",
  "filename": "python-logo.png",
  "extension": "png",
  "contentType": "image/png",
  "fileSizeBytes": 15782,
  "width": null,
  "height": null,
  "altText": "python logo",
  "titleText": null,
  "discoveryMethod": "img-src",
  "sourceOrder": 1,
  "pageIndex": 1,
  "crawlDepth": 0,
  "isDirectImage": false,
  "duplicateKey": "a1b2c3",
  "scrapedAt": "2026-06-04T00:00:00.000Z"
}

When downloads are enabled, rows also include downloadUrl and savedFile. When ZIP archives are enabled, rows also include zipFile.
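
Once you have exported the dataset as JSON, rows shaped like the example above are easy to summarize. The sketch below counts images and known bytes per source page. The first row reuses values from the output example; the second row is invented for illustration, and summarize_by_page is a made-up helper name.

```python
from collections import defaultdict

rows = [
    {
        "sourcePageUrl": "https://www.python.org/",
        "imageUrl": "https://www.python.org/static/img/python-logo.png",
        "fileSizeBytes": 15782,
    },
    {
        # Invented row: an image whose byte size was not available.
        "sourcePageUrl": "https://www.python.org/",
        "imageUrl": "https://www.python.org/static/img/invented-example.png",
        "fileSizeBytes": None,
    },
]


def summarize_by_page(rows):
    """Count images and sum known file sizes for each source page."""
    summary = defaultdict(lambda: {"images": 0, "knownBytes": 0})
    for row in rows:
        entry = summary[row["sourcePageUrl"]]
        entry["images"] += 1
        size = row.get("fileSizeBytes")
        if size is not None:  # size metadata is only present when available
            entry["knownBytes"] += size
    return dict(summary)


summary = summarize_by_page(rows)
```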

💳 How much does Web Images Scraper cost?

This Actor uses pay-per-event pricing. You are charged for each input webpage or direct image URL that is successfully processed for image extraction, not for every image row saved.

For lower-cost first tests, run one or two targets with downloads disabled. Enabling downloads and ZIP archives can use more storage and runtime because the Actor has to fetch and save each accepted image file.
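
As a rough way to budget a run, the listed rate can be turned into a quick estimate. This sketch assumes the "from $6.30 / 1,000" listing price applies to every successfully processed target; the rate on your plan may differ, and this page does not say how pages reached by crawling are counted, so treat the result as a ballpark figure.

```python
from decimal import Decimal

# "From" price on the listing; an assumption for any specific plan.
PRICE_PER_1000_TARGETS = Decimal("6.30")


def estimate_cost(processed_targets: int,
                  price_per_1000: Decimal = PRICE_PER_1000_TARGETS) -> Decimal:
    """Estimate the event charge in USD for a number of processed targets."""
    if processed_targets < 0:
        raise ValueError("processed_targets must not be negative")
    return price_per_1000 * processed_targets / 1000


small_test_cost = estimate_cost(2)      # a first test with two targets
larger_run_cost = estimate_cost(1000)   # a larger batch
```

Storage and runtime used by downloads and ZIP archives are not included in this estimate.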

⚠️ Limits and caveats

Web Images Scraper is for public webpages and public image URLs. It does not log in, use cookies, submit forms, bypass private content, or search Google Images. Some websites block automated traffic, return temporary errors, lazy-load images in ways that are not present in the page HTML, or hide assets behind scripts that are not exposed as normal image URLs.

The Actor uses Apify Proxy by default. Invalid, blocked, or empty targets are handled gracefully, so one difficult website does not fail the whole run.

❓ FAQ

Can it download all images from a website?

It can crawl same-domain links up to your selected depth and page limit, then save accepted images from those pages. Keep the first run small, review the output, and raise limits only when the results match what you need.

Can I use direct image URLs?

Yes. A direct image URL is accepted as an input target and saved as one image row. If downloads are enabled, the file can also be saved to Apify storage.

Does it include CSS background images?

Yes, when Include CSS background images is enabled. It checks inline styles and linked stylesheets for image URLs.
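
For a sense of what that check involves, here is a minimal sketch that pulls url(...) references out of CSS text. It is not the Actor's code: the regex and the css_image_urls name are illustrative, data URIs are skipped by choice, and the sketch does not tell images apart from other assets such as fonts.

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the value.
CSS_URL_RE = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_image_urls(css_text: str, base_url: str) -> list[str]:
    """Return absolute URLs referenced by url(...) in a CSS string.

    base_url should be the stylesheet URL for linked stylesheets,
    or the page URL for inline styles.
    """
    urls = []
    for match in CSS_URL_RE.finditer(css_text):
        raw = match.group(2).strip()
        if not raw or raw.startswith("data:"):
            continue  # skip empty values and embedded data URIs
        urls.append(urljoin(base_url, raw))
    return urls


css = (
    'body { background: url("img/hero.jpg"); } '
    ".card { background-image: url(/static/bg.png) } "
    ".dot { background: url(data:image/png;base64,AAAA) }"
)
found = css_image_urls(css, "https://example.com/css/site.css")
```

Relative paths in a linked stylesheet resolve against the stylesheet's own URL, which is why the first result lands under /css/.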

Does it preserve duplicate images?

No. The Actor keeps one accepted row for each discovered image identity in the run, so repeated occurrences of the same image are collapsed. Each row includes a duplicateKey so equivalent image URLs can still be recognized across exports.
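
The idea of an image identity can be sketched as a key derived from a normalized URL. How the Actor actually computes duplicateKey is not documented here, so the normalization rules and the SHA-1 based key below are assumptions chosen only to illustrate the concept.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_image_url(url: str) -> str:
    """Lowercase the scheme and host and drop the fragment (assumed rules)."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, "")
    )


def duplicate_key(url: str) -> str:
    """Short, stable key for a normalized URL (hypothetical scheme)."""
    digest = hashlib.sha1(normalize_image_url(url).encode("utf-8")).hexdigest()
    return digest[:12]


def dedupe(urls: list[str]) -> list[str]:
    """Keep the first URL seen for each duplicate key, preserving order."""
    seen = set()
    unique = []
    for url in urls:
        key = duplicate_key(url)
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique


unique_urls = dedupe([
    "https://Example.com/a.png#top",
    "https://example.com/a.png",
    "https://example.com/b.png",
])
```

In this sketch the first two URLs share a key because they differ only in host casing and fragment, so only the first one is kept.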

📝 Changelog

  • 0.0: Initial release.

🆘 Support

For issues, questions, or feature requests, file a ticket and I'll fix or implement it in less than 24h 🫡

Made with ❤️ by Maxime Dupré