Website Logo Product Image Banner Extractor

Pricing

$9.99/month + usage

Extracts logos and brand marks (including favicons), product images and catalog thumbnails, hero/banner images (headers, mastheads), team photos, avatars, and profile pictures, social media graphics (Open Graph, Twitter cards), and icon sets (SVG, PNG, touch icons).

Developer

BotFlowTech

Maintained by Community

🎨 Enhanced Visual Asset Extraction Suite

Extract all visual assets from any website with intelligent categorization and rich metadata, built as a Python Apify Actor for e‑commerce, design, and automation workflows.


🚀 What this actor does

This actor crawls a list of website URLs and extracts visual assets such as:

  • 🏢 Logos & brand marks (including favicons)
  • 🛍️ Product images and catalog thumbnails
  • 🎯 Hero / banner images (headers, mastheads)
  • 👥 Team photos, avatars, profile pictures
  • 📱 Social media graphics (Open Graph, Twitter cards)
  • 🎨 Icon sets (SVG, PNG, touch icons)
  • 🖼️ Gallery / slider / carousel images

For each asset, it attempts to infer:

  • Category (logo, product, hero, team, social, icon, gallery, other)
  • Source (HTML tag / location where it was found)
  • Format (SVG/PNG/JPG/WebP/AVIF/ICO/…)
  • Optional dimensions (width, height) when enabled
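
As a rough illustration of the format inference listed above, the sketch below guesses a format from the file extension in the URL path. The function name `guess_format` and the `KNOWN_FORMATS` table are hypothetical; the actor's real lookup is not documented here and may differ.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

# Hypothetical extension table (an assumption, not the actor's actual list).
KNOWN_FORMATS = {
    "svg": "svg", "png": "png", "jpg": "jpg", "jpeg": "jpg",
    "webp": "webp", "avif": "avif", "ico": "ico", "gif": "gif",
}


def guess_format(url: str) -> Optional[str]:
    """Return a normalized format name based on the URL path extension."""
    # urlparse separates the path from the query string and fragment,
    # so "photo.jpeg?w=200" is still recognized.
    path = urlparse(url).path
    suffix = PurePosixPath(path).suffix.lower().lstrip(".")
    return KNOWN_FORMATS.get(suffix)
```

A URL without a recognizable extension yields `None`, in which case the format would have to come from the downloaded image itself.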

🧠 How it works

The actor:

  1. Downloads the HTML of each URL.
  2. Parses the page and extracts images from:
    • <img> tags (including lazy‑loading attributes like data-src)
    • <meta property="og:image"> and <meta name="twitter:image"> tags
    • <link rel="icon">, Apple touch icons, and default /favicon.ico
    • <picture> elements and srcset attributes
    • Inline CSS background-image: url(...) styles
  3. Converts relative URLs to absolute.
  4. Applies a rule‑based classifier that categorizes each asset based on:
    • Image URL
    • alt text
    • CSS classes
    • Local DOM context
  5. Optionally fetches images and uses Pillow to read true width, height, and format.
  6. De‑duplicates assets by URL and outputs a structured JSON object per input URL.
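
The steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the actor's implementation: the names `ImageCollector`, `classify`, `extract_assets`, and the `RULES` keyword table are hypothetical, the source labels `meta_tag` and `link_tag` are invented for the example (only `img_tag` and `css_background` appear in the sample output), and `<picture>`, `srcset`, and CSS backgrounds are left out for brevity.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical keyword rules, checked in order (an assumption).
RULES = [
    ("logo", ("logo", "brand")),
    ("hero", ("hero", "banner", "masthead")),
    ("product", ("product", "catalog")),
    ("team", ("team", "avatar", "profile")),
]


class ImageCollector(HTMLParser):
    """Collects image candidates from <img>, <meta>, and <link> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = {}  # keyed by absolute URL, so duplicates collapse

    def _add(self, raw_url, source, extra=None):
        if not raw_url:
            return
        # Convert relative URLs to absolute ones.
        absolute = urljoin(self.base_url, raw_url.strip())
        asset = {"src": absolute, "source": source, **(extra or {})}
        # Keep only the first occurrence of each URL.
        self.assets.setdefault(absolute, asset)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img":
            # Prefer the lazy-loading attribute when it is present.
            url = a.get("data-src") or a.get("src")
            extra = {"alt": a.get("alt") or "", "class": a.get("class") or ""}
            self._add(url, "img_tag", extra)
        elif tag == "meta":
            if a.get("property") == "og:image" or a.get("name") == "twitter:image":
                self._add(a.get("content"), "meta_tag")
        elif tag == "link":
            rel_tokens = (a.get("rel") or "").lower().split()
            # Matches "icon", "shortcut icon", and "apple-touch-icon".
            if any("icon" in token for token in rel_tokens):
                self._add(a.get("href"), "link_tag")


def classify(asset):
    """Assign a category from the source tag, URL, alt text, and classes."""
    if asset["source"] == "meta_tag":
        return "social"
    if asset["source"] == "link_tag":
        return "icon"
    haystack = " ".join(
        [asset["src"], asset.get("alt", ""), asset.get("class", "")]
    ).lower()
    for category, keywords in RULES:
        if any(keyword in haystack for keyword in keywords):
            return category
    return "other"


def extract_assets(html, base_url):
    """Parse HTML and return de-duplicated, categorized assets."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    assets = list(collector.assets.values())
    for asset in assets:
        asset["category"] = classify(asset)
    return assets
```

Keying the collected assets by absolute URL means de-duplication happens during parsing rather than as a separate pass, and the first place an image was seen determines its `source`.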

⚙️ Input

The actor accepts a JSON object with the following fields:

  {
    "urls": [
      "https://www.apple.com",
      "https://www.nike.com"
    ],
    "extractDimensions": true,
    "fetchDimensions": false,
    "maxConcurrency": 5
  }

Input fields

  • urls (array of strings, required): List of website URLs to extract visual assets from.
  • extractDimensions (boolean, default: true): If true, detects image format (SVG/PNG/WebP/etc.) from the URL and enriches output with this information.
  • fetchDimensions (boolean, default: false): If true, downloads each image (under 5 MB) and uses Pillow to read actual width and height. This is slower and uses more resources, but yields precise dimensions.
  • maxConcurrency (integer, default: 5, min: 1, max: 20): Maximum number of URLs processed concurrently.
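
To make the defaults and the concurrency limit concrete, here is a small sketch of how such an input could be normalized and how `maxConcurrency` could bound parallel work with `asyncio.Semaphore`. The helpers `normalize_input` and `process_all` are hypothetical names, and clamping an out-of-range `maxConcurrency` (rather than rejecting it) is an assumption; the actor may behave differently.

```python
import asyncio

# Defaults as documented in the input field list.
DEFAULTS = {
    "extractDimensions": True,
    "fetchDimensions": False,
    "maxConcurrency": 5,
}


def normalize_input(raw):
    """Validate the required field and fill in documented defaults."""
    urls = raw.get("urls")
    if not isinstance(urls, list) or not all(isinstance(u, str) for u in urls):
        raise ValueError("'urls' must be an array of strings")
    config = {**DEFAULTS, **raw}
    # Assumption: clamp to the documented 1..20 range instead of failing.
    config["maxConcurrency"] = max(1, min(20, int(config["maxConcurrency"])))
    return config


async def process_all(urls, worker, max_concurrency):
    """Run `worker(url)` for every URL, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(url):
        async with semaphore:
            return await worker(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(url) for url in urls))
```

Here `worker` stands in for whatever coroutine processes a single URL, for example `asyncio.run(process_all(config["urls"], worker, config["maxConcurrency"]))`.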

For each input URL, the actor outputs a JSON object like the following:

  {
    "url": "https://example.com",
    "totalAssets": 47,
    "categoryBreakdown": {
      "logo": 3,
      "product": 24,
      "hero": 2,
      "team": 8,
      "icon": 6,
      "social": 2,
      "other": 2
    },
    "assets": [
      {
        "src": "https://example.com/images/logo.svg",
        "category": "logo",
        "format": "svg",
        "source": "img_tag",
        "alt": "Company Logo",
        "class": "site-logo",
        "width": "200",
        "height": "60",
        "loading": "lazy"
      },
      {
        "src": "https://example.com/images/hero-banner.webp",
        "category": "hero",
        "format": "webp",
        "source": "css_background",
        "class": "homepage-hero",
        "width": 1920,
        "height": 1080
      }
    ]
  }
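
The summary fields in this object can be derived directly from the asset list. The sketch below shows one plausible way to do so; `summarize` is a hypothetical helper name, and it assumes every asset dictionary carries a `category` key.

```python
from collections import Counter


def summarize(url, assets):
    """Build the per-URL result object from a list of categorized assets."""
    # Count how many assets fall into each category.
    breakdown = Counter(asset["category"] for asset in assets)
    return {
        "url": url,
        "totalAssets": len(assets),
        "categoryBreakdown": dict(breakdown),
        "assets": assets,
    }
```

By construction, the values in `categoryBreakdown` always sum to `totalAssets`, as they do in the sample output (3 + 24 + 2 + 8 + 6 + 2 + 2 = 47).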