Docker Hub Scraper

Pricing: from $3.00 / 1,000 results

Scrape Docker Hub container image search results, pull counts, star counts, publisher and verified-publisher data, tags, architectures, OS support, categories, and user/org profiles. Pure HTTP, no auth required.

Rating: 5.0 (7)

Developer: Crawler Bros

Maintained by Community

Actor stats

  • Bookmarked: 7
  • Total users: 2
  • Monthly active users: 1
  • Last modified: a day ago

Scrape Docker Hub, the world's largest container image registry. Search images and get pull counts, star counts, publisher info, architectures, OS support, categories, tags, and user/org profiles. Pure HTTP via the public hub.docker.com REST API: no auth, no cookies, no proxy required.

What this actor does

  • Six modes: search, byImages, userProfile, repoTags, topByCategory, namespaceRepos
  • Current pull and star counts for any public image, fetched live at run time
  • Architecture / OS support — amd64, arm64, arm, 386, ppc64le, s390x, riscv64, mips64le; Linux & Windows
  • Categories — 16 official Docker Hub categories (Databases, Web Servers, Languages, Operating Systems, etc.)
  • Publisher metadata — verified publishers, Docker Official Images, sponsored open-source
  • Tags — per-tag size, digest, push date, platform manifest list
  • Profiles — user and organization profile data including verified-publisher badges
  • Rich filtering — pull range, star range, official-only, verified-only, architecture, OS, keyword
  • Empty fields are omitted from the output records

Output per image

  • namespace, name, fullName — e.g. library/postgres
  • shortDescription, description
  • type: image / plugin
  • isOfficial, isVerifiedPublisher, isAutomated, isArchived, isPrivate
  • pullCount (numeric), pullCountDisplay (e.g. 1B+)
  • starCount
  • lastUpdated, lastPulled, lastModified, dateRegistered
  • status: active / archived
  • publisher: { name, id, isOfficial?, isVerified? }
  • categories[] — e.g. Databases & storage, Web servers
  • architectures[] — e.g. amd64, arm64, s390x
  • operatingSystems[]: linux, windows
  • mediaTypes[], contentTypes[]
  • logoUrl — CDN-hosted publisher logo
  • repoUrl — canonical hub.docker.com URL
  • source: store (official), verified_publisher, etc.
  • storageSize (bytes, where available)
  • recordType: "image", scrapedAt
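To illustrate how a flat image record with omitted empty fields might be assembled, here is a minimal sketch. It assumes the raw payload uses snake_case keys such as star_count and pull_count (an assumption about the v2 repository endpoint, not something this README specifies), and the helper names to_image_record and is_empty are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any

# Assumed raw field names -> output field names; the actor's real mapping is not documented.
_FIELD_MAP = {
    "namespace": "namespace",
    "name": "name",
    "description": "shortDescription",
    "full_description": "description",
    "star_count": "starCount",
    "pull_count": "pullCount",
    "last_updated": "lastUpdated",
    "date_registered": "dateRegistered",
    "is_private": "isPrivate",
    "is_automated": "isAutomated",
    "storage_size": "storageSize",
}


def is_empty(value: Any) -> bool:
    """Treat None, empty strings, and empty lists/dicts as empty (False and 0 are kept)."""
    return value is None or value == "" or value == [] or value == {}


def to_image_record(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a raw repository payload to a flat image record, dropping empty fields."""
    record: dict[str, Any] = {}
    for raw_key, out_key in _FIELD_MAP.items():
        value = raw.get(raw_key)
        if not is_empty(value):
            record[out_key] = value
    if "namespace" in record and "name" in record:
        record["fullName"] = f"{record['namespace']}/{record['name']}"
    record["recordType"] = "image"
    record["scrapedAt"] = datetime.now(timezone.utc).isoformat()
    return record
```

Keeping False and 0 while dropping None and empty strings matters for fields like isPrivate, where false is a real answer rather than missing data.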

Output per tag (mode=repoTags)

  • namespace, name, fullName, tagName
  • fullSize, lastPushed, lastPulled
  • lastUpdaterUsername, digest
  • architecture, os — primary platform
  • platforms[]: { architecture, os, variant?, size, digest, status }, one per manifest
  • mediaType, contentType
  • repoUrl, baseRepoUrl
  • recordType: "tag", scrapedAt
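The tag record carries both a per-manifest platforms[] list and a single "primary platform". The sketch below shows one way to derive that, assuming the raw tag payload has an images list with architecture/os/variant/size/digest/status keys. The "prefer linux/amd64, else first entry" rule is a hypothetical heuristic; the README does not say how the actor chooses.

```python
from typing import Any, Optional

_PLATFORM_KEYS = ("architecture", "os", "variant", "size", "digest", "status")


def pick_primary(platforms: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Hypothetical heuristic: prefer linux/amd64, else fall back to the first manifest."""
    for p in platforms:
        if p.get("os") == "linux" and p.get("architecture") == "amd64":
            return p
    return platforms[0] if platforms else None


def to_tag_record(namespace: str, repo: str, raw: dict[str, Any]) -> dict[str, Any]:
    """Build a tag record from an assumed raw tag payload."""
    platforms = [
        {k: img[k] for k in _PLATFORM_KEYS if img.get(k)}
        for img in raw.get("images", [])
    ]
    record: dict[str, Any] = {
        "namespace": namespace,
        "name": repo,
        "fullName": f"{namespace}/{repo}",
        "tagName": raw.get("name"),
        "fullSize": raw.get("full_size"),
        "digest": raw.get("digest"),
        "platforms": platforms,
        "recordType": "tag",
    }
    primary = pick_primary(platforms)
    if primary:
        record["architecture"] = primary.get("architecture")
        record["os"] = primary.get("os")
    # Omit empty fields, as the README describes for all output.
    return {k: v for k, v in record.items() if v not in (None, "", [])}
```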

Output per user/org (mode=userProfile)

  • username, fullName, type (User / Organization)
  • company, location, profileUrl, dateJoined
  • avatarUrl (Gravatar)
  • badge: verified_publisher / official / open_source
  • isVerifiedPublisher, isOfficial, isActive
  • dockerHubUrl
  • recordType: "userProfile", scrapedAt

Input

| Field | Type | Default | Description |
|---|---|---|---|
| mode | enum | search | search / byImages / userProfile / repoTags / topByCategory / namespaceRepos |
| searchQuery | string | postgres | Free-text query (mode=search) |
| imageNames | array | | namespace/repo strings or hub.docker.com repository URLs (mode=byImages) |
| namespace | string | | User / org slug (mode=userProfile, repoTags, namespaceRepos) |
| repository | string | | Repository name (mode=repoTags) |
| category | enum | | Category slug, e.g. databases-and-storage (mode=topByCategory or as a filter) |
| architectures | array of enum | | Keep images supporting any of the selected architectures |
| operatingSystems | array of enum | | Keep images supporting any of the selected operating systems |
| isOfficial | bool | false | Only Docker Official Images |
| isVerifiedPublisher | bool | false | Only Verified Publisher images |
| minStarCount | int | | Drop images with fewer stars |
| maxStarCount | int | | Drop images with more stars |
| minPullCount | int | | Drop images with fewer pulls |
| maxPullCount | int | | Drop images with more pulls |
| sortBy | enum | relevance | Sort order: pull_count / star_count / updated_at / name (relevance when unset) |
| containsKeyword | string | | Case-insensitive substring filter on name/description |
| includeUserRepos | bool | true | Also enumerate a user/org's repos in userProfile mode |
| maxItems | int | 50 | Hard cap on results (1–1000) |
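The filter fields above can be read as a single predicate per image. The following is a minimal sketch of that reading, not the actor's code: the function names are hypothetical, it operates on already-built records using the output field names (starCount, pullCount, architectures, ...), and it treats a missing count as 0, which is an assumption.

```python
from typing import Any, Optional


def passes_filters(
    image: dict[str, Any],
    *,
    is_official: bool = False,
    is_verified_publisher: bool = False,
    min_star_count: Optional[int] = None,
    max_star_count: Optional[int] = None,
    min_pull_count: Optional[int] = None,
    max_pull_count: Optional[int] = None,
    architectures: Optional[list[str]] = None,
    operating_systems: Optional[list[str]] = None,
    contains_keyword: Optional[str] = None,
) -> bool:
    """Return True if the image record satisfies every filter that is set."""
    stars = image.get("starCount", 0)  # assumption: missing counts behave like 0
    pulls = image.get("pullCount", 0)
    if is_official and not image.get("isOfficial"):
        return False
    if is_verified_publisher and not image.get("isVerifiedPublisher"):
        return False
    if min_star_count is not None and stars < min_star_count:
        return False
    if max_star_count is not None and stars > max_star_count:
        return False
    if min_pull_count is not None and pulls < min_pull_count:
        return False
    if max_pull_count is not None and pulls > max_pull_count:
        return False
    # "any selected": at least one requested arch / OS must be supported.
    if architectures and not set(architectures) & set(image.get("architectures", [])):
        return False
    if operating_systems and not set(operating_systems) & set(image.get("operatingSystems", [])):
        return False
    if contains_keyword:
        text = " ".join(str(image.get(k, "")) for k in ("fullName", "shortDescription", "description"))
        if contains_keyword.lower() not in text.lower():
            return False
    return True


def clamp_max_items(value: int) -> int:
    """Keep maxItems inside the documented 1–1000 range."""
    return max(1, min(1000, value))
```

Note the "any" semantics for architectures: an image built for amd64 and arm64 passes a filter of ["arm64"], so this is an "includes arm64" filter rather than "arm64 only".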

Example: search PostgreSQL images, official only

{
  "mode": "search",
  "searchQuery": "postgres",
  "isOfficial": true,
  "sortBy": "pull_count",
  "maxItems": 20
}
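Before submitting inputs like the one above, a client could check them against the input table. This is a hypothetical pre-flight check, assuming the per-mode requirements can be read off the table's "(mode=...)" notes; searchQuery is treated as optional because it defaults to postgres. The actor itself may fill defaults rather than reject input.

```python
from typing import Any

# Required fields per mode, inferred from the input table (an assumption).
REQUIRED_BY_MODE = {
    "search": [],  # searchQuery defaults to "postgres"
    "byImages": ["imageNames"],
    "userProfile": ["namespace"],
    "repoTags": ["namespace", "repository"],
    "topByCategory": ["category"],
    "namespaceRepos": ["namespace"],
}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the input looks valid."""
    mode = run_input.get("mode", "search")
    if mode not in REQUIRED_BY_MODE:
        return [f"unknown mode: {mode!r}"]
    problems = [
        f"{mode} requires {field!r}"
        for field in REQUIRED_BY_MODE[mode]
        if not run_input.get(field)
    ]
    max_items = run_input.get("maxItems", 50)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int) or not 1 <= max_items <= 1000:
        problems.append("maxItems must be an integer between 1 and 1000")
    return problems
```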

Example: lookup a list of specific images

{
  "mode": "byImages",
  "imageNames": [
    "library/nginx",
    "bitnami/redis",
    "library/postgres",
    "https://hub.docker.com/r/jenkins/jenkins"
  ]
}
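The example above mixes namespace/repo strings with a full hub.docker.com URL. A sketch of how such entries could be normalized to namespace/repo follows. The helper is hypothetical; its handling of bare names and /_/ URLs assumes Docker Official Images live in the library/ namespace, as the FAQ describes.

```python
from urllib.parse import urlparse


def normalize_image_name(value: str) -> str:
    """Turn 'nginx', 'library/nginx', or a hub.docker.com URL into 'namespace/repo'."""
    value = value.strip()
    if value.startswith(("http://", "https://")):
        parts = [p for p in urlparse(value).path.split("/") if p]
        # /r/<namespace>/<repo> for regular repos, /_/<repo> for official images.
        if len(parts) >= 3 and parts[0] == "r":
            return f"{parts[1]}/{parts[2]}"
        if len(parts) >= 2 and parts[0] == "_":
            return f"library/{parts[1]}"
        raise ValueError(f"unrecognised Docker Hub URL: {value}")
    if "/" not in value:
        return f"library/{value}"  # bare names are official images
    return value
```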

Example: get all tags for a repository

{
  "mode": "repoTags",
  "namespace": "library",
  "repository": "postgres",
  "maxItems": 100
}
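In repoTags mode the actor paginates through tags until maxItems is reached. The loop below sketches that pattern, assuming the tags endpoint returns pages shaped like {"results": [...], "next": <url or null>}. The fetch function is injected so the logic can be exercised without network access, and iter_tags is a hypothetical name.

```python
from typing import Any, Callable, Iterator, Optional

FetchJson = Callable[[str], dict[str, Any]]


def iter_tags(
    namespace: str,
    repo: str,
    fetch_json: FetchJson,
    max_items: int = 100,
    page_size: int = 100,
) -> Iterator[dict[str, Any]]:
    """Yield raw tag objects, following 'next' links until max_items is reached."""
    url: Optional[str] = (
        f"https://hub.docker.com/v2/repositories/{namespace}/{repo}/tags/?page_size={page_size}"
    )
    emitted = 0
    while url and emitted < max_items:
        page = fetch_json(url)
        for tag in page.get("results", []):
            if emitted >= max_items:
                return
            yield tag
            emitted += 1
        url = page.get("next")  # None on the last page ends the loop
```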

Example: top databases by pull count

{
  "mode": "topByCategory",
  "category": "databases-and-storage",
  "sortBy": "pull_count",
  "maxItems": 30
}
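One way to interpret the sortBy values on already-collected records is shown below. This client-side sort is a hypothetical sketch: the mapping of each sortBy value to a record field and direction is an assumption, and relevance simply keeps the order returned by the search API. lastUpdated is compared as an ISO 8601 string, which orders correctly only when all timestamps share the same format.

```python
from typing import Any

# Hypothetical mapping: sortBy value -> (record field, descending?).
SORT_KEYS = {
    "pull_count": ("pullCount", True),
    "star_count": ("starCount", True),
    "updated_at": ("lastUpdated", True),
    "name": ("fullName", False),
}


def sort_records(records: list[dict[str, Any]], sort_by: str = "relevance") -> list[dict[str, Any]]:
    """Sort records client-side; 'relevance' keeps the API's original order."""
    if sort_by == "relevance":
        return list(records)
    field, descending = SORT_KEYS[sort_by]
    present = [r for r in records if field in r]
    missing = [r for r in records if field not in r]  # records lacking the field go last
    return sorted(present, key=lambda r: r[field], reverse=descending) + missing
```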

Example: user / organization profile + their repos

{
  "mode": "userProfile",
  "namespace": "bitnami",
  "includeUserRepos": true,
  "maxItems": 50
}
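A namespace such as bitnami may be a user or an organization, and the Data Source section lists both /v2/users/{username}/ and /v2/orgs/{org}/. The sketch below tries the user endpoint, falls back to the org endpoint on a 404, and tags the result with the README's type values (User / Organization). The lookup order and the fetch_profile helper are hypothetical; the fetch function is injected so the logic runs offline.

```python
from typing import Any, Callable, Optional
from urllib.error import HTTPError


def fetch_profile(name: str, fetch_json: Callable[[str], dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Try /v2/users/ then /v2/orgs/; return None if neither knows the name."""
    for kind, path in (("User", "users"), ("Organization", "orgs")):
        try:
            profile = fetch_json(f"https://hub.docker.com/v2/{path}/{name}/")
        except HTTPError as err:
            if err.code == 404:
                continue  # not this kind of account, try the next endpoint
            raise
        return {**profile, "type": kind}
    return None
```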

Example: alpine images that support ARM64 and have at least 100 stars (e.g. for IoT deployments)

{
  "mode": "search",
  "searchQuery": "alpine",
  "architectures": ["arm64"],
  "minStarCount": 100
}

Use cases

  • DevOps intelligence — discover production-ready images for your stack
  • Security scanning — bulk-export verified publisher images for compliance review
  • Container marketplaces — feed Docker Hub categories and metadata into your catalog
  • Migration planning — find ARM64 / RISC-V replacements for amd64-only images
  • Open-source analytics — track pull counts and stars to gauge ecosystem trends
  • Competitive analysis — benchmark image popularity across alternative publishers
  • Compliance — verify image provenance, last-updated dates, and publisher status
  • Build pipelines — enumerate tag manifests for reproducible base-image pinning

FAQ

Do I need a Docker Hub account to use this actor? No. Docker Hub's public REST API does not require authentication for read access to public images, users, and orgs.

What's the difference between pullCount and pullCountDisplay? Docker Hub returns pull counts as a display string (1B+, 500M+) for popular images. The actor parses this into a numeric pullCount for sorting/filtering while keeping the display value in pullCountDisplay.
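Turning a display string into a numeric lower bound can be sketched as follows. The parse_pull_count helper is hypothetical, and the accepted formats ("1B+", "500M+", "10K+", "1,234") are assumptions based on the examples above. The result is rounded to avoid floating-point artifacts from decimal prefixes such as "2.3M".

```python
import re

_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
_PATTERN = re.compile(r"^\s*([\d.,]+)\s*([KMB]?)\+?\s*$", re.IGNORECASE)


def parse_pull_count(display: str) -> int:
    """Convert '1B+', '500M+', '10K+' or '1,234' into an integer lower bound."""
    match = _PATTERN.match(display)
    if not match:
        raise ValueError(f"unrecognised pull count: {display!r}")
    number, suffix = match.groups()
    value = float(number.replace(",", ""))
    # round() rather than int(): 2.3 * 1_000_000 may land a hair below 2_300_000.
    return round(value * _MULTIPLIERS[suffix.upper()])
```

Because "1B+" means "at least one billion", the parsed value is a floor, which is fine for min/max pull filters and sorting but not an exact count.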

What are Docker Official Images? Images in the library/ namespace, curated by Docker, Inc. They appear as library/postgres, library/nginx, etc. and are reachable via short URLs such as https://hub.docker.com/_/postgres.
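Given the two URL shapes described in this README (/_/<name> for official images, /r/<namespace>/<name> for everything else), a canonical repoUrl could be built like this. It is a minimal sketch with a hypothetical function name; it assumes the library namespace is the only one that uses the short form.

```python
def repo_url(namespace: str, name: str) -> str:
    """Build the hub.docker.com page URL for a repository."""
    if namespace == "library":
        # Docker Official Images use the /_/ short form.
        return f"https://hub.docker.com/_/{name}"
    return f"https://hub.docker.com/r/{namespace}/{name}"
```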

What's a Verified Publisher? A company or open-source project that has been verified by Docker Inc. as the official source of an image (e.g. bitnami, jenkins, hashicorp). Verified images carry a verified_publisher badge.

Why are some architectures arrays missing entries like riscv64? Only architectures actually built for that image are listed. Most images target amd64 + arm64; only a few publishers (e.g. library/* official images) build for the full set.

Can I get all tags for a repository? Yes, use mode: "repoTags". The actor paginates through all tags up to maxItems.

What does source: "store" mean? Indicates the image is sourced from the Docker Official Images "store" (i.e. library/* namespace). Other sources include verified_publisher, community, and open_source.

How fresh is the data? Every run queries Docker Hub directly, so each record reflects Docker Hub's values at the moment it was scraped (see scrapedAt). Pull counts change continuously on Docker Hub; tag push dates are accurate to the second.

Why doesn't repoTags mode return a recordType: "image" record for the parent repo? By design — repoTags emits one record per tag for fine-grained processing. Combine with byImages if you need parent-repo metadata.

Is there a rate limit? Docker Hub's public read endpoints have generous limits. The actor inserts small polite delays between requests and retries with exponential backoff on 429 / 5xx.
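Retrying on 429 / 5xx with exponential backoff can be sketched as below. The error class, the base delay, and the jitter are assumptions rather than the actor's actual settings, and the polite delays between ordinary requests are not shown. The sleep function is injectable so the schedule can be inspected without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HttpStatusError(Exception):
    """Hypothetical error carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on 429 and 5xx responses, doubling the delay each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except HttpStatusError as err:
            retryable = err.status == 429 or 500 <= err.status < 600
            if not retryable or attempt == max_attempts - 1:
                raise
            # 0.5 s, 1 s, 2 s, 4 s, ... plus up to 100 ms of random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    raise AssertionError("unreachable")
```

Non-retryable errors such as 404 are re-raised immediately, so a missing repository fails fast instead of burning through the retry budget.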

Data Source

This actor uses Docker Hub's public REST API at https://hub.docker.com:

  • /api/search/v3/catalog/search — full-text image search with filters
  • /v2/repositories/{namespace}/{repo}/ — repository details
  • /v2/repositories/{namespace}/{repo}/tags/ — tag manifests
  • /v2/repositories/{namespace}/ — all repos in a namespace
  • /v2/users/{username}/ and /v2/orgs/{org}/ — user and organization profiles
  • /v2/categories — official category taxonomy
  • /v2/search/repositories/ — legacy search endpoint (fallback)

No authentication is required for any of these endpoints when reading public data.
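Because these endpoints are unauthenticated, a plain standard-library GET is enough to read them. The sketch below builds the repository-details and tags URLs listed above and fetches JSON with urllib. The helper names and the User-Agent string are hypothetical, and it assumes the endpoints return a JSON object.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE = "https://hub.docker.com"


def repo_details_url(namespace: str, repo: str) -> str:
    """URL for /v2/repositories/{namespace}/{repo}/."""
    return f"{BASE}/v2/repositories/{quote(namespace)}/{quote(repo)}/"


def tags_url(namespace: str, repo: str, page_size: int = 100) -> str:
    """URL for /v2/repositories/{namespace}/{repo}/tags/ with a page size."""
    return f"{repo_details_url(namespace, repo)}tags/?page_size={page_size}"


def fetch_json(url: str, timeout: float = 30.0) -> dict[str, Any]:
    """Unauthenticated GET returning parsed JSON: no token, no cookies, no proxy."""
    req = Request(url, headers={"Accept": "application/json", "User-Agent": "docker-hub-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

For example, fetch_json(repo_details_url("library", "postgres")) should return the repository's JSON document; field names inside it (such as pull_count) are assumptions here, not guarantees from this README.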