Docker Hub Scraper | Container Image Metadata

Pricing

from $19.00 / 1,000 results

Scrape Docker Hub repositories for image names, descriptions, pull counts, star counts, category tags, last-updated dates, and publisher details. Track container popularity, monitor official images, and build datasets of the Docker ecosystem for DevOps research and tooling.

Rating: 0.0 (0)

Developer: ParseForge (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 8 days ago

🐳 Docker Hub Scraper

🚀 Export Docker Hub repositories and images in seconds. Search by keyword or list every repo in a namespace. Get pull counts, star counts, descriptions, and categories. No API key needed.

🕒 Last updated: 2026-05-22 · 📊 13 fields per record · 🐳 1,000,000+ public images · 🌐 Two scraping modes · ⚡ Real-time data

The Docker Hub Scraper pulls live repository data from Docker Hub's public REST API and returns structured records for every image matched by your search or namespace. Each record includes the repository name, namespace, description, pull count, star count, official status, privacy flag, last-updated timestamp, categories, and a direct URL to the image page.

Docker Hub is the world's largest container image registry, hosting over 1,000,000 public images from official maintainers, software vendors, and the open-source community. This Actor makes that data downloadable as CSV, Excel, JSON, or XML in under a minute - no Docker account, no API key, no scraping boilerplate.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| DevOps engineers, platform teams, security researchers, data analysts, OSS maintainers, container consultants | Dependency audits, image popularity tracking, namespace monitoring, competitive research, container supply-chain analysis |

📋 What the Docker Hub Scraper does

Two data-collection modes, with automatic pagination and live data in both:

  • 🔍 Search mode. Full-text search across all public Docker Hub repositories. Find images by name, technology, or purpose. Results are sorted by relevance and include pull counts and star counts.
  • 📦 Namespace mode. List every public repository owned by a user or organization (e.g. bitnami, library, nginx). Returns richer data, including last-updated timestamps and category tags.
  • 📊 Pagination. Automatically walks all result pages until your maxItems limit is reached.
  • ⚡ Real-time. Every run fetches live data directly from the Docker Hub v2 API - no cached snapshots.
  • 🔓 No auth required. The public API is used throughout - no Docker Hub account or API key needed.

💡 Why it matters: Docker Hub has no bulk-export feature. Auditing image popularity, tracking namespace growth, or feeding a container-intelligence dashboard requires either writing your own paginator or scraping by hand. This Actor handles all of that and returns clean, structured JSON on every run.
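
The pagination behavior described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's source code: `fetch_page` is a hypothetical callable standing in for an HTTP GET, and the `results` / `next` keys are an assumption about the shape of a paginated response.

```python
from typing import Callable, Iterator, Optional

Page = dict  # assumed shape: {"results": [...], "next": "<url or None>"}


def walk_pages(
    first_url: str,
    fetch_page: Callable[[str], Page],
    max_items: int,
) -> Iterator[dict]:
    """Yield records page by page until max_items is reached or pages run out."""
    url: Optional[str] = first_url
    yielded = 0
    while url and yielded < max_items:
        page = fetch_page(url)
        for record in page.get("results", []):
            if yielded >= max_items:
                return
            yield record
            yielded += 1
        url = page.get("next")  # a missing or None "next" ends the loop


# Usage with an in-memory stand-in for the HTTP layer (no network needed).
fake_pages = {
    "page-1": {"results": [{"name": "a"}, {"name": "b"}], "next": "page-2"},
    "page-2": {"results": [{"name": "c"}], "next": None},
}
records = list(walk_pages("page-1", fake_pages.__getitem__, max_items=10))
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library and makes the stopping conditions easy to check in isolation.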


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| searchQuery | string | "nginx" | Full-text search query. Used when Namespace is not set. Defaults to "nginx" if both fields are empty. |
| namespace | string | - | Docker Hub username or org (e.g. library, bitnami, nginx). Takes priority over Search Query when provided. |

Example 1 - Search for Python images:

{
  "searchQuery": "python",
  "maxItems": 50
}

Example 2 - List all Bitnami repositories:

{
  "namespace": "bitnami",
  "maxItems": 100
}

⚠️ Good to Know: If neither searchQuery nor namespace is provided, the Actor defaults to searching "nginx". Search mode and namespace mode return slightly different fields - lastUpdated and categories are only populated in namespace mode, since the Docker Hub search API does not return those fields.
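
The precedence rules above (namespace first, then search query, then the "nginx" default) can be expressed in a few lines. This is a sketch of the documented behavior only; the function name `resolve_mode` is hypothetical, and treating whitespace-only strings as empty is an assumption.

```python
from typing import Optional, Tuple

DEFAULT_QUERY = "nginx"  # default stated in the input table


def resolve_mode(
    search_query: Optional[str] = None,
    namespace: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (mode, value) following the documented precedence."""
    # Namespace takes priority over the search query when provided.
    if namespace and namespace.strip():
        return ("namespace", namespace.strip())
    if search_query and search_query.strip():
        return ("search", search_query.strip())
    # Neither field set: fall back to the default search.
    return ("search", DEFAULT_QUERY)


print(resolve_mode(search_query="python", namespace="bitnami"))
print(resolve_mode(search_query="python"))
print(resolve_mode())
```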


📊 Output

Each record contains the following fields:

| Field | Type | Description |
| --- | --- | --- |
| 📛 name | string | Repository name (without namespace prefix) |
| 🏢 namespace | string | Owner username or organization |
| 🔗 fullName | string | Combined namespace/name identifier |
| 📝 description | string | Short repository description |
| 📥 pullCount | number | Total number of image pulls |
| starCount | number | Number of stars |
| isOfficial | boolean | True if this is an official Docker image |
| 🔒 isPrivate | boolean | True if the repository is private |
| 🕒 lastUpdated | string | ISO 8601 timestamp of last push (namespace mode only) |
| 🏷️ categories | array | Category tags (namespace mode only, e.g. "Databases & storage") |
| 🌐 url | string | Direct URL to the Docker Hub image page |
| 📅 scrapedAt | string | ISO 8601 timestamp of when the record was collected |
| error | string | Error message if the record could not be fetched |

Sample records - search mode (searchQuery: "nginx", 3 of 5 shown):

[
  {
    "name": "nginx",
    "namespace": "library",
    "fullName": "nginx",
    "description": "Official build of Nginx.",
    "pullCount": 13022750244,
    "starCount": 21279,
    "isOfficial": true,
    "isPrivate": false,
    "lastUpdated": null,
    "categories": null,
    "url": "https://hub.docker.com/_/nginx",
    "scrapedAt": "2026-05-22T00:15:18.335Z",
    "error": null
  },
  {
    "name": "nginx-ingress",
    "namespace": "nginx",
    "fullName": "nginx/nginx-ingress",
    "description": "NGINX and NGINX Plus Ingress Controllers for Kubernetes",
    "pullCount": 1085333058,
    "starCount": 120,
    "isOfficial": false,
    "isPrivate": false,
    "lastUpdated": null,
    "categories": null,
    "url": "https://hub.docker.com/r/nginx/nginx-ingress",
    "scrapedAt": "2026-05-22T00:15:18.335Z",
    "error": null
  },
  {
    "name": "nginx-prometheus-exporter",
    "namespace": "nginx",
    "fullName": "nginx/nginx-prometheus-exporter",
    "description": "NGINX Prometheus Exporter for NGINX and NGINX Plus",
    "pullCount": 87347846,
    "starCount": 51,
    "isOfficial": false,
    "isPrivate": false,
    "lastUpdated": null,
    "categories": null,
    "url": "https://hub.docker.com/r/nginx/nginx-prometheus-exporter",
    "scrapedAt": "2026-05-22T00:15:18.335Z",
    "error": null
  }
]

Sample records - namespace mode (namespace: "bitnami", 2 of 5 shown):

[
  {
    "name": "redis",
    "namespace": "bitnami",
    "fullName": "bitnami/redis",
    "description": "Bitnami Secure Image for redis",
    "pullCount": 3339283423,
    "starCount": 365,
    "isOfficial": false,
    "isPrivate": false,
    "lastUpdated": "2026-05-11T17:18:16.783773Z",
    "categories": ["Databases & storage", "Message queues", "Monitoring & observability"],
    "url": "https://hub.docker.com/r/bitnami/redis",
    "scrapedAt": "2026-05-22T00:09:00.964Z",
    "error": null
  },
  {
    "name": "nginx",
    "namespace": "bitnami",
    "fullName": "bitnami/nginx",
    "description": "Bitnami Secure Image for nginx",
    "pullCount": 435962728,
    "starCount": 205,
    "isOfficial": false,
    "isPrivate": false,
    "lastUpdated": "2026-05-20T12:43:53.736954Z",
    "categories": ["API management", "Security", "Web servers"],
    "url": "https://hub.docker.com/r/bitnami/nginx",
    "scrapedAt": "2026-05-22T00:09:00.964Z",
    "error": null
  }
]
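
The sample records suggest a naming pattern: images in the `library` namespace appear with a bare `fullName` and a `/_/` URL, while all others use `namespace/name` and a `/r/` URL. The sketch below reproduces that pattern. It is inferred from the samples only; the helper names are hypothetical, and the assumption that `library` is the only namespace treated this way may not hold for every record.

```python
HUB_BASE = "https://hub.docker.com"
OFFICIAL_NAMESPACE = "library"  # assumption: inferred from the sample records


def full_name(namespace: str, name: str) -> str:
    """Bare name for the library namespace, namespace/name otherwise."""
    if namespace == OFFICIAL_NAMESPACE:
        return name
    return f"{namespace}/{name}"


def hub_url(namespace: str, name: str) -> str:
    """Image page URL following the pattern seen in the sample output."""
    if namespace == OFFICIAL_NAMESPACE:
        return f"{HUB_BASE}/_/{name}"
    return f"{HUB_BASE}/r/{namespace}/{name}"


print(full_name("library", "nginx"), hub_url("library", "nginx"))
print(full_name("bitnami", "redis"), hub_url("bitnami", "redis"))
```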

✨ Why choose this Actor

| Feature | Details |
| --- | --- |
| 🔓 No auth required | Public Docker Hub API - no Docker account or token needed |
| ⚡ Fast | A page of 25 results returns in under 1 second |
| 📦 Two modes | Search by keyword or enumerate a full namespace |
| 🏷️ Categories | Namespace mode returns category tags not visible in search |
| 📥 Pull counts | Exact pull counts for every image, updated live |
| 📄 Multi-format export | CSV, Excel, JSON, XML - all from Apify's dataset UI |
| 🔁 Pagination | Walks all pages automatically up to your maxItems cap |
| 💰 Pay-per-result | Only charged for real data records, never for errors |

📈 How it compares to alternatives

| Method | Effort | Pagination | Export formats | Real-time |
| --- | --- | --- | --- | --- |
| This Actor | Zero code | Automatic | CSV, Excel, JSON, XML | Yes |
| Docker Hub UI | Manual copy | No bulk export | None | Yes |
| Docker Hub CLI | Requires Docker + auth | Manual | Terminal output only | Yes |
| Custom script | Hours of development | Manual coding | Custom | Yes |

🚀 How to use

  1. Create a free account on Apify (includes $5 credit).
  2. Open the Docker Hub Scraper actor page.
  3. Set searchQuery (e.g. "postgres") or namespace (e.g. "bitnami").
  4. Set maxItems to the number of records you need.
  5. Click Start and wait for the run to finish (typically under 30 seconds for 100 items).
  6. Download results as CSV, Excel, JSON, or XML from the Dataset tab.

💼 Business use cases

Container security and compliance

Security teams use pull-count and official-status data to build an approved image registry. Filter by isOfficial: true to enforce policies that only allow vetted base images. Compare pull counts across candidate images to gauge community adoption before approving a new dependency.
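
As a rough illustration of that filtering step, the sketch below keeps only official images and ranks them by pull count. The function name and the `min_pulls` threshold are hypothetical; the field names come from the output table, and the input records are trimmed copies of the search-mode samples.

```python
from typing import Iterable, List


def approved_candidates(records: Iterable[dict], min_pulls: int = 0) -> List[dict]:
    """Official, error-free images with at least min_pulls, most pulled first."""
    official = [
        r for r in records
        if r.get("isOfficial")
        and not r.get("error")
        and (r.get("pullCount") or 0) >= min_pulls
    ]
    return sorted(official, key=lambda r: r["pullCount"], reverse=True)


# Trimmed copies of the search-mode sample records.
sample = [
    {"fullName": "nginx", "pullCount": 13022750244, "isOfficial": True, "error": None},
    {"fullName": "nginx/nginx-ingress", "pullCount": 1085333058, "isOfficial": False, "error": None},
    {"fullName": "nginx/nginx-prometheus-exporter", "pullCount": 87347846, "isOfficial": False, "error": None},
]
print([r["fullName"] for r in approved_candidates(sample)])
```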

Competitive intelligence for container vendors

ISVs distributing software via Docker Hub track their own namespace pull growth and compare it against competing namespaces. Running the scraper weekly against a list of competitor namespaces produces a time-series that reveals release cadence, adoption velocity, and community momentum.
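
One way to turn two scheduled runs into a growth figure is to subtract pull counts per repository. This is a minimal sketch under stated assumptions: `pull_growth` is a hypothetical helper, both inputs are lists of records as described in the output table, and the numbers in the usage example are made-up toy values, not real Docker Hub data.

```python
from typing import Dict, Iterable


def pull_growth(previous: Iterable[dict], current: Iterable[dict]) -> Dict[str, int]:
    """Map fullName -> pulls gained between two runs.

    Repositories missing from the earlier run are skipped, since there is
    no baseline to compare against.
    """
    before = {r["fullName"]: r["pullCount"] for r in previous}
    return {
        r["fullName"]: r["pullCount"] - before[r["fullName"]]
        for r in current
        if r["fullName"] in before
    }


# Toy values for illustration only.
last_week = [{"fullName": "example/app", "pullCount": 1000}]
this_week = [
    {"fullName": "example/app", "pullCount": 1250},
    {"fullName": "example/new", "pullCount": 40},
]
growth = pull_growth(last_week, this_week)
```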

Open-source ecosystem research

Researchers studying container adoption patterns use this Actor to build datasets of image popularity over time. Filtering by category tags (namespace mode) enables segmented analysis - e.g. comparing pull growth in "Databases & storage" vs. "Security" images across a given quarter.
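
A segmented analysis of this kind could start from a simple per-category total. The sketch below is illustrative: `pulls_by_category` is a hypothetical helper, and it counts a repository toward every category it lists, which is one possible design choice rather than the only one. The input records are trimmed copies of the namespace-mode samples.

```python
from collections import defaultdict
from typing import Dict, Iterable


def pulls_by_category(records: Iterable[dict]) -> Dict[str, int]:
    """Sum pullCount per category tag; search-mode records (categories: null) are skipped."""
    totals: Dict[str, int] = defaultdict(int)
    for record in records:
        for category in record.get("categories") or []:
            totals[category] += record.get("pullCount") or 0
    return dict(totals)


# Trimmed copies of the namespace-mode sample records.
sample = [
    {"fullName": "bitnami/redis", "pullCount": 3339283423,
     "categories": ["Databases & storage", "Message queues", "Monitoring & observability"]},
    {"fullName": "bitnami/nginx", "pullCount": 435962728,
     "categories": ["API management", "Security", "Web servers"]},
]
totals = pulls_by_category(sample)
```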

DevOps dependency auditing

Platform teams maintaining internal Kubernetes clusters need to know the last-updated timestamps for every third-party image in their stack. The namespace mode returns lastUpdated for all repos in an org, making it trivial to flag images not updated in 90+ days and escalate them for replacement.
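
The 90-day staleness check can be sketched as below. The helper names are hypothetical, and the reference time `now` is passed in explicitly so the result is reproducible. The trailing `Z` is rewritten to `+00:00` because `datetime.fromisoformat` only accepts `Z` directly on newer Python versions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2026-05-11T17:18:16.783773Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_images(records: Iterable[dict], now: datetime, max_age_days: int = 90) -> List[str]:
    """Return fullName for every record last updated more than max_age_days ago."""
    cutoff = now - timedelta(days=max_age_days)
    return [
        r["fullName"]
        for r in records
        # lastUpdated is null in search mode, so those records are skipped.
        if r.get("lastUpdated") and parse_ts(r["lastUpdated"]) < cutoff
    ]


# Timestamps taken from the namespace-mode sample records.
sample = [
    {"fullName": "bitnami/redis", "lastUpdated": "2026-05-11T17:18:16.783773Z"},
    {"fullName": "bitnami/nginx", "lastUpdated": "2026-05-20T12:43:53.736954Z"},
]
reference = datetime(2026, 8, 15, tzinfo=timezone.utc)  # arbitrary example date
flagged = stale_images(sample, now=reference)
```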


🔌 Automating Docker Hub Scraper

Connect this Actor to automation platforms for hands-free container intelligence:

  • Make (Integromat) - Schedule weekly namespace scans and push results to a Google Sheet or Airtable database.
  • Zapier - Trigger a Zap when a run completes and send a summary to Slack or email.
  • Slack - Post pull-count milestones or image-staleness alerts directly to your DevOps channel.
  • Google Sheets - Use the Apify Google Sheets integration to append each run's results to a tracking spreadsheet.
  • Webhooks - Fire a webhook on run completion to trigger downstream CI/CD pipelines or data warehouse ingestion.

🌟 Beyond business use cases

Academic research

Container ecosystem studies, software-supply-chain papers, and open-source adoption research all benefit from structured Docker Hub data. Export the full dataset for a namespace or technology category and analyze it in R, Python, or Excel.

OSS project maintenance

Open-source maintainers track their image's pull count over time to measure project adoption, correlate releases with pull spikes, and include accurate download statistics in project READMEs and grant applications.

Creative and community projects

DevRel teams, conference speakers, and Docker community evangelists use pull-count leaderboards to tell the story of container adoption. Build visual dashboards, blog posts, or infographics showing which images are powering the world's infrastructure.

Experimentation and prototyping

Students and hobbyists exploring container technologies can quickly enumerate what's available in the library namespace to discover official images across every major technology stack - databases, runtimes, web servers, message queues, and more.


🤖 Ask an AI assistant about this scraper

Not sure which mode to use or how to process the output? Paste this prompt into any AI assistant:

"I'm using the Docker Hub Scraper on Apify (parseforge/dockerhub-scraper). It has two modes: search mode using a searchQuery string, and namespace mode using a namespace like 'bitnami'. Help me [describe your goal]."


❓ Frequently Asked Questions

🔑 Do I need a Docker Hub account or API key?

No. The Actor uses the public Docker Hub v2 REST API, which requires no authentication for public repository data.

🔍 What is search mode?

Search mode queries the Docker Hub full-text search index. It matches against image names, descriptions, and namespace names. Results are sorted by relevance, with official images typically ranking first.

📦 What is namespace mode?

Namespace mode lists every public repository owned by a specific Docker Hub user or organization. For example, namespace: "bitnami" returns all 276+ repositories in the Bitnami organization.

🕒 Why is lastUpdated null in search mode?

The Docker Hub search API (/v2/search/repositories/) does not return last_updated timestamps. Use namespace mode (/v2/repositories/{namespace}/) if you need timestamps.
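
For readers who want to see what the two endpoints look like as URLs, here is a small sketch. The paths come from the answer above and the host from the sample record URLs; the query parameter names (`query`, `page`, `page_size`) are assumptions and should be checked against the Docker Hub API reference before use.

```python
from urllib.parse import quote, urlencode

BASE = "https://hub.docker.com"


def search_url(query: str, page: int = 1, page_size: int = 25) -> str:
    """URL for one page of search results (parameter names are assumed)."""
    params = urlencode({"query": query, "page": page, "page_size": page_size})
    return f"{BASE}/v2/search/repositories/?{params}"


def namespace_url(namespace: str, page: int = 1, page_size: int = 25) -> str:
    """URL for one page of a namespace listing (parameter names are assumed)."""
    params = urlencode({"page": page, "page_size": page_size})
    return f"{BASE}/v2/repositories/{quote(namespace, safe='')}/?{params}"


print(search_url("nginx"))
print(namespace_url("bitnami", page=2))
```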

🏷️ Why are categories null in search mode?

Category tags are only available in the namespace API response. They are not returned by the search endpoint.

🔢 How many items can I scrape?

Free plan users are limited to 10 items per run. Paid plan users can scrape up to 1,000,000 items per run. Broad Docker Hub search queries can match 285,000+ results; namespace endpoints return all public repos for an organization.

⚡ How fast is it?

The Actor fetches 25 results per API request. Collecting 100 items takes under 5 seconds of fetching on Apify's infrastructure, and a complete run typically finishes in under 30 seconds.
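
The number of API requests a run needs follows directly from the page size. A quick sketch of the arithmetic, assuming every page is full except possibly the last (`requests_needed` is a hypothetical helper):

```python
import math


def requests_needed(max_items: int, page_size: int = 25) -> int:
    """Number of page requests required to collect max_items records."""
    return math.ceil(max_items / page_size)


print(requests_needed(100))   # 100 items at 25 per page
print(requests_needed(1000))
```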

💰 How am I charged?

The Actor uses pay-per-result pricing. You are only charged for records that are successfully scraped and pushed to the dataset. Error records are never charged.

🔄 Can I run this on a schedule?

Yes. Use Apify's built-in scheduler to run the Actor daily, weekly, or on any cron schedule. Results accumulate in your dataset over time.

🛡️ Is this against Docker Hub's terms of service?

This Actor only accesses Docker Hub's publicly documented REST API, which is designed for programmatic access to public repository data. No authentication is bypassed, no private data is accessed, and all requests respect the API's documented structure.

📊 Can I export to Excel?

Yes. Every Apify dataset can be downloaded as CSV, Excel (XLSX), JSON, XML, or RSS from the dataset UI or via the Apify API.

🔗 Can I get the full image details page data?

The current Actor collects all fields available from the search and repository listing endpoints. Detailed image metadata (tags, architecture manifests, vulnerability scan results) is available from separate Docker Hub endpoints and may be added in a future version.


🔌 Integrate with any app

Apify datasets connect natively with hundreds of platforms:

| Integration | How |
| --- | --- |
| Google Sheets | Apify Google Sheets Actor or Zapier |
| Airtable | Zapier or Make webhook |
| Slack | Apify notifications or Make scenario |
| Make (Integromat) | HTTP module watching the dataset |
| Zapier | Apify trigger on run completion |
| Power BI | JSON dataset endpoint as a data source |
| Tableau | CSV download or JSON connector |
| Python / pandas | Apify client (apify-client) or direct API |
| Node.js | apify-client npm package |
| REST API | GET /v2/datasets/{datasetId}/items |
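
For the direct REST route in the last row, the download URL can be assembled as shown below. This is a sketch, not an official client: `dataset_items_url` is a hypothetical helper, the dataset ID in the example is a placeholder, and the `format` and `token` query parameters should be confirmed against the Apify API reference.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.apify.com"
FORMATS = {"json", "csv", "xlsx", "xml", "rss"}  # export formats named in this README


def dataset_items_url(dataset_id: str, fmt: str = "json", token: Optional[str] = None) -> str:
    """Build the GET /v2/datasets/{datasetId}/items URL for a given export format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token  # only needed if the dataset is not publicly readable
    return f"{API_BASE}/v2/datasets/{dataset_id}/items?{urlencode(params)}"


# "YOUR_DATASET_ID" is a placeholder, not a real dataset.
print(dataset_items_url("YOUR_DATASET_ID", fmt="csv"))
```

The resulting URL can be opened in a browser, passed to `pandas.read_csv`, or used as a data source in BI tools.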

| Actor | What it does |
| --- | --- |
| PyPI Scraper | Search Python packages on PyPI and extract download stats, metadata, and classifiers |
| npm Registry Scraper | Search npm packages and extract weekly downloads, license, repository, and maintainer data |
| GitHub Scraper | Scrape dev.to articles, tags, and author profiles for developer content research |

💡 Pro Tip: browse the complete ParseForge collection to find scrapers for 80+ public data sources - all maintained, all pay-per-result.


This Actor accesses only publicly available Docker Hub data via the official REST API. It is not affiliated with, endorsed by, or connected to Docker, Inc. Docker and Docker Hub are trademarks of Docker, Inc. Use responsibly and in accordance with Docker Hub's terms of service.