Docker Hub Scraper
Scrape Docker Hub repositories, container images & metadata efficiently. Essential for market research, competitive analysis, developer tool insights, registry monitoring & API integrations.
Pricing: Pay per usage
Developer: Shahid Irfan
Extract Docker Hub repository results for any keyword and build clean datasets for monitoring, research, and analysis. Collect repository names, popularity metrics, descriptions, timestamps, and direct Docker Hub URLs in one run.
Features
- Keyword-based collection - Search Docker Hub repositories by any keyword.
- URL-based start option - Start directly from a Docker Hub search URL.
- Automatic pagination - Continues through result pages until your target count is reached.
- Optional deep metadata - Collect extended repository information for richer analysis.
- Clean dataset output - Null and empty values are removed from every dataset item.
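The clean-output behavior above can be sketched as a small recursive filter. This is an illustration of the described behavior, not the actor's actual implementation:

```python
def clean_item(value):
    """Recursively drop None values and empty strings, lists, and dicts,
    mirroring the 'clean dataset output' feature described above."""
    if isinstance(value, dict):
        cleaned = {k: clean_item(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in (None, "", [], {})}
    if isinstance(value, list):
        return [clean_item(v) for v in value if v not in (None, "")]
    return value

raw = {"repo_name": "apify/actor-node", "description": "", "status": None, "star_count": 3}
print(clean_item(raw))  # → {'repo_name': 'apify/actor-node', 'star_count': 3}
```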
Use Cases
Registry Monitoring
Track how repository popularity changes over time by collecting pull counts, stars, and update timestamps. This helps teams spot trending images and monitor ecosystem shifts.
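One simple way to track these shifts is to diff two dated exports keyed by repository name. A sketch, using the actor's output field names:

```python
def popularity_deltas(previous, current):
    """Compare two dataset snapshots keyed by repo_name and return
    the change in pull_count for repositories present in both."""
    prev_by_name = {item["repo_name"]: item for item in previous}
    deltas = {}
    for item in current:
        name = item["repo_name"]
        if name in prev_by_name:
            deltas[name] = item.get("pull_count", 0) - prev_by_name[name].get("pull_count", 0)
    return deltas

monday = [{"repo_name": "apify/actor-node", "pull_count": 2876082}]
friday = [{"repo_name": "apify/actor-node", "pull_count": 2880000}]
print(popularity_deltas(monday, friday))  # → {'apify/actor-node': 3918}
```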
Competitive Research
Compare repository descriptions, stars, and pull counts across related projects. Use this for competitor benchmarking and positioning analysis.
Developer Tool Discovery
Find relevant images for specific stacks, tooling categories, or workflows. Build curated lists based on objective metadata.
Data Pipelines
Create repeatable exports for dashboards, reports, and automated workflows. The structured output is ready for BI tools and downstream processing.
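Outside Apify's built-in export formats, dataset items can also be serialized with the standard library. A sketch of a CSV step for such a pipeline:

```python
import csv
import io

def items_to_csv(items, fields):
    """Serialize dataset items to CSV, leaving missing fields blank
    (items may omit fields, per the clean-output behavior)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()

items = [
    {"repo_name": "apify/actor-node", "pull_count": 2876082, "star_count": 3},
    {"repo_name": "library/node", "pull_count": 1000, "is_official": True},
]
print(items_to_csv(items, ["repo_name", "pull_count", "star_count"]))
```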
Input Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| keyword | String | Yes (unless startUrl is provided) | "apify" | Search query used for Docker Hub repositories. |
| startUrl | String | No | "https://hub.docker.com/search?q=apify" | Optional Docker Hub search URL. If it contains q=..., that query is used. |
| collectDetails | Boolean | No | true | Include extra repository metadata such as the long description and timestamps. |
| onlyOfficial | Boolean | No | false | Save only repositories marked as official. |
| results_wanted | Integer | No | 20 | Maximum number of repositories to collect. |
| max_pages | Integer | No | 10 | Maximum number of result pages to request. |
| proxyConfiguration | Object | No | Apify Proxy | Proxy configuration for run stability. |
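Resolving a partial input against the defaults above can be sketched as a simple merge. The default values mirror the table; this is illustrative, not the actor's own code:

```python
# Documented defaults from the input parameters table.
DEFAULTS = {
    "keyword": "apify",
    "collectDetails": True,
    "onlyOfficial": False,
    "results_wanted": 20,
    "max_pages": 10,
}

def resolve_input(user_input):
    """Merge user-supplied input over the documented defaults."""
    resolved = dict(DEFAULTS)
    resolved.update(user_input)
    return resolved

print(resolve_input({"keyword": "node", "onlyOfficial": True}))
# → {'keyword': 'node', 'collectDetails': True, 'onlyOfficial': True, 'results_wanted': 20, 'max_pages': 10}
```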
Output Data
Each dataset item can include the following fields:
| Field | Type | Description |
|---|---|---|
| search_query | String | Query used for this result set. |
| rank | Integer | Result position in collected order. |
| repo_name | String | Full repository identifier in namespace/name format. |
| namespace | String | Repository namespace or organization. |
| name | String | Repository image name. |
| short_description | String | Short summary from search results. |
| description | String | Extended repository description when available. |
| pull_count | Integer | Total pulls. |
| star_count | Integer | Total stars. |
| is_official | Boolean | Whether Docker Hub marks the repository as official. |
| is_automated | Boolean | Whether automated builds are enabled. |
| repository_type | String | Repository type when available. |
| status | Integer | Repository status code when available. |
| date_registered | String | Repository registration timestamp. |
| last_updated | String | Last update timestamp. |
| last_modified | String | Last metadata modification timestamp. |
| url | String | Direct Docker Hub repository URL. |
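With these fields, downstream analysis is straightforward. For example, ranking collected items by popularity (a sketch over the documented fields):

```python
def top_by_pulls(items, n=5):
    """Return the n most-pulled repositories, tolerating items where
    pull_count was omitted from the cleaned output."""
    return sorted(items, key=lambda i: i.get("pull_count", 0), reverse=True)[:n]

items = [
    {"repo_name": "a/one", "pull_count": 10},
    {"repo_name": "b/two"},
    {"repo_name": "c/three", "pull_count": 99},
]
print([i["repo_name"] for i in top_by_pulls(items, 2)])  # → ['c/three', 'a/one']
```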
Usage Examples
Basic Keyword Search
```json
{
  "keyword": "apify",
  "results_wanted": 20
}
```
Start From a Search URL
```json
{
  "startUrl": "https://hub.docker.com/search?q=apify",
  "collectDetails": true,
  "results_wanted": 30,
  "max_pages": 10
}
```
Official Images Only
```json
{
  "keyword": "node",
  "onlyOfficial": true,
  "collectDetails": true,
  "results_wanted": 25
}
```
Sample Output
```json
{
  "search_query": "apify",
  "rank": 1,
  "repo_name": "apify/actor-node",
  "namespace": "apify",
  "name": "actor-node",
  "short_description": "Alpine + Node.js for running the Apify Client or SDK without headless browsers",
  "description": "Alpine + Node.js for running the Apify Client or SDK without headless browsers",
  "pull_count": 2876082,
  "star_count": 3,
  "is_official": false,
  "is_automated": false,
  "repository_type": "image",
  "status": 1,
  "date_registered": "2019-09-24T11:22:58.123456Z",
  "last_updated": "2026-03-10T09:10:28.011803Z",
  "last_modified": "2026-03-10T09:10:28.011803Z",
  "url": "https://hub.docker.com/r/apify/actor-node"
}
```
Tips For Best Results
Start Small, Then Scale
Use results_wanted: 20 for quick validation runs. Increase gradually for larger production collections.
Use Focused Keywords
Specific queries usually return cleaner datasets than broad terms. Try product names, framework names, or vendor terms.
Enable Detail Collection For Richer Records
Set collectDetails to true when you need extended metadata for analysis or reporting.
Use Proxy Configuration For High-Volume Runs
For repeated or large runs, configure proxy settings to improve resilience.
Integrations
- Google Sheets - Send collected results to spreadsheet workflows.
- Airtable - Build searchable repository catalogs.
- Make - Automate multi-step processing pipelines.
- Zapier - Trigger notifications and app workflows.
- Webhooks - Forward output to custom APIs and services.
Export Formats
- JSON - API and developer workflows
- CSV - Spreadsheet analysis
- Excel - Business reporting
- XML - Legacy integrations
Frequently Asked Questions
How many repositories can I collect?
You can collect as many repositories as are available for a query; the total is capped by results_wanted and max_pages.
Can I run this with just a URL?
Yes. Provide startUrl containing q=... and the actor will use that query.
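Extracting the query from a startUrl can be done with the standard library. A sketch of the behavior described above, not the actor's own code:

```python
from urllib.parse import urlparse, parse_qs

def query_from_start_url(start_url):
    """Return the q parameter from a Docker Hub search URL, or None."""
    params = parse_qs(urlparse(start_url).query)
    values = params.get("q")
    return values[0] if values else None

print(query_from_start_url("https://hub.docker.com/search?q=apify"))  # → apify
```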
Why do some records have fewer fields?
Some repositories do not expose every field. Empty values are removed from output, so each item contains only available data.
Can I keep only official images?
Yes. Set onlyOfficial to true.
Is this suitable for scheduled monitoring?
Yes. You can schedule recurring runs and compare output over time.
Support
For issues or feature requests, use Apify Console support channels.
Legal Notice
This actor is intended for legitimate data collection and analysis workflows. Users are responsible for complying with website terms and applicable laws in their jurisdiction.