Docker Hub Scraper

Pricing: Pay per usage
Scrape Docker Hub repositories, container images & metadata efficiently. Essential for market research, competitive analysis, developer tool insights, registry monitoring & API integrations.

Developer: Shahid Irfan (Maintained by Community)

Last modified: 7 days ago

Extract Docker Hub repository results for any keyword and build clean datasets for monitoring, research, and analysis. Collect repository names, popularity metrics, descriptions, timestamps, and direct Docker Hub URLs in one run.


Features

  • Keyword-based collection - Search Docker Hub repositories by any keyword.
  • URL-based start option - Start directly from a Docker Hub search URL.
  • Automatic pagination - Continues through result pages until your target count is reached.
  • Optional deep metadata - Collect extended repository information for richer analysis.
  • Clean dataset output - Null and empty values are removed from every dataset item.

Use Cases

Registry Monitoring

Track how repository popularity changes over time by collecting pull counts, stars, and update timestamps. This helps teams spot trending images and monitor ecosystem shifts.

Competitive Research

Compare repository descriptions, stars, and pull counts across related projects. Use this for competitor benchmarking and positioning analysis.

Developer Tool Discovery

Find relevant images for specific stacks, tooling categories, or workflows. Build curated lists based on objective metadata.

Data Pipelines

Create repeatable exports for dashboards, reports, and automated workflows. The structured output is ready for BI tools and downstream processing.
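As a sketch of that downstream step, assuming items shaped like the Output Data fields documented below, a few lines of Python turn dataset items into CSV for BI tools (the items here are hypothetical, not actor output):

```python
import csv
import io

# Hypothetical dataset items following the actor's documented output fields.
items = [
    {"repo_name": "apify/actor-node", "pull_count": 2876082, "star_count": 3},
    {"repo_name": "library/node", "pull_count": 151234, "star_count": 1},
]

# Write a CSV with a fixed column order, ready for spreadsheet or BI import.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["repo_name", "pull_count", "star_count"])
writer.writeheader()
writer.writerows(items)
print(buf.getvalue())
```

The same buffer can be written to a file or streamed to a webhook in a scheduled run.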


Input Parameters

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| keyword | String | Yes | "apify" | Search query used for Docker Hub repositories. |
| startUrl | String | No | "https://hub.docker.com/search?q=apify" | Optional Docker Hub search URL. If it contains q=..., that query is used. |
| collectDetails | Boolean | No | true | Include extra repository metadata such as long description and timestamps. |
| onlyOfficial | Boolean | No | false | Save only repositories marked as official. |
| results_wanted | Integer | No | 20 | Maximum number of repositories to collect. |
| max_pages | Integer | No | 10 | Maximum number of result pages to request. |
| proxyConfiguration | Object | No | Apify Proxy | Proxy configuration for run stability. |
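The startUrl behavior above ("if it contains q=..., that query is used") presumably amounts to reading the URL's query string; a minimal sketch of that extraction with Python's standard library (the helper name is hypothetical):

```python
from urllib.parse import parse_qs, urlparse

def keyword_from_start_url(start_url: str, fallback: str = "apify") -> str:
    """Return the q=... query from a Docker Hub search URL, or a fallback keyword."""
    params = parse_qs(urlparse(start_url).query)
    values = params.get("q")
    return values[0] if values else fallback
```

So `keyword_from_start_url("https://hub.docker.com/search?q=apify")` yields `"apify"`, while a URL without `q=` falls back to the default keyword.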

Output Data

Each dataset item can include the following fields:

| Field | Type | Description |
|---|---|---|
| search_query | String | Query used for this result set. |
| rank | Integer | Result position in collected order. |
| repo_name | String | Full repository identifier in namespace/name format. |
| namespace | String | Repository namespace or organization. |
| name | String | Repository image name. |
| short_description | String | Short summary from search results. |
| description | String | Extended repository description when available. |
| pull_count | Integer | Total pulls. |
| star_count | Integer | Total stars. |
| is_official | Boolean | Whether Docker Hub marks the repository as official. |
| is_automated | Boolean | Whether automated builds are enabled. |
| repository_type | String | Repository type when available. |
| status | Integer | Repository status code when available. |
| date_registered | String | Repository registration timestamp. |
| last_updated | String | Last update timestamp. |
| last_modified | String | Last metadata modification timestamp. |
| url | String | Direct Docker Hub repository URL. |
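The "clean dataset output" behavior (null and empty values removed from every item) can be pictured as a simple filter over each record; a sketch of the idea, not the actor's actual code:

```python
def strip_empty(item: dict) -> dict:
    # Drop keys whose values are None or empty strings/collections.
    # Note that 0 and False survive: they are real data, not "empty".
    return {k: v for k, v in item.items() if v not in (None, "", [], {})}
```

For example, `strip_empty({"description": None, "status": "", "star_count": 0, "name": "node"})` keeps only `star_count` and `name`.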

Usage Examples

Basic Keyword Search

{
  "keyword": "apify",
  "results_wanted": 20
}

Start From a Search URL

{
  "startUrl": "https://hub.docker.com/search?q=apify",
  "collectDetails": true,
  "results_wanted": 30,
  "max_pages": 10
}

Official Images Only

{
  "keyword": "node",
  "onlyOfficial": true,
  "collectDetails": true,
  "results_wanted": 25
}

Sample Output

{
  "search_query": "apify",
  "rank": 1,
  "repo_name": "apify/actor-node",
  "namespace": "apify",
  "name": "actor-node",
  "short_description": "Alpine + Node.js for running the Apify Client or SDK without headless browsers",
  "description": "Alpine + Node.js for running the Apify Client or SDK without headless browsers",
  "pull_count": 2876082,
  "star_count": 3,
  "is_official": false,
  "is_automated": false,
  "repository_type": "image",
  "status": 1,
  "date_registered": "2019-09-24T11:22:58.123456Z",
  "last_updated": "2026-03-10T09:10:28.011803Z",
  "last_modified": "2026-03-10T09:10:28.011803Z",
  "url": "https://hub.docker.com/r/apify/actor-node"
}
```
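Records in this shape feed directly into ranking or dashboard logic; a short sketch that sorts collected items by total pulls (all items except the first are made up for illustration):

```python
# Hypothetical items following the sample output schema above.
items = [
    {"repo_name": "apify/actor-node", "pull_count": 2876082},
    {"repo_name": "apify/actor-node-playwright", "pull_count": 912345},
    {"repo_name": "apify/cli", "pull_count": 50321},
]

# Rank by total pulls, the way a monitoring dashboard might,
# defaulting missing pull counts to 0.
by_pulls = sorted(items, key=lambda i: i.get("pull_count", 0), reverse=True)
most_pulled = by_pulls[0]["repo_name"]
```

Using `i.get("pull_count", 0)` keeps the sort stable even when a record lacks the field, which the FAQ below notes can happen.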

Tips For Best Results

Start Small, Then Scale

Use results_wanted: 20 for quick validation runs. Increase gradually for larger production collections.

Use Focused Keywords

Specific queries usually return cleaner datasets than broad terms. Try product names, framework names, or vendor terms.

Enable Detail Collection For Richer Records

Set collectDetails to true when you need extended metadata for analysis or reporting.

Use Proxy Configuration For High-Volume Runs

For repeated or large runs, configure proxy settings to improve resilience.
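For a programmatic run, the input parameters above map directly onto an Apify client call; a sketch, where the actor ID is a placeholder and the network call only fires when an `APIFY_TOKEN` environment variable is set:

```python
import os

# Run input mirroring the documented parameters; proxy settings use
# Apify Proxy, matching the table's default.
run_input = {
    "keyword": "node",
    "results_wanted": 100,
    "collectDetails": True,
    "proxyConfiguration": {"useApifyProxy": True},
}

token = os.getenv("APIFY_TOKEN")
if token:
    from apify_client import ApifyClient  # requires `pip install apify-client`

    client = ApifyClient(token)
    # "<username>/docker-hub-scraper" is a placeholder actor ID.
    run = client.actor("<username>/docker-hub-scraper").call(run_input=run_input)
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item.get("repo_name"))
```

The same `run_input` dictionary works unchanged in the Apify Console's JSON input editor.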


Integrations

  • Google Sheets - Send collected results to spreadsheet workflows.
  • Airtable - Build searchable repository catalogs.
  • Make - Automate multi-step processing pipelines.
  • Zapier - Trigger notifications and app workflows.
  • Webhooks - Forward output to custom APIs and services.

Export Formats

  • JSON - API and developer workflows
  • CSV - Spreadsheet analysis
  • Excel - Business reporting
  • XML - Legacy integrations

Frequently Asked Questions

How many repositories can I collect?

You can collect as many repositories as are available for a query; the total is capped by results_wanted and max_pages.

Can I run this with just a URL?

Yes. Provide startUrl containing q=... and the actor will use that query.

Why do some records have fewer fields?

Some repositories do not expose every field. Empty values are removed from output, so each item contains only available data.
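When consuming such records, `dict.get` with a default handles missing fields gracefully; a small sketch (the helper is hypothetical, not part of the actor):

```python
def describe(item: dict) -> str:
    # Fields absent from a record simply fall back to defaults,
    # preferring the long description over the short one.
    name = item.get("repo_name", "unknown")
    desc = item.get("description") or item.get("short_description") or "(no description)"
    return f"{name}: {desc}"
```

For a sparse record like `{"repo_name": "apify/actor-node"}` this still returns a usable string rather than raising a KeyError.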

Can I keep only official images?

Yes. Set onlyOfficial to true.

Is this suitable for scheduled monitoring?

Yes. You can schedule recurring runs and compare output over time.
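Comparing two scheduled runs reduces to a keyed diff on pull counts; a minimal sketch with hypothetical snapshots:

```python
def pull_deltas(previous: list[dict], current: list[dict]) -> dict:
    """Map repo_name -> change in pull_count between two run snapshots."""
    prev = {i["repo_name"]: i.get("pull_count", 0) for i in previous}
    return {
        i["repo_name"]: i.get("pull_count", 0) - prev.get(i["repo_name"], 0)
        for i in current
    }

# Two hypothetical snapshots from consecutive scheduled runs.
yesterday = [{"repo_name": "apify/actor-node", "pull_count": 2876082}]
today = [{"repo_name": "apify/actor-node", "pull_count": 2876500}]
deltas = pull_deltas(yesterday, today)
```

Repositories appearing for the first time get their full pull count as the delta, which is usually the desired signal for spotting new trending images.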


Support

For issues or feature requests, use Apify Console support channels.

This actor is intended for legitimate data collection and analysis workflows. Users are responsible for complying with website terms and applicable laws in their jurisdiction.