NVIDIA NGC Model Catalog Scraper
Scrape 900+ GPU-optimized AI/ML models from the NVIDIA NGC catalog. Filter by keyword, application category, or framework. Returns model name, publisher, framework, precision, version, size, labels, and catalog URL.

Pricing: Pay per event

Developer: Stas Persiianenko (Maintained by Community)

What does it do?

The NVIDIA NGC Model Catalog Scraper extracts structured data from NVIDIA's NGC catalog — the official repository of GPU-optimized AI/ML models published by NVIDIA and its partners. With 900+ pre-trained models across every major AI domain, the NGC catalog is the definitive source for production-ready deep learning models optimized for NVIDIA hardware.

This actor fetches every model's key metadata: name, publisher, application category, ML framework, precision type, model format, version, file size, labels, description, and catalog URL — all delivered as clean structured JSON, ready to integrate with your pipelines.

No API key or authentication required. The actor calls NVIDIA's public REST API directly.

Who is it for?

🔬 AI/ML Researchers who need to audit the NGC catalog for models in their domain (NLP, computer vision, speech, healthcare) or track when new models are published.

🏗️ MLOps Engineers who want to automate model discovery, maintain an internal registry of available NVIDIA models, or set up scheduled monitoring for new additions to the catalog.

📊 Data Scientists building model comparison dashboards, benchmarking frameworks, or exploring what pre-trained models are available for their use case before committing to training from scratch.

🧑‍💼 Product Managers & Technical Writers at AI companies who need up-to-date competitor model intelligence or want to document which NVIDIA models are available for their product.

🤖 AI Automation Engineers who want to feed the NGC catalog into AI agents, RAG pipelines, or knowledge bases that need to reason about available GPU-optimized models.

Why use it?

The NVIDIA NGC catalog doesn't offer an export feature. You can browse models in the web UI one by one, but there's no CSV download, no bulk API explorer, and no way to filter the full catalog programmatically without writing your own API client.

This actor handles pagination (38+ pages) and client-side filtering by keyword, category, and framework, then normalizes the raw API response into clean, flat JSON suitable for spreadsheets, databases, or downstream AI pipelines, typically in under a minute.
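Under the hood, the paginate-and-filter loop looks roughly like the minimal sketch below. Here `fetch_page` is a hypothetical stub standing in for the real NGC API calls (the actual endpoint parameters, page size, and response shape are internal to the actor):

```python
def fetch_page(page, page_size=25):
    """Hypothetical stand-in for one NGC API page request."""
    catalog = [{"name": f"model-{i}", "framework": "PyTorch" if i % 2 else "TensorRT"}
               for i in range(60)]
    start = page * page_size
    return catalog[start:start + page_size]

def scrape(search_query="", framework="", max_results=100):
    """Paginate through the catalog, filter client-side, stop at max_results."""
    results, page = [], 0
    while len(results) < max_results:
        batch = fetch_page(page)
        if not batch:  # no more pages
            break
        for model in batch:
            # Case-insensitive substring matching, as the filters behave.
            if search_query.lower() not in model["name"].lower():
                continue
            if framework.lower() not in model["framework"].lower():
                continue
            results.append(model)
            if len(results) >= max_results:  # stop paginating once capped
                break
        page += 1
    return results
```

This is also why filtered runs finish faster: the loop stops requesting pages as soon as the result cap is reached.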

What data does it extract?

| Field | Description | Example |
|---|---|---|
| `name` | Model slug identifier | bertlargeuncased |
| `displayName` | Human-readable model name | BERT Large Uncased |
| `publisher` | Publisher organization | NVIDIA, Meta, MONAI |
| `orgName` | NGC organization name | nvidia |
| `teamName` | Team within the org | nemo, riva, tao |
| `application` | Application category | Speech To Text, Classification |
| `framework` | ML framework | PyTorch with NeMo, TensorRT |
| `precision` | Model precision | FP32, FP16, AMP, OTHER |
| `modelFormat` | Model format | SavedModel, TLT, RIVA, Bundle |
| `latestVersion` | Latest version string | 1.0.0, deployable_v2.0 |
| `latestVersionSizeBytes` | Model file size in bytes | 1248444838 |
| `latestVersionSizeMb` | Model file size in MB | 1190.61 |
| `labels` | Tags and keywords | ["NLP", "BERT", "PyTorch"] |
| `shortDescription` | Brief model description | BERT Large Uncased trained on... |
| `isPublic` | Whether the model is public | true |
| `canGuestDownload` | Whether guests can download | true |
| `logoUrl` | Logo image URL | https://... |
| `builtBy` | Who built the model | aiapps, NVIDIA |
| `catalogUrl` | Direct link to model page | https://catalog.ngc.nvidia.com/... |
| `createdDate` | Model creation date (ISO 8601) | 2021-03-10T03:31:51.797Z |
| `updatedDate` | Last update date (ISO 8601) | 2024-11-12T17:56:32.338Z |
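Judging from the example row, `latestVersionSizeMb` appears to be a binary-megabyte (1024 × 1024) conversion of the byte count, rounded to two decimals. This is an inference from the sample values, not documented behavior:

```python
def bytes_to_mb(size_bytes: int) -> float:
    """Convert a byte count to binary megabytes, rounded to 2 decimals."""
    return round(size_bytes / (1024 * 1024), 2)

print(bytes_to_mb(1248444838))  # 1190.61, matching the table's example
```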

How much does it cost to scrape the NVIDIA NGC catalog?

The actor uses pay-per-event pricing — you only pay for the models you actually extract. There's a small one-time start fee per run, plus a per-model charge.

Typical costs:

  • 20 models (single keyword search): ~$0.025
  • 100 models (one category): ~$0.11
  • Full catalog (~926 models): ~$0.94

All models are retrieved via NVIDIA's public REST API — no browser, no proxy required. Runs complete in seconds to a few minutes depending on result count.

Free plan estimate

New Apify accounts include free monthly compute credits. At typical pricing, you can scrape hundreds of NGC models per month within the free tier.

How to use this actor

Step 1: Enter a search keyword (optional)

Open the actor and fill in the Search keyword field. For example, type bert to find all BERT-related models, or leave it blank to retrieve the full catalog.

Step 2: Apply category or framework filters (optional)

  • Application category: filter to a specific domain like Speech To Text, Classification, Object Detection, or Healthcare.
  • ML Framework: filter to a specific framework like PyTorch, NeMo, TensorRT, MONAI, or TAO Toolkit.

Both filters are case-insensitive substring matches.

Step 3: Set your result limit

Set Max results to the number of models you want. Use a large number (e.g. 10000) to retrieve all matching models without a cap.

Step 4: Run and download

Click Start and wait for the run to complete (usually under 60 seconds). Download results as JSON, CSV, or Excel from the Dataset tab.

Input parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `searchQuery` | String | `""` | Filter by keyword (searches name, display name, description) |
| `application` | String | `""` | Filter by application category (e.g. Classification, Speech To Text) |
| `framework` | String | `""` | Filter by ML framework (e.g. PyTorch, NeMo, TensorRT) |
| `maxResults` | Integer | `100` | Maximum number of models to return |
| `maxRequestRetries` | Integer | `3` | Retry attempts for failed API requests |

Output example

```json
{
  "name": "bertlargeuncased",
  "displayName": "Bertlargeuncased",
  "publisher": "NVIDIA",
  "orgName": "nvidia",
  "teamName": "nemo",
  "application": "OTHER",
  "framework": "PyTorch with NeMo",
  "precision": "AMP",
  "modelFormat": "SavedModel",
  "latestVersion": "1.0.0rc1",
  "latestVersionSizeBytes": 1248444838,
  "latestVersionSizeMb": 1190.61,
  "labels": ["NLP", "Natural Language Processing", "BERT", "Bertlargeuncased"],
  "shortDescription": "BERT Large Uncased trained on English Wikipedia and BookCorpus",
  "isPublic": true,
  "canGuestDownload": true,
  "logoUrl": "https://assets.nvidiagrid.net/ngc/logos/Nemo.png",
  "builtBy": "",
  "catalogUrl": "https://catalog.ngc.nvidia.com/orgs/nvidia/models/bertlargeuncased",
  "createdDate": "2021-03-10T03:31:51.797Z",
  "updatedDate": "2023-04-04T19:23:11.786Z"
}
```

Tips & tricks

🔍 Combine filters for precision: Use searchQuery: "conformer" + framework: "NeMo" + application: "Speech" to narrow down to exactly the models you need.

📅 Monitor for new models: Schedule this actor to run weekly and compare the output against your previous snapshot. New models show up with a recent createdDate.
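A minimal snapshot diff might look like the sketch below. It keys models by their `name` slug; the example slugs are illustrative placeholders, not guaranteed catalog entries:

```python
def new_models(previous, current):
    """Return models present in `current` but missing from `previous`."""
    seen = {m["name"] for m in previous}
    return [m for m in current if m["name"] not in seen]

# Illustrative snapshots from two scheduled runs:
last_week = [{"name": "bertlargeuncased"}, {"name": "stt_en_conformer"}]
this_week = last_week + [{"name": "hypothetical-new-model"}]

added = new_models(last_week, this_week)
print([m["name"] for m in added])  # ['hypothetical-new-model']
```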

📊 Size-aware budgeting: Use latestVersionSizeMb to estimate download storage requirements before pulling models. A typical PyTorch model ranges from 50 MB to 10+ GB.

🏷️ Use labels for discovery: The labels field contains NVIDIA's own taxonomy. Search for "NSPECT" IDs to find models that have been inspected by NVIDIA's security team.

⚡ Fast runs with filters: Using keyword or category filters reduces both run time and cost since the actor stops paginating once it hits your maxResults limit.

Integrations

Export to Google Sheets for team collaboration

Run the actor → click Export to Google Sheets in the dataset view → share the sheet with your team. Ideal for ML teams maintaining a shared model registry.

Scheduled model monitoring with webhooks

Set up a weekly schedule → configure a webhook to POST results to Slack or email when the run completes. Your team gets notified when new NVIDIA models are available.

Feed into a RAG knowledge base

Use the Apify API to retrieve the dataset JSON → chunk model descriptions → embed with OpenAI → store in Pinecone or Weaviate. Your AI assistant can now answer "which NVIDIA NeMo models support speech synthesis in French?"
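A hedged sketch of the preprocessing step: it only turns scraped items into embedding-ready text chunks; the actual embedding calls and vector-store upserts are omitted, and the field access mirrors the output schema above:

```python
def to_documents(items):
    """Build one text chunk per model, keeping the slug and URL as metadata."""
    docs = []
    for m in items:
        text = (f"{m.get('displayName', m['name'])} "
                f"[{m.get('framework', 'unknown framework')}]: "
                f"{m.get('shortDescription', '')}")
        docs.append({"id": m["name"], "text": text, "url": m.get("catalogUrl", "")})
    return docs
```

Each document carries the catalog URL so the assistant can cite the model page in its answers.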

CI/CD model validation pipeline

Integrate with GitHub Actions: run the actor before deployment → verify your selected model ID still exists in the catalog → fail the pipeline if the model was deprecated.
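One way to sketch that gate in Python (the dataset-loading step is stubbed with a placeholder list; in a real pipeline you would fetch the items via apify-client as in the API usage section):

```python
import sys

def model_exists(items, required_slug: str) -> bool:
    """True if the pinned model slug is still present in the scraped catalog."""
    return required_slug in {m["name"] for m in items}

if __name__ == "__main__":
    # Placeholder for real dataset items retrieved from the actor's run.
    catalog = [{"name": "bertlargeuncased"}]
    if not model_exists(catalog, "bertlargeuncased"):
        sys.exit(1)  # non-zero exit fails the CI job if the model was deprecated
```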

API usage

Node.js

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('automation-lab/nvidia-ngc-scraper').call({
  searchQuery: 'bert',
  framework: 'PyTorch',
  maxResults: 50,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Found ${items.length} NVIDIA NGC models`);
```

Python

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("automation-lab/nvidia-ngc-scraper").call(run_input={
    "searchQuery": "bert",
    "framework": "PyTorch",
    "maxResults": 50,
})

items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
print(f"Found {len(items)} NVIDIA NGC models")
```

cURL

```bash
curl -X POST \
  "https://api.apify.com/v2/acts/automation-lab~nvidia-ngc-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "searchQuery": "bert",
    "framework": "PyTorch",
    "maxResults": 50
  }'
```

Using with AI assistants (MCP)

You can connect this actor to Claude, Cursor, VS Code, and other AI tools via the Apify MCP server. This lets your AI assistant query the NVIDIA NGC catalog on your behalf.

Claude Code (CLI)

```bash
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/nvidia-ngc-scraper"
```

Claude Desktop / Cursor / VS Code

Add to your MCP config file (claude_desktop_config.json or equivalent):

```json
{
  "mcpServers": {
    "apify": {
      "type": "http",
      "url": "https://mcp.apify.com?tools=automation-lab/nvidia-ngc-scraper",
      "headers": {
        "Authorization": "Bearer YOUR_API_TOKEN"
      }
    }
  }
}
```

Example prompts for your AI assistant

  • "Find all NVIDIA NGC models that use the NeMo framework for speech recognition"
  • "List all classification models in the NGC catalog updated after 2024"
  • "What NVIDIA models are available for object detection with FP16 precision?"
  • "Show me the 10 largest NGC models by file size"

Legality

This actor accesses NVIDIA's publicly available NGC catalog API (api.ngc.nvidia.com/v2/models). All data extracted is publicly accessible without authentication. Use of the NGC catalog data is subject to NVIDIA's Terms of Service. This actor is not affiliated with or endorsed by NVIDIA Corporation.

Always ensure your use of the extracted data complies with applicable terms of service and data usage policies.

FAQ

Q: Does this actor require an NVIDIA API key? A: No. The NGC model catalog's list endpoint is publicly accessible without any authentication. The actor fetches data using NVIDIA's public REST API.

Q: How many models are available in the NGC catalog? A: At time of writing, there are 926+ models. The catalog grows regularly as NVIDIA and partners publish new models. The actor fetches a live count from the API and paginates through all results.

Q: Can I filter by publisher (e.g., only Meta or MONAI models)? A: Currently, filtering is available by search keyword, application category, and ML framework. Publisher filtering can be applied by using a keyword that matches the publisher name (e.g., searchQuery: "meta" will find models published by Meta).

Q: The actor returned fewer results than expected. Why? A: If you applied filters, the result count reflects how many models matched your filters — not the total catalog size. Try broadening your filters or removing them to retrieve more results. Also check that maxResults is set high enough.

Q: I'm getting errors on some pages. What should I do? A: The actor automatically retries failed requests (default: 3 retries with backoff). If errors persist, try increasing maxRequestRetries to 5. Transient errors from the NVIDIA API are usually self-resolving within seconds.
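The retry behavior described above can be sketched as a generic exponential-backoff wrapper. This mirrors the documented default of 3 retries; the actor's actual implementation and delay schedule may differ:

```python
import time

def fetch_with_retries(fetch, max_retries=3, base_delay=1.0):
    """Call `fetch`, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```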

Explore more AI/ML data scrapers from automation-lab: