NPM Scraper — Downloads, Deps, Versions, CSV, No Token

18 runs. NPM intel as CSV/JSON — name, version, license, repo, maintainers, deps, monthlyDownloads, lastPublished. Free npmjs.org API, no token. Backed by 951-run Trustpilot flagship + 31-actor portfolio. Supply-chain audits. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai

Pricing: Pay per usage
Rating: 0.0 (0)
Developer: Alex · Maintained by Community
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 6 days ago

NPM Package Scraper — Search NPM, Pull Package Metadata + Monthly Downloads

Search the NPM registry by keyword, or pull metadata for a list of named packages — name, version, license, dependencies, maintainers, monthly downloads, and search-quality scores — using the official NPM registry endpoints. No API key required.

Two modes (run together or independently):

  • Search mode (searchQueries: ["http server", "websocket"]) walks registry.npmjs.org/-/v1/search and pushes one record per match.
  • Direct mode (packageNames: ["express", "fastify"]) pulls the full registry.npmjs.org/{name} document for each.
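
Both inputs can be passed in the same run; search records and direct records then land in the same dataset and are distinguished by the source field ("search:<query>" vs "direct:<name>"). A minimal combined run input as a Python dict (values are illustrative; parameter names match the Input table below):

run_input = {
    "searchQueries": ["http server", "websocket"],  # search mode
    "packageNames": ["express", "fastify"],         # direct mode
    "maxPackagesPerQuery": 50,                      # per-query cap, 250 max
    "includeDownloads": True                        # append monthlyDownloads per record
}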

What you actually get (verified against src/main.js)

Search-mode record

{
  "name": "express",
  "description": "Fast, unopinionated, minimalist web framework",
  "version": "5.0.1",
  "keywords": ["web", "framework"],
  "author": "TJ Holowaychuk",
  "publisher": "wesleytodd",
  "homepage": "https://expressjs.com/",
  "repository": "https://github.com/expressjs/express",
  "npm": "https://www.npmjs.com/package/express",
  "lastPublished": "2026-03-15T08:11:00.000Z",
  "score": 0.847,
  "searchScore": 1234.5,
  "quality": 0.91,
  "popularity": 0.99,
  "maintenance": 0.81,
  "monthlyDownloads": 134000000,
  "source": "search:http server",
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}

18 fields per search record when includeDownloads=true (17 without it). score, searchScore, quality, popularity, maintenance come straight from the NPM v1-search ranking system. monthlyDownloads is appended only if includeDownloads=true (default true).
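
A common follow-up is to filter search records on those scores once they are in your own code. A minimal sketch, assuming records holds the dataset items from a run (the thresholds are arbitrary examples, not recommendations):

def worth_a_look(record, min_maintenance=0.5, min_downloads=10_000):
    downloads = record.get("monthlyDownloads")
    if downloads is None:              # null means "unknown", not zero
        return False
    return (record.get("maintenance", 0) >= min_maintenance
            and downloads >= min_downloads)

shortlist = [r for r in records if worth_a_look(r)]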

Direct-mode record

{
  "name": "express",
  "description": "Fast, unopinionated, minimalist web framework",
  "version": "5.0.1",
  "license": "MIT",
  "homepage": "https://expressjs.com/",
  "repository": "https://github.com/expressjs/express",
  "bugs": "https://github.com/expressjs/express/issues",
  "author": "TJ Holowaychuk",
  "maintainers": ["wesleytodd", "ulisesgascon"],
  "keywords": ["web", "framework"],
  "dependencies": ["accepts", "body-parser", "content-disposition"],
  "dependenciesCount": 30,
  "devDependencies": 12,
  "engines": { "node": ">= 18" },
  "createdAt": "2010-12-29T19:38:25.450Z",
  "lastPublished": "2026-03-15T08:11:00.000Z",
  "versionsCount": 285,
  "readme": "<first 2000 characters of README>",
  "monthlyDownloads": 134000000,
  "source": "direct:express",
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}

21 fields per direct record when includeDownloads=true (20 without it). dependencies is an array of dependency NAMES (not a {name: version} map — earlier README versions claimed full version pinning; that's not extracted). devDependencies is the count, not the list. versionsCount is the count of all published versions, not a version-history array.


Honest disclosure on what's NOT extracted, and operational caveats

  • Single-attempt fetch, no retry. fetchJson issues one fetch() per URL and throws on any non-2xx. There is no exponential backoff, no 429 handling, and no transient-error retry. A single network blip or rate-limit response kills the in-flight request.
  • ⚠️ Outer try/catch wraps the ENTIRE search-loop AND the direct-mode loop (src/main.js lines 66-149). If query #2 of 10 throws (e.g. 429 from npm search API, transient TCP reset), queries #3-#10 AND every direct-mode package fetch after it are silently skipped. Direct-mode has its own inner try/catch (line 137-139) so a single bad package name does not kill its peers — but a search failure WILL skip the entire direct-mode block. Run search and direct in separate runs if isolation matters.
  • No proxy. fetch() is called without a proxy configuration. Heavy bursts may eventually trigger npm-registry rate limits with no fallback path.
  • monthlyDownloads may be null silently. getDownloads catches all errors and returns null. Reasons: the package has no download history (private/never-published), the downloads API returned a 404 (rare for valid names), or a transient network error. Null does NOT mean zero — treat it as missing data, not as a real "0 downloads/month".
  • Weekly and yearly downloads — only last-month is fetched (from api.npmjs.org/downloads/point/last-month). No weekly point and no yearly aggregation. Earlier README versions claimed all three; only monthly exists.
  • Dependency version pins — only dependency names. For full dependency trees, query registry.npmjs.org/{name}/{version} directly per package.
  • Version history with dates — only the count and the latest-published date.
  • Vulnerability data — none. Use npm audit / Snyk for security signals.
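
If the failure-isolation caveat above matters to you, the cheapest workaround is to split search and direct mode into two separate actor runs, so a failed search call cannot skip the direct-mode fetches. A minimal sketch with the Python client (actor ID as used in the Python integration section below):

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
ACTOR = "knotless_cadence/npm-package-scraper"

# Run 1: search mode only. A 429 on the search API stays contained in this run.
search_run = client.actor(ACTOR).call(
    run_input={"searchQueries": ["http server", "websocket"]}
)

# Run 2: direct mode only, with its own per-package try/catch isolation.
direct_run = client.actor(ACTOR).call(
    run_input={"packageNames": ["express", "fastify", "koa"]}
)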

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQueries | array | [] | Keywords searched against registry.npmjs.org/-/v1/search. |
| packageNames | array | [] | Specific package names to pull full metadata for. |
| maxPackagesPerQuery | number | 50 | Per-query cap (search mode). Internal pagination uses size=min(remaining, 250) chunks. Max 250. |
| includeDownloads | boolean | true | If true, fetch monthlyDownloads per package. Adds ~100 ms polite delay plus the actual HTTP latency per package. |

Use cases

  • Market research — find the most-downloaded packages in a category by sorting search results on monthlyDownloads.
  • Competitive intel — track release cadence (lastPublished) and version churn (versionsCount) for a competitor's package.
  • Tech-stack reverse-engineering — feed a target's package.json dependency list as packageNames and pull full metadata.
  • Dependency auditing — surface package age (createdAt vs lastPublished gap) and maintainer churn.
  • DevRel signal — sort by NPM quality / popularity / maintenance scores to find packages worth covering.
  • Investment research — identify popular OSS projects + maintainer lists to surface acquisition / partnership candidates.
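
For the dependency-auditing case, the createdAt / lastPublished gap is easy to compute from direct-mode records. A minimal sketch, assuming packages holds direct-mode dataset items (the two-year threshold is an arbitrary example):

from datetime import datetime, timezone

def iso(ts):
    # Timestamps in the output are ISO 8601 with a trailing "Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

now = datetime.now(timezone.utc)
stale = [
    p for p in packages
    if p.get("createdAt") and p.get("lastPublished")
    and (now - iso(p["lastPublished"])).days > 730
]
for p in stale:
    age_years = (iso(p["lastPublished"]) - iso(p["createdAt"])).days / 365
    print(f"{p['name']}: ~{age_years:.1f} years of history, no publish in 2+ years")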

Python integration

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start a search-mode run and wait for it to finish
run = client.actor("knotless_cadence/npm-package-scraper").call(
    run_input={
        "searchQueries": ["llm", "rag"],
        "maxPackagesPerQuery": 25,
        "includeDownloads": True,
    }
)

# Pull every record from the run's default dataset
packages = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Sort by monthly downloads, top 10 (treat null as 0 only for sorting purposes)
packages.sort(key=lambda p: p.get("monthlyDownloads") or 0, reverse=True)
for p in packages[:10]:
    dl = p.get("monthlyDownloads")
    dl_str = f"{dl:>12,d}" if dl is not None else f"{'null':>12}"
    print(f"{dl_str} {p['name']:30s} q={p.get('quality', 0):.2f} pop={p.get('popularity', 0):.2f}")
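
The same items can also be dumped to a local CSV without leaving Python (Apify's native CSV export, listed under Export integrations below, does the same thing server-side). A minimal sketch over the packages list from the snippet above; note that list and dict fields such as keywords or engines are written as their string representation:

import csv

fieldnames = sorted({key for p in packages for key in p})
with open("npm_packages.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(packages)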

Common questions

Q: Is this legal? A: Yes. NPM registry data is publicly available and the actor uses the official documented endpoints (registry.npmjs.org/-/v1/search, registry.npmjs.org/{name}, api.npmjs.org/downloads/point/last-month).

Q: How many packages can I scrape? A: No hard limit. The internal request count is bounded by maxPackagesPerQuery × searchQueries.length + packageNames.length. With includeDownloads=true, expect ~100-300 ms per package because of the polite-delay + downloads call.
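
As a worked example of that bound: 4 search queries capped at 50 results each plus 20 named packages is 4 × 50 + 20 = 220 records, i.e. roughly 22 to 66 seconds at the quoted 100-300 ms per record with downloads enabled. The same arithmetic as a sketch (latency figures are the rough estimates above, not guarantees):

search_queries = 4
max_per_query = 50
package_names = 20

records = search_queries * max_per_query + package_names   # 220
low, high = records * 0.1, records * 0.3                   # seconds, per the 100-300 ms estimate
print(f"~{records} records, roughly {low:.0f}-{high:.0f} s with includeDownloads=true")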

Q: Are the search-quality scores reliable? A: They come straight from NPM's v1-search ranking; we do not re-compute or modify. NPM weights popularity, quality, and maintenance internally and the final score is what their UI sorts on by default.

Q: Why does dependencies look like a list of names instead of a version map? A: That's a deliberate choice — we store Object.keys(latestVersion?.dependencies) for compactness. If you need version pins, run a follow-up GET registry.npmjs.org/{name}/{version} per package or request a custom build (see Custom scraping below).
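
That follow-up call is a single unauthenticated GET per package. A minimal sketch using only the standard library (the latest dist-tag works as the version path segment; scoped packages may need the slash in @scope/name percent-encoded):

import json
import urllib.request

def pinned_dependencies(name, version="latest"):
    # Version-level registry document; its "dependencies" field is the
    # {name: version-range} map that the actor flattens to names only.
    url = f"https://registry.npmjs.org/{name}/{version}"
    with urllib.request.urlopen(url) as resp:
        manifest = json.load(resp)
    return manifest.get("dependencies", {})

print(pinned_dependencies("express"))   # {"accepts": "^...", "body-parser": "^...", ...}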

Q: What happens if one of my search queries fails? A: The whole batch halts at that point. Search-mode queries #N+1 onward and all direct-mode packages are skipped silently. Direct-mode itself is isolated — one bad package name does not kill its peers — but a failed search call kills both. Run problematic search queries in isolation, or split your batches.

Q: Schedule recommendation? A: Daily for trend tracking; weekly is plenty for most use cases. NPM monthly downloads are a 30-day rolling window, so daily polling adds limited value.


Export integrations

  • CSV / JSON / Excel / HTML (native Apify dataset download)
  • Google Sheets (via Apify integration)
  • Webhooks on each run
  • S3 / GCS direct sync
  • Zapier / Make.com / n8n

| Actor | Data | Use |
|---|---|---|
| NPM (this) | Package metadata + downloads | JS / TS ecosystem |
| GitHub Trending Scraper | Trending repos | OSS signal |
| GitHub Profile Scraper | Developer profiles + repos | Recruiting / OSS strategy |
| Hacker News Scraper | Tech discussion | Tech discourse |
| Wayback Machine Scraper | Historical website snapshots | Archival lookups |
| Email Extractor Pro | Emails from websites | Lead gen |

All 31 published actors free to inspect on Apify Store.


Custom scraping — pilot tiers

Need a full dependency-version map, version-history time series, weekly + yearly download aggregates, retry-with-backoff on rate-limited search calls, or a vulnerability cross-reference? Three tiers:

  • Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a one-off "top 100 LLM packages by downloads" snapshot.
  • Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most ecosystem-research projects fit here.
  • Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly category-leaderboard refresh, dependency-audit rollups, multi-source enrichment with GitHub).

Email: spinov001@gmail.com — drop the keyword list or the package list and the schema you need; quote within 48h.

Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


Disclaimer

Scrapes the publicly accessible NPM registry endpoints. Respects polite delays (300 ms between search pages, 100-200 ms between package fetches). Not affiliated with npm, Inc. or GitHub, Inc.

Honest disclosure: search records have 17 base fields (18 with monthlyDownloads), direct-mode records have 20 base fields (21 with monthlyDownloads). Only last-month downloads are fetched (no weekly, no yearly aggregate). monthlyDownloads can be null on transient errors — treat null as missing, not as zero. dependencies is an array of names — no version pins. versionsCount is an integer count — no version-history time series. README excerpt is capped at 2000 characters per package. Single-attempt fetch, no retry/no proxy. A failed search query halts all remaining search queries and direct-mode packages in the same run.