NPM Scraper — Downloads, Deps, Versions, CSV, No Token
18 runs. NPM intel as CSV/JSON — name, version, license, repo, maintainers, deps, monthlyDownloads, lastPublished. Free npmjs.org API, no token. Backed by 951-run Trustpilot flagship + 31-actor portfolio. Supply-chain audits. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai
Developer: Alex · Last modified: 6 days ago
# NPM Package Scraper — Search NPM, Pull Package Metadata + Monthly Downloads
Search the NPM registry by keyword, or pull metadata for a list of named packages — name, version, license, dependencies, maintainers, monthly downloads, and search-quality scores — using the official NPM registry endpoints. No API key required.
Two modes (run together or independently):

- **Search mode** — `searchQueries: ["http server", "websocket"]` walks `registry.npmjs.org/-/v1/search` and pushes one record per match.
- **Direct mode** — `packageNames: ["express", "fastify"]` pulls the full `registry.npmjs.org/{name}` document for each.
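For orientation, the v1-search endpoint takes `text`, `size`, and `from` query parameters, with `size` capped at 250 per page. A minimal sketch of the URL construction the search mode pages through — the helper name is ours, not part of the actor:

```python
from urllib.parse import urlencode

def search_url(query, size=50, offset=0):
    # The registry caps each search page at 250 results, so clamp `size`.
    params = urlencode({"text": query, "size": min(size, 250), "from": offset})
    return f"https://registry.npmjs.org/-/v1/search?{params}"
```

For example, `search_url("http server")` produces `https://registry.npmjs.org/-/v1/search?text=http+server&size=50&from=0`.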
## What you actually get (verified against `src/main.js`)
### Search-mode record

```json
{
  "name": "express",
  "description": "Fast, unopinionated, minimalist web framework",
  "version": "5.0.1",
  "keywords": ["web", "framework"],
  "author": "TJ Holowaychuk",
  "publisher": "wesleytodd",
  "homepage": "https://expressjs.com/",
  "repository": "https://github.com/expressjs/express",
  "npm": "https://www.npmjs.com/package/express",
  "lastPublished": "2026-03-15T08:11:00.000Z",
  "score": 0.847,
  "searchScore": 1234.5,
  "quality": 0.91,
  "popularity": 0.99,
  "maintenance": 0.81,
  "monthlyDownloads": 134000000,
  "source": "search:http server",
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```
18 fields per search record when `includeDownloads=true` (17 without it). `score`, `searchScore`, `quality`, `popularity`, and `maintenance` come straight from the NPM v1-search ranking system. `monthlyDownloads` is appended only if `includeDownloads=true` (default `true`).
### Direct-mode record

```json
{
  "name": "express",
  "description": "Fast, unopinionated, minimalist web framework",
  "version": "5.0.1",
  "license": "MIT",
  "homepage": "https://expressjs.com/",
  "repository": "https://github.com/expressjs/express",
  "bugs": "https://github.com/expressjs/express/issues",
  "author": "TJ Holowaychuk",
  "maintainers": ["wesleytodd", "ulisesgascon"],
  "keywords": ["web", "framework"],
  "dependencies": ["accepts", "body-parser", "content-disposition"],
  "dependenciesCount": 30,
  "devDependencies": 12,
  "engines": { "node": ">= 18" },
  "createdAt": "2010-12-29T19:38:25.450Z",
  "lastPublished": "2026-03-15T08:11:00.000Z",
  "versionsCount": 285,
  "readme": "<first 2000 characters of README>",
  "monthlyDownloads": 134000000,
  "source": "direct:express",
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}
```
21 fields per direct record when `includeDownloads=true` (20 without it). `dependencies` is an array of dependency NAMES (not a `{name: version}` map — earlier README versions claimed full version pinning; that's not extracted). `devDependencies` is the count, not the list. `versionsCount` is the count of all published versions, not a version-history array.
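Because `dependencies` is names only, a follow-up call per package can recover the version ranges. A sketch, assuming the standard `registry.npmjs.org/{name}/{version}` document shape; the fetcher is injected so the helper stays testable offline:

```python
def resolve_version_pins(name, version, fetch_json):
    """Fetch the per-version registry document and return its
    {dependency: version-range} map (which this actor does not extract).
    `fetch_json` is any callable(url) -> parsed JSON dict."""
    doc = fetch_json(f"https://registry.npmjs.org/{name}/{version}")
    return doc.get("dependencies", {})
```

In production you would pass a real HTTP fetcher, e.g. `lambda url: requests.get(url, timeout=10).json()`.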
## Honest disclosure on what's NOT extracted, and operational caveats

- **Single-attempt fetch, no retry.** `fetchJson` issues one `fetch()` per URL and throws on any non-2xx. There is no exponential backoff, no 429 handling, and no transient-error retry. A single network blip or rate-limit response kills the in-flight request.
- ⚠️ **The outer try/catch wraps the ENTIRE search loop AND the direct-mode loop** (`src/main.js` lines 66-149). If query #2 of 10 throws (e.g. a 429 from the npm search API or a transient TCP reset), queries #3-#10 AND every direct-mode package fetch after it are silently skipped. Direct mode has its own inner try/catch (lines 137-139), so a single bad package name does not kill its peers — but a search failure WILL skip the entire direct-mode block. Run search and direct mode in separate runs if isolation matters.
- **No proxy.** `fetch()` is called without a proxy configuration. Heavy bursts may eventually trigger npm-registry rate limits with no fallback path.
- **`monthlyDownloads` may be null silently.** `getDownloads` catches all errors and returns `null`. Possible reasons: the package has no download history (private or never published), the downloads API returned a 404 (rare for valid names), or a transient network error. Null does NOT mean zero — treat it as missing data, not as a real "0 downloads/month".
- **Weekly and yearly downloads** — only `last-month` is fetched (from `api.npmjs.org/downloads/point/last-month`). No weekly point and no yearly aggregation. Earlier README versions claimed all three; only monthly exists.
- **Dependency version pins** — only dependency names. For full dependency trees, query `registry.npmjs.org/{name}/{version}` directly per package.
- **Version history with dates** — only the count and the latest-published date.
- **Vulnerability data** — none. Use `npm audit` / Snyk for security signals.
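None of that backoff exists in the actor itself. If you wrap these endpoints in your own tooling, a retry helper along these lines (hypothetical, not part of `src/main.js`) blunts the transient-failure problem:

```python
import random
import time

def fetch_json_with_retry(url, fetch, max_attempts=4, base_delay=0.5):
    """Exponential backoff with jitter, the safety net the actor lacks.
    `fetch` is any callable(url) that returns parsed JSON or raises."""
    for attempt in range(max_attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # 0.5s, 1s, 2s, ... plus up to 100 ms of jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))
```

A per-query wrapper like this would also contain the "one failed search kills the batch" problem described above, because the exception stops propagating to the outer loop.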
## Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| `searchQueries` | array | `[]` | Keywords searched against `registry.npmjs.org/-/v1/search`. |
| `packageNames` | array | `[]` | Specific package names to pull full metadata for. |
| `maxPackagesPerQuery` | number | `50` | Per-query cap (search mode). Internal pagination uses `size = min(remaining, 250)` chunks. Max 250. |
| `includeDownloads` | boolean | `true` | If `true`, fetch `monthlyDownloads` per package. Adds ~100 ms polite delay plus the actual HTTP latency per package. |
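Put together, a typical run input combining both modes might look like this (values are illustrative):

```json
{
  "searchQueries": ["http server", "websocket"],
  "packageNames": ["express", "fastify"],
  "maxPackagesPerQuery": 50,
  "includeDownloads": true
}
```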
## Use cases

- **Market research** — find the most-downloaded packages in a category by sorting search results on `monthlyDownloads`.
- **Competitive intel** — track release cadence (`lastPublished`) and version churn (`versionsCount`) for a competitor's package.
- **Tech-stack reverse-engineering** — feed a target's `package.json` dependency list as `packageNames` and pull full metadata.
- **Dependency auditing** — surface package age (`createdAt` vs `lastPublished` gap) and maintainer churn.
- **DevRel signal** — sort by NPM `quality` / `popularity` / `maintenance` scores to find packages worth covering.
- **Investment research** — identify popular OSS projects + maintainer lists to surface acquisition / partnership candidates.
## Python integration

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("knotless_cadence/npm-package-scraper").call(run_input={
    "searchQueries": ["llm", "rag"],
    "maxPackagesPerQuery": 25,
    "includeDownloads": True,
})
packages = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Sort by monthly downloads, top 10 (treat null as 0 only for sorting purposes)
packages.sort(key=lambda p: p.get("monthlyDownloads") or 0, reverse=True)
for p in packages[:10]:
    dl = p.get("monthlyDownloads")
    dl_str = f"{dl:>12,d}" if dl is not None else "        null"
    print(f"{dl_str} {p['name']:30s} q={p.get('quality', 0):.2f} pop={p.get('popularity', 0):.2f}")
```
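The native dataset export already produces CSV, but if you post-process in Python you can flatten the records yourself. A small sketch that takes the `packages` list from the snippet above; the field list is an illustrative subset, not the full schema:

```python
import csv

def write_packages_csv(packages, path):
    """Write a subset of scraped fields to CSV; keys missing from a
    record become blank cells, and extra keys are ignored."""
    fields = ["name", "version", "license", "monthlyDownloads", "lastPublished"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for pkg in packages:
            writer.writerow({k: pkg.get(k) for k in fields})
```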
## Common questions
Q: Is this legal?
A: Yes. NPM registry data is publicly available and the actor uses the official documented endpoints (`registry.npmjs.org/-/v1/search`, `registry.npmjs.org/{name}`, `api.npmjs.org/downloads/point/last-month`).
Q: How many packages can I scrape?
A: No hard limit. The internal request count is bounded by `maxPackagesPerQuery × searchQueries.length + packageNames.length`. With `includeDownloads=true`, expect ~100-300 ms per package because of the polite delay plus the downloads call.
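That bound can be computed up front. A rough estimator under the worst-case assumption that every search query returns a full `maxPackagesPerQuery` matches (fewer matches means fewer requests):

```python
def estimate_requests(search_queries, max_per_query, package_names, include_downloads=True):
    # Search results arrive in pages of up to 250 items per request.
    pages_per_query = -(-max_per_query // 250)  # ceiling division
    # Worst-case number of packages that will need a downloads lookup.
    package_count = len(search_queries) * max_per_query + len(package_names)
    requests = len(search_queries) * pages_per_query + len(package_names)
    if include_downloads:
        requests += package_count  # one api.npmjs.org call per package
    return requests
```

For example, two search queries capped at 50 plus one named package with downloads enabled needs at most 104 requests (2 search pages + 1 metadata fetch + 101 downloads calls).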
Q: Are the search-quality scores reliable?
A: They come straight from NPM's v1-search ranking; we do not re-compute or modify them. NPM weights popularity, quality, and maintenance internally, and the final `score` is what their UI sorts on by default.
Q: Why does `dependencies` look like a list of names instead of a version map?
A: That's a deliberate choice — we store `Object.keys(latestVersion?.dependencies)` for compactness. If you need version pins, run a follow-up `GET registry.npmjs.org/{name}/{version}` per package or request a custom build (see Custom scraping below).
Q: What happens if one of my search queries fails?
A: The whole batch halts at that point. Search-mode queries #N+1 onward and all direct-mode packages are skipped silently. Direct mode itself is isolated — one bad package name does not kill its peers — but a failed search call kills both modes. Run problematic search queries in isolation, or split your batches.
Q: Schedule recommendation?
A: Daily for trend tracking; weekly is plenty for most use cases. NPM monthly downloads are a 30-day rolling window, so daily polling adds limited value.
## Export integrations
- CSV / JSON / Excel / HTML (native Apify dataset download)
- Google Sheets (via Apify integration)
- Webhooks on each run
- S3 / GCS direct sync
- Zapier / Make.com / n8n
## Related scrapers

| Actor | Data | Signal |
|---|---|---|
| NPM (this) | Package metadata + downloads | JS / TS ecosystem |
| GitHub Trending Scraper | Trending repos | OSS signal |
| GitHub Profile Scraper | Developer profiles + repos | Recruiting / OSS-strategy |
| Hacker News Scraper | Tech discussion | Tech discourse |
| Wayback Machine Scraper | Historical website snapshots | Archival lookups |
| Email Extractor Pro | Emails from websites | Lead gen |
All 31 published actors free to inspect on Apify Store.
## Custom scraping — pilot tiers
Need a full dependency-version map, version-history time series, weekly + yearly download aggregates, retry-with-backoff on rate-limited search calls, or a vulnerability cross-reference? Three tiers:
- Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a one-off "top 100 LLM packages by downloads" snapshot.
- Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most ecosystem-research projects fit here.
- Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly category-leaderboard refresh, dependency-audit rollups, multi-source enrichment with GitHub).
Email: spinov001@gmail.com — drop the keyword list or the package list and the schema you need; quote within 48h.
Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).
More tips: t.me/scraping_ai · blog.spinov.online
## Disclaimer
Scrapes the publicly accessible NPM registry endpoints. Respects polite delays (300 ms between search pages, 100-200 ms between package fetches). Not affiliated with npm, Inc. or GitHub, Inc.
Honest disclosure: search records have 17 base fields (18 with `monthlyDownloads`), direct-mode records have 20 base fields (21 with `monthlyDownloads`). Only last-month downloads are fetched (no weekly, no yearly aggregate). `monthlyDownloads` can be null on transient errors — treat null as missing, not as zero. `dependencies` is an array of names — no version pins. `versionsCount` is an integer count — no version-history time series. README excerpt is capped at 2000 characters per package. Single-attempt fetch, no retry, no proxy. A failed search query halts all remaining search queries and direct-mode packages in the same run.