NPM Scraper — Downloads, Deps, Versions, CSV, No Token

18 runs. NPM intel as CSV/JSON — name, version, license, repo, maintainers, deps, monthlyDownloads, lastPublished. Free npmjs.org API, no token. Backed by 951-run Trustpilot flagship + 31-actor portfolio. Supply-chain audits. spinov001@gmail.com · blog.spinov.online · t.me/scraping_ai

Pricing: Pay per usage
Rating: 0.0 (0)
Developer: Alex · Maintained by Community
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 6 days ago

NPM Package Scraper — Search NPM, Pull Package Metadata + Monthly Downloads

Search the NPM registry by keyword, or pull metadata for a list of named packages — name, version, license, dependencies, maintainers, monthly downloads, and search-quality scores — using the official NPM registry endpoints. No API key required.

Two modes (run together or independently):

  • Search mode (searchQueries: ["http server", "websocket"]) walks registry.npmjs.org/-/v1/search and pushes one record per match.
  • Direct mode (packageNames: ["express", "fastify"]) pulls the full registry.npmjs.org/{name} document for each.
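
Both inputs can be passed in the same run; search records and direct records then land in the same dataset and are distinguished by the source field ("search:<query>" vs "direct:<name>"). A minimal combined run input as a Python dict (values are illustrative; parameter names match the Input table below):

run_input = {
    "searchQueries": ["http server", "websocket"],  # search mode
    "packageNames": ["express", "fastify"],         # direct mode
    "maxPackagesPerQuery": 50,                      # per-query cap, 250 max
    "includeDownloads": True                        # append monthlyDownloads per record
}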

What you actually get (verified against src/main.js)

Search-mode record

{
  "name": "express",
  "description": "Fast, unopinionated, minimalist web framework",
  "version": "5.0.1",
  "keywords": ["web", "framework"],
  "author": "TJ Holowaychuk",
  "publisher": "wesleytodd",
  "homepage": "https://expressjs.com/",
  "repository": "https://github.com/expressjs/express",
  "npm": "https://www.npmjs.com/package/express",
  "lastPublished": "2026-03-15T08:11:00.000Z",
  "score": 0.847,
  "searchScore": 1234.5,
  "quality": 0.91,
  "popularity": 0.99,
  "maintenance": 0.81,
  "monthlyDownloads": 134000000,
  "source": "search:http server",
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}

18 fields per search record when includeDownloads=true (17 without it). score, searchScore, quality, popularity, maintenance come straight from the NPM v1-search ranking system. monthlyDownloads is appended only if includeDownloads=true (default true).
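
A common follow-up is to filter search records on those scores once they are in your own code. A minimal sketch, assuming records holds the dataset items from a run (the thresholds are arbitrary examples, not recommendations):

def worth_a_look(record, min_maintenance=0.5, min_downloads=10_000):
    downloads = record.get("monthlyDownloads")
    if downloads is None:              # null means "unknown", not zero
        return False
    return (record.get("maintenance", 0) >= min_maintenance
            and downloads >= min_downloads)

shortlist = [r for r in records if worth_a_look(r)]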

Direct-mode record

{
  "name": "express",
  "description": "Fast, unopinionated, minimalist web framework",
  "version": "5.0.1",
  "license": "MIT",
  "homepage": "https://expressjs.com/",
  "repository": "https://github.com/expressjs/express",
  "bugs": "https://github.com/expressjs/express/issues",
  "author": "TJ Holowaychuk",
  "maintainers": ["wesleytodd", "ulisesgascon"],
  "keywords": ["web", "framework"],
  "dependencies": ["accepts", "body-parser", "content-disposition"],
  "dependenciesCount": 30,
  "devDependencies": 12,
  "engines": { "node": ">= 18" },
  "createdAt": "2010-12-29T19:38:25.450Z",
  "lastPublished": "2026-03-15T08:11:00.000Z",
  "versionsCount": 285,
  "readme": "<first 2000 characters of README>",
  "monthlyDownloads": 134000000,
  "source": "direct:express",
  "scrapedAt": "2026-04-29T12:00:00.000Z"
}

21 fields per direct record when includeDownloads=true (20 without it). dependencies is an array of dependency NAMES (not a {name: version} map — earlier README versions claimed full version pinning; that's not extracted). devDependencies is the count, not the list. versionsCount is the count of all published versions, not a version-history array.


Honest disclosure on what's NOT extracted, and operational caveats

  • Single-attempt fetch, no retry. fetchJson issues one fetch() per URL and throws on any non-2xx. There is no exponential backoff, no 429 handling, and no transient-error retry. A single network blip or rate-limit response kills the in-flight request.
  • ⚠️ Outer try/catch wraps the ENTIRE search-loop AND the direct-mode loop (src/main.js lines 66-149). If query #2 of 10 throws (e.g. 429 from npm search API, transient TCP reset), queries #3-#10 AND every direct-mode package fetch after it are silently skipped. Direct-mode has its own inner try/catch (line 137-139) so a single bad package name does not kill its peers — but a search failure WILL skip the entire direct-mode block. Run search and direct in separate runs if isolation matters.
  • No proxy. fetch() is called without a proxy configuration. Heavy bursts may eventually trigger npm-registry rate limits with no fallback path.
  • monthlyDownloads may be null silently. getDownloads catches all errors and returns null. Reasons: the package has no download history (private/never-published), the downloads API returned a 404 (rare for valid names), or a transient network error. Null does NOT mean zero — treat it as missing data, not as a real "0 downloads/month".
  • Weekly and yearly downloads — only last-month is fetched (from api.npmjs.org/downloads/point/last-month). No weekly point and no yearly aggregation. Earlier README versions claimed all three; only monthly exists.
  • Dependency version pins — only dependency names. For full dependency trees, query registry.npmjs.org/{name}/{version} directly per package.
  • Version history with dates — only the count and the latest-published date.
  • Vulnerability data — none. Use npm audit / Snyk for security signals.
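
If the failure-isolation caveat above matters to you, the cheapest workaround is to split search and direct mode into two separate actor runs, so a failed search call cannot skip the direct-mode fetches. A minimal sketch with the Python client (actor ID as used in the Python integration section below):

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
ACTOR = "knotless_cadence/npm-package-scraper"

# Run 1: search mode only. A 429 on the search API stays contained in this run.
search_run = client.actor(ACTOR).call(
    run_input={"searchQueries": ["http server", "websocket"]}
)

# Run 2: direct mode only, with its own per-package try/catch isolation.
direct_run = client.actor(ACTOR).call(
    run_input={"packageNames": ["express", "fastify", "koa"]}
)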

Input

| Parameter | Type | Default | Description |
|---|---|---|---|
| searchQueries | array | [] | Keywords searched against registry.npmjs.org/-/v1/search. |
| packageNames | array | [] | Specific package names to pull full metadata for. |
| maxPackagesPerQuery | number | 50 | Per-query cap (search mode). Internal pagination uses size=min(remaining, 250) chunks. Max 250. |
| includeDownloads | boolean | true | If true, fetch monthlyDownloads per package. Adds ~100 ms polite delay plus the actual HTTP latency per package. |

Use cases

  • Market research — find the most-downloaded packages in a category by sorting search results on monthlyDownloads.
  • Competitive intel — track release cadence (lastPublished) and version churn (versionsCount) for a competitor's package.
  • Tech-stack reverse-engineering — feed a target's package.json dependency list as packageNames and pull full metadata.
  • Dependency auditing — surface package age (createdAt vs lastPublished gap) and maintainer churn.
  • DevRel signal — sort by NPM quality / popularity / maintenance scores to find packages worth covering.
  • Investment research — identify popular OSS projects + maintainer lists to surface acquisition / partnership candidates.
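
For the dependency-auditing case, the createdAt / lastPublished gap is easy to compute from direct-mode records. A minimal sketch, assuming packages holds direct-mode dataset items (the two-year threshold is an arbitrary example):

from datetime import datetime, timezone

def iso(ts):
    # Timestamps in the output are ISO 8601 with a trailing "Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

now = datetime.now(timezone.utc)
stale = [
    p for p in packages
    if p.get("createdAt") and p.get("lastPublished")
    and (now - iso(p["lastPublished"])).days > 730
]
for p in stale:
    age_years = (iso(p["lastPublished"]) - iso(p["createdAt"])).days / 365
    print(f"{p['name']}: ~{age_years:.1f} years of history, no publish in 2+ years")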

Python integration

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

# Start a search-mode run and wait for it to finish
run = client.actor("knotless_cadence/npm-package-scraper").call(
    run_input={
        "searchQueries": ["llm", "rag"],
        "maxPackagesPerQuery": 25,
        "includeDownloads": True,
    }
)

# Pull every record from the run's default dataset
packages = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# Sort by monthly downloads, top 10 (treat null as 0 only for sorting purposes)
packages.sort(key=lambda p: p.get("monthlyDownloads") or 0, reverse=True)
for p in packages[:10]:
    dl = p.get("monthlyDownloads")
    dl_str = f"{dl:>12,d}" if dl is not None else f"{'null':>12}"
    print(f"{dl_str} {p['name']:30s} q={p.get('quality', 0):.2f} pop={p.get('popularity', 0):.2f}")
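
The same items can also be dumped to a local CSV without leaving Python (Apify's native CSV export, listed under Export integrations below, does the same thing server-side). A minimal sketch over the packages list from the snippet above; note that list and dict fields such as keywords or engines are written as their string representation:

import csv

fieldnames = sorted({key for p in packages for key in p})
with open("npm_packages.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.DictWriter(fh, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(packages)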

Common questions

Q: Is this legal? A: Yes. NPM registry data is publicly available and the actor uses the official documented endpoints (registry.npmjs.org/-/v1/search, registry.npmjs.org/{name}, api.npmjs.org/downloads/point/last-month).

Q: How many packages can I scrape? A: No hard limit. The internal request count is bounded by maxPackagesPerQuery × searchQueries.length + packageNames.length. With includeDownloads=true, expect ~100-300 ms per package because of the polite-delay + downloads call.
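
As a worked example of that bound: 4 search queries capped at 50 results each plus 20 named packages is 4 × 50 + 20 = 220 records, i.e. roughly 22 to 66 seconds at the quoted 100-300 ms per record with downloads enabled. The same arithmetic as a sketch (latency figures are the rough estimates above, not guarantees):

search_queries = 4
max_per_query = 50
package_names = 20

records = search_queries * max_per_query + package_names   # 220
low, high = records * 0.1, records * 0.3                   # seconds, per the 100-300 ms estimate
print(f"~{records} records, roughly {low:.0f}-{high:.0f} s with includeDownloads=true")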

Q: Are the search-quality scores reliable? A: They come straight from NPM's v1-search ranking; we do not re-compute or modify. NPM weights popularity, quality, and maintenance internally and the final score is what their UI sorts on by default.

Q: Why does dependencies look like a list of names instead of a version map? A: That's a deliberate choice — we store Object.keys(latestVersion?.dependencies) for compactness. If you need version pins, run a follow-up GET registry.npmjs.org/{name}/{version} per package or request a custom build (see Custom scraping below).
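
That follow-up call is a single unauthenticated GET per package. A minimal sketch using only the standard library (the latest dist-tag works as the version path segment; scoped packages may need the slash in @scope/name percent-encoded):

import json
import urllib.request

def pinned_dependencies(name, version="latest"):
    # Version-level registry document; its "dependencies" field is the
    # {name: version-range} map that the actor flattens to names only.
    url = f"https://registry.npmjs.org/{name}/{version}"
    with urllib.request.urlopen(url) as resp:
        manifest = json.load(resp)
    return manifest.get("dependencies", {})

print(pinned_dependencies("express"))   # {"accepts": "^...", "body-parser": "^...", ...}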

Q: What happens if one of my search queries fails? A: The whole batch halts at that point. Search-mode queries #N+1 onward and all direct-mode packages are skipped silently. Direct-mode itself is isolated — one bad package name does not kill its peers — but a failed search call kills both. Run problematic search queries in isolation, or split your batches.

Q: Schedule recommendation? A: Daily for trend tracking; weekly is plenty for most use cases. NPM monthly downloads are a 30-day rolling window, so daily polling adds limited value.


Export integrations

  • CSV / JSON / Excel / HTML (native Apify dataset download)
  • Google Sheets (via Apify integration)
  • Webhooks on each run
  • S3 / GCS direct sync
  • Zapier / Make.com / n8n

| Actor | Data | Use |
|---|---|---|
| NPM (this) | Package metadata + downloads | JS / TS ecosystem |
| GitHub Trending Scraper | Trending repos | OSS signal |
| GitHub Profile Scraper | Developer profiles + repos | Recruiting / OSS strategy |
| Hacker News Scraper | Tech discussion | Tech discourse |
| Wayback Machine Scraper | Historical website snapshots | Archival lookups |
| Email Extractor Pro | Emails from websites | Lead gen |

All 31 published actors free to inspect on Apify Store.


Custom scraping — pilot tiers

Need a full dependency-version map, version-history time series, weekly + yearly download aggregates, retry-with-backoff on rate-limited search calls, or a vulnerability cross-reference? Three tiers:

  • Pilot — $97 · 1 actor, basic config, 7-day support. Good entry point — useful for a one-off "top 100 LLM packages by downloads" snapshot.
  • Standard — $297 · custom actor + Slack/email alerts on results, 30-day support. Most ecosystem-research projects fit here.
  • Premium — $797 · custom actor + dashboard + 90-day support + 1 modification round. For ongoing pipelines (weekly category-leaderboard refresh, dependency-audit rollups, multi-source enrichment with GitHub).

Email: spinov001@gmail.com — drop the keyword list or the package list and the schema you need; quote within 48h.

Proof of work: 31 published Apify scrapers (78 total in portfolio) — Trustpilot 949 runs, Reddit 80+, Google News 43, Glassdoor 37, Email Extractor 36+. Recently delivered a paid 3-article series for a client in the proxy industry ($150).

More tips: t.me/scraping_ai · blog.spinov.online


Disclaimer

Scrapes the publicly accessible NPM registry endpoints. Respects polite delays (300 ms between search pages, 100-200 ms between package fetches). Not affiliated with npm, Inc. or GitHub, Inc.

Honest disclosure: search records have 17 base fields (18 with monthlyDownloads), direct-mode records have 20 base fields (21 with monthlyDownloads). Only last-month downloads are fetched (no weekly, no yearly aggregate). monthlyDownloads can be null on transient errors — treat null as missing, not as zero. dependencies is an array of names — no version pins. versionsCount is an integer count — no version-history time series. README excerpt is capped at 2000 characters per package. Single-attempt fetch, no retry/no proxy. A failed search query halts all remaining search queries and direct-mode packages in the same run.