NPM Package Scraper — npm metadata api
Pricing
Pay per event
Pull rich metadata for any NPM package via the npm registry API — current version, dependencies, weekly downloads, repo URL, license, keywords, README excerpt, deprecation flag — export to JSON or CSV. Free npm registry + downloads API, no key required.
Developer: DevilScrapes (maintained by Community)
🎯 What this scrapes
The NPM registry exposes a JSON endpoint at registry.npmjs.org/<package> for every published package, plus a separate download-count API at api.npmjs.org/downloads/point/last-week/<pkg>. This Actor accepts a list of package names (including scoped packages like @apify/sdk), fans them out in parallel, merges both API responses, and writes one clean structured row per package. Think of it as a production-grade npm metadata api wrapper — no auth tokens, no manual pagination, no stitching two endpoints together yourself.
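As a rough illustration of the two-endpoint merge described above, here is a minimal sketch using only the standard library. The helper names (`registry_url`, `downloads_url`, `merge`) are hypothetical and are not the Actor's actual code. Encoding the slash of a scoped name for the registry, and leaving it literal for the downloads endpoint, are assumptions about how those hosts accept scoped names.

```python
from typing import Optional
from urllib.parse import quote

REGISTRY = "https://registry.npmjs.org"
DOWNLOADS = "https://api.npmjs.org/downloads/point/last-week"


def registry_url(name: str) -> str:
    # Encode the slash in scoped names (@scope/name -> @scope%2Fname); keep "@".
    return f"{REGISTRY}/{quote(name, safe='@')}"


def downloads_url(name: str) -> str:
    # Assumption: the point endpoint accepts scoped names with a literal slash.
    return f"{DOWNLOADS}/{quote(name, safe='@/')}"


def merge(registry_doc: dict, downloads_doc: Optional[dict]) -> dict:
    # Combine both responses into one flat record per package.
    return {
        "name": registry_doc.get("name"),
        "version": registry_doc.get("dist-tags", {}).get("latest"),
        "weekly_downloads": (downloads_doc or {}).get("downloads"),
    }
```

Passing `None` as the downloads document models a run with download counts switched off: the field is still present, just empty.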
🔥 Features
What the Actor does:
- Parallel bulk lookup: configurable concurrency (default 8) across the registry and downloads APIs; process hundreds of packages in a single run.
- Scoped package support: the `@scope/name` form works natively, no URL-encoding gymnastics required.
- Weekly download counts: optional merge with the NPM downloads API (one extra request per package, toggleable).
- Deprecation detection: surfaces the full deprecation message when a version is marked deprecated, so your audit pipeline catches it automatically.
- Full dependency maps: `dependencies`, `devDependencies`, and `peerDependencies` as structured objects, not raw strings.
- Structured, validated output: Pydantic-validated rows with ISO-8601 timestamps and stable field names; export as JSON, CSV, or Excel from Apify Console in one click.
What we handle for you:
- 🛡️ Browser fingerprint rotation: `curl-cffi` replays real Chrome / Firefox / Safari TLS handshakes so every request looks like a genuine browser, not a Python script.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session ID and exit IP on every block so you never burn a single IP.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per package, `Retry-After` respected.
- 🧱 Rate-limit-aware pacing: we slow down when the target pushes back instead of getting the run banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable field names, JSON / CSV / Excel export straight from Apify Console.
- 💰 Pay-Per-Event pricing: you pay only for results that land in your dataset. No data, no charge.
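The retry behaviour in the list above (up to 5 attempts on 408 / 429 / 5xx, honouring `Retry-After`) could look roughly like the sketch below. This is not the Actor's implementation. The `fetch` callable, the 1-second base delay, and the 30-second cap are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Tuple


def is_retryable(status: int) -> bool:
    # Status codes named in the feature list: 408, 429, and any 5xx.
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 30.0) -> float:
    # Prefer the server's Retry-After value; otherwise double the delay per attempt.
    if retry_after is not None:
        return retry_after
    return min(base * 2 ** attempt, cap)


def fetch_with_retries(fetch: Callable[[], Tuple[int, Optional[float], object]],
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep):
    # `fetch` is a hypothetical callable returning (status, retry_after_seconds, body).
    for attempt in range(max_attempts):
        status, retry_after, body = fetch()
        if not is_retryable(status):
            return status, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, retry_after))
    # All attempts were retryable failures; hand back the last response.
    return status, body
```

Injecting `sleep` keeps the function easy to exercise without real waiting.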
💡 Use cases
- Dependency audit: score every package in your `package.json` for weekly downloads, license compliance, and deprecation status before merging.
- Vendor benchmarking: compare competing libraries side-by-side on download trends, maintenance activity, and known issues.
- Supply-chain monitoring: feed the output into Socket, Snyk, or a custom risk dashboard to catch newly-deprecated dependencies before they ship.
- SDK download leaderboards: track weekly download momentum for your own packages and competitors over time.
- AI / RAG knowledge graphs: seed an LLM index with structured npm metadata api responses for package-aware code assistants.
- Hiring intel: find org members listed as maintainers on widely-used packages to inform developer outreach.
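For the dependency-audit case, the `packages` input can be derived from a `package.json` file. The sketch below is one possible way to do that. The function name `packages_from_manifest` and the `include_dev` flag are hypothetical and not part of the Actor.

```python
import json
from typing import List


def packages_from_manifest(manifest_text: str, include_dev: bool = True) -> List[str]:
    # Collect the unique package names declared in a package.json document.
    manifest = json.loads(manifest_text)
    sections = ["dependencies", "peerDependencies"]
    if include_dev:
        sections.append("devDependencies")
    names = set()
    for section in sections:
        # Each section maps package name -> version range; only the names matter here.
        names.update(manifest.get(section, {}))
    return sorted(names)
```

The returned list can be pasted into the Actor input as a JSON array.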
⚙️ How to use it
- Click Try for free at the top of the Store page.
- Paste your package list into the `packages` field, one name per line or as a JSON array, scoped packages included.
- Toggle `includeDownloads` on if you want last-week download counts (adds one extra request per package).
- Click Start. Output streams into the run's dataset in real time.
- Export from Storage → Dataset as JSON, CSV, or Excel, or pull via the Apify API with your token.
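For the last step, pulling rows via the Apify API, a minimal standard-library sketch might look like this. It assumes the dataset items endpoint `https://api.apify.com/v2/datasets/{datasetId}/items` with a bearer token. The function names and the dataset ID placeholder are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode


def dataset_items_request(dataset_id: str, token: str, fmt: str = "json") -> urllib.request.Request:
    # Build the request without sending it, so the URL and headers can be inspected.
    query = urlencode({"format": fmt})
    url = f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_rows(dataset_id: str, token: str) -> list:
    # Performs the network call; returns the dataset rows parsed from JSON.
    request = dataset_items_request(dataset_id, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Sending the token in the `Authorization` header rather than the query string keeps it out of URLs that may end up in logs.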
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `packages` | array | yes | `["express", "react", "@apify/sdk"]` | Package names to look up. Scoped packages (`@scope/name`) are supported. |
| `includeDownloads` | boolean | no | `true` | Merges last-week download count from the NPM downloads API. One extra request per package. |
| `concurrency` | integer | no | `8` | Parallel requests to the registry. Raise carefully: aggressive concurrency can trigger rate-limiting. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": true}` | Proxy settings. We recommend leaving this on Apify Proxy, which keeps session state consistent across retries. |
Example input

    {
      "packages": ["express", "react", "@apify/sdk"],
      "includeDownloads": true,
      "concurrency": 8,
      "proxyConfiguration": {"useApifyProxy": true}
    }
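
To show how the defaults in the input table combine with a partial input, here is a hedged sketch of input normalization. The `normalize_input` helper, the de-duplication step, and the error message are assumptions. The Actor's real validation may differ.

```python
DEFAULTS = {
    "includeDownloads": True,
    "concurrency": 8,
    "proxyConfiguration": {"useApifyProxy": True},
}


def normalize_input(raw: dict) -> dict:
    packages = raw.get("packages") or []
    if isinstance(packages, str):
        # Accept the one-name-per-line form as well as a JSON array.
        packages = packages.splitlines()
    # Trim whitespace, drop blanks, and de-duplicate while preserving order.
    cleaned = list(dict.fromkeys(p.strip() for p in packages if p.strip()))
    if not cleaned:
        raise ValueError("'packages' is required and must not be empty")
    # Defaults first, then user-supplied values, then the cleaned package list.
    return {**DEFAULTS, **raw, "packages": cleaned}
```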
📤 Output
One dataset row per package.
| Field | Type | Notes |
|---|---|---|
| `name` | string | Package name, including `@scope` if present. |
| `version` | string | Current `latest` dist-tag version. |
| `description` | string or null | Package description string. |
| `homepage` | string or null | Project homepage URL. |
| `license` | string or null | SPDX license identifier (e.g. MIT, Apache-2.0). |
| `author` | string or null | Author display string. |
| `maintainers` | array | List of maintainer login names. |
| `keywords` | array | Tags from `package.json`. |
| `repository_url` | string or null | Source repo URL (typically `git+https://github.com/...`). |
| `bugs_url` | string or null | Bug-tracker URL if defined. |
| `dist_tarball` | string or null | Tarball download URL for the latest version. |
| `engines` | object or null | Node/npm version constraints from the `engines` field. |
| `dependencies` | object or null | Runtime dependency map (name → version range). |
| `dev_dependencies` | object or null | Dev dependency map. |
| `peer_dependencies` | object or null | Peer dependency map. |
| `deprecated` | string or null | Full deprecation message when the version is deprecated; null otherwise. |
| `weekly_downloads` | integer or null | Downloads in the last 7 days (populated when `includeDownloads` is true). |
| `package_url` | string | Canonical npmjs.com package page URL. |
| `published_at` | string or null | ISO-8601 publish timestamp for the latest version. |
| `scraped_at` | string | ISO-8601 timestamp when this row was recorded. |
Example output

    {
      "name": "express",
      "version": "4.21.2",
      "description": "Fast, unopinionated, minimalist web framework",
      "license": "MIT",
      "maintainers": ["wesleytodd", "dougwilson"],
      "keywords": ["express", "framework", "sinatra", "web", "rest", "restful", "router", "app"],
      "repository_url": "git+https://github.com/expressjs/express.git",
      "dependencies": {"accepts": "~1.3.8", "array-flatten": "1.1.1"},
      "deprecated": null,
      "weekly_downloads": 31000000,
      "package_url": "https://www.npmjs.com/package/express",
      "published_at": "2024-03-25T14:09:03.000Z",
      "scraped_at": "2026-06-01T10:00:00.000Z"
    }
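
The mapping from a raw registry document to a row like the one above can be sketched as follows. It assumes the usual registry document layout (`dist-tags`, `versions`, `time`, `maintainers`) and covers only a subset of the output fields. `to_row` is a hypothetical helper, not the Actor's code.

```python
from datetime import datetime, timezone
from typing import Optional


def to_row(doc: dict, weekly_downloads: Optional[int] = None) -> dict:
    latest = doc["dist-tags"]["latest"]
    manifest = doc["versions"][latest]  # package.json of the latest version
    repo = manifest.get("repository")
    return {
        "name": doc["name"],
        "version": latest,
        "license": manifest.get("license"),
        "maintainers": [m["name"] for m in doc.get("maintainers", [])],
        # `repository` is usually an object with a `url` key, occasionally a bare string.
        "repository_url": repo.get("url") if isinstance(repo, dict) else repo,
        "dependencies": manifest.get("dependencies"),
        # The deprecation message lives on the version entry, not the top level.
        "deprecated": manifest.get("deprecated"),
        "weekly_downloads": weekly_downloads,
        "package_url": f"https://www.npmjs.com/package/{doc['name']}",
        "published_at": doc.get("time", {}).get(latest),
        "scraped_at": datetime.now(timezone.utc).isoformat(),
    }
```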
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | Cost (USD) | What it covers |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.0015 | Per dataset row written |
Example: 1,000 packages at the rates above cost about $1.50 in result events (1,000 × $0.0015), plus $0.005 for the run start. No subscription, no minimum, no card to start; every new Apify account gets $5 of free credit.
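The arithmetic behind that example can be checked with a short sketch. The rates are taken from the pricing table. `estimate_cost` is a hypothetical helper and uses `Decimal` to avoid floating-point rounding.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")   # charged once per run
PER_RESULT = Decimal("0.0015")   # charged per dataset row written


def estimate_cost(rows: int, runs: int = 1) -> Decimal:
    # Total in USD for `rows` results spread over `runs` Actor starts.
    return ACTOR_START * runs + PER_RESULT * rows
```

Under these rates, `estimate_cost(1000)` comes to 1.505 USD: 1.50 for the rows plus the single start event.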
Compared to calling the npm metadata api yourself: no rate-limit management, no session handling, no merging two separate API endpoints, no retry logic. We charge $1.50 per thousand; you save the engineering hours.
🚧 Limitations
- `latest` dist-tag only: per-version lookup (e.g. `express@4.18.0`) is not yet supported.
- Pre-release tags excluded: `alpha`, `beta`, and `next` dist-tags are not resolved.
- Tarball content not extracted: we return the tarball URL; we do not download or unpack it.
- Downloads API cap: the NPM downloads API hard-caps at 128 packages per bulk call; we fan out automatically, but very large batches will take proportionally longer.
- Private packages: scoped private packages (`@org/internal-pkg`) return a 404 from the public registry and are skipped with a log warning.
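The 128-package cap on bulk download queries implies some form of batching. A minimal sketch is shown below, assuming the comma-separated bulk form of the `last-week` point endpoint. The helper names are hypothetical and the Actor's own fan-out logic may differ.

```python
from typing import Iterator, List, Sequence

BULK_LIMIT = 128  # cap per bulk call, as stated in the limitations list


def chunked(names: Sequence[str], size: int = BULK_LIMIT) -> Iterator[List[str]]:
    # Yield consecutive slices of at most `size` package names.
    for start in range(0, len(names), size):
        yield list(names[start:start + size])


def bulk_downloads_url(batch: Sequence[str]) -> str:
    # Assumed bulk form: package names joined with commas in the URL path.
    return "https://api.npmjs.org/downloads/point/last-week/" + ",".join(batch)
```

With 300 names this yields three batches of 128, 128, and 44, so the number of bulk calls grows linearly with the input size.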
❓ FAQ
What exactly is the npm metadata api?
The NPM registry exposes https://registry.npmjs.org/<package> as a JSON document with full package metadata — versions, maintainers, dependencies, dist tarballs, and more. A separate endpoint at https://api.npmjs.org/downloads/point/last-week/<pkg> returns download counts. This Actor calls both, merges the responses, and delivers clean structured rows. You get a production-grade npm metadata api pipeline without maintaining the plumbing yourself.
Is this an npm registry scraper or an API wrapper?
Both. The registry exposes clean JSON, but stitching two API hosts, handling scoped package names, managing retries, and staying inside rate limits is real engineering work. We do all of that and return validated rows. Think of it as an npm registry scraper that handles the messy bits for you.
Are download counts exact?
No. npm counts every tarball download, including those from CI servers, mirrors, and other automated traffic, so the figures are approximate. Trust the trend and relative magnitude; don't treat individual numbers as exact.
Can I look up a specific version instead of latest?
Not yet. Per-version lookup is on the roadmap; for now, an input in `name@version` form resolves to `latest`. Vote or comment on the Actor's Issues tab to prioritise this.
What happens when a package has been unpublished?
We log a 404, skip the row, and continue with the remaining packages. The final dataset will be shorter than your input list; the run log will list every skipped package.
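Because skipped packages make the dataset shorter than the input, it can help to compute the difference on your side. A small sketch, with a hypothetical `skipped_packages` helper that compares the requested names against the `name` field of the returned rows:

```python
from typing import Iterable, List


def skipped_packages(requested: Iterable[str], rows: Iterable[dict]) -> List[str]:
    # Names that were requested but have no matching row in the dataset.
    found = {row["name"] for row in rows}
    return [name for name in requested if name not in found]
```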
Why is the repository_url prefixed with git+?
That is npm's canonical format. Strip the git+ prefix if your downstream tool expects a bare HTTPS URL.
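Stripping the prefix is a one-liner. The sketch below uses a hypothetical helper name and passes `None` through unchanged, since `repository_url` can be null.

```python
from typing import Optional


def strip_git_prefix(repository_url: Optional[str]) -> Optional[str]:
    # Remove a leading "git+" so the value is a plain HTTPS (or SSH) URL.
    if repository_url is not None and repository_url.startswith("git+"):
        return repository_url[len("git+"):]
    return repository_url
```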
Can I use this alongside the PyPI Package Scraper?
Yes — Devil Scrapes PyPI Package Scraper does the same job for Python packages. Run both and join on package purpose to produce an npm-vs-PyPI ecosystem comparison dataset.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need an additional field? Open an issue on the Actor's Issues tab in Apify Console — we ship fixes weekly and we read every report.