NPM Package Scraper
Pricing
Pay per event
NPM Package Scraper
Pull rich metadata for any NPM package — current version, dependencies, weekly downloads, repo URL, license, keywords, README excerpt, deprecation flag. Free NPM registry + downloads API, no key required.
Pricing
Pay per event
Rating
0.0
(0)
Developer
DevilScrapes
Maintained by CommunityActor stats
0
Bookmarked
2
Total users
1
Monthly active users
3 days ago
Last modified
Categories
Share
🎯 What this scrapes
NPM publishes registry.npmjs.org/<package> for every package, plus api.npmjs.org/downloads/<range>/<pkg> for download counts. This Actor takes a list of package names (or scoped names like @apify/sdk), fans them out, and writes one structured row per package with the latest version metadata and last-week download count.
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation —
curl-cffiimpersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python. - 🌐 Residential proxy rotation via Apify Proxy — fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on
408 / 429 / 5xx— up to 5 attempts per page,Retry-Afterhonoured. - 🧱 Rate-limit-aware pacing — when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing — you only pay for results that hit your dataset. No data, no charge.
💡 Use cases
- Bundle audit — score every direct dependency in your repo for downloads + license + deprecation.
- Vendor consolidation — compare two competing libraries side-by-side on weekly downloads.
- Hiring intel — find org members listed as maintainers on widely-used npm packages.
- Release-radar dashboards — daily diffs on weekly_downloads to spot momentum shifts.
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
packages | array | yes | ['express', 'react', '@apify/sdk'] | List of NPM package names. Scoped packages are supported (use the literal @scope/name form). |
includeDownloads | boolean | no | True | Adds last-week download count via the api.npmjs.org downloads API. One extra request per package. |
concurrency | integer | no | 8 | Parallel registry requests. |
proxyConfiguration | object | no | {'useApifyProxy': False} | Proxy optional. The NPM registry is happy to be hit directly. |
Example input
{"packages": ["express"],"includeDownloads": true,"concurrency": 4,"proxyConfiguration": {"useApifyProxy": false}}
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
name | string | Package name (matches npm exactly, including @scope). |
version | string | Current latest dist-tag version. |
description | ['string', 'null'] | Package description. |
homepage | ['string', 'null'] | Project homepage URL. |
license | ['string', 'null'] | SPDX license identifier. |
author | ['string', 'null'] | Author display string. |
maintainers | array | Maintainer logins. |
keywords | array | Tags from package.json. |
repository_url | ['string', 'null'] | Source repo URL (often git+https://github.com/...). |
bugs_url | ['string', 'null'] | Bug-tracker URL if defined. |
dist_tarball | ['string', 'null'] | Tarball download URL for the latest version. |
engines | ['object', 'null'] | engines field — node/npm version constraints. |
dependencies | ['object', 'null'] | Runtime dependencies map. |
dev_dependencies | ['object', 'null'] | Dev dependencies map. |
peer_dependencies | ['object', 'null'] | Peer dependencies map. |
deprecated | ['string', 'null'] | Deprecation message if the version is deprecated. |
weekly_downloads | ['integer', 'null'] | Downloads in the last 7 days (when includeDownloads=true). |
package_url | string | npmjs.com canonical URL. |
published_at | ['string', 'null'] | Publish timestamp of the latest version. |
scraped_at | string | When this row was recorded. |
Example output
{"name": "express","version": "4.21.2","description": "Fast, unopinionated, minimalist web framework","license": "MIT","weekly_downloads": 31000000,"package_url": "https://www.npmjs.com/package/express"}
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
actor-start | $0.005 | One-off warm-up charge per run |
result | $0.0015 | Per dataset item |
Example: 1 000 results at the rates above ≈ $1.50. No subscription, no minimum, no card to start — Apify gives every new account $5 of free credit.
🚧 Limitations
We pull only the latest dist-tag. Pre-release tags, version-range search, and tarball content extraction are not in scope.
❓ FAQ
Are downloads exact?
NPM's downloads API is widely understood to be ~5% noisy due to bot filtering. Trust the trend, not the exact number.
Can I look up versions other than latest?
Not yet — pass name@version and we'll resolve to latest. Per-version lookup is a roadmap item.
What if a package was unpublished?
We log a 404 and skip the row.
Why is the repo URL prefixed with git+?
That's npm's canonical form. Strip the prefix if you need a bare URL.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.