NPM Package Scraper — npm metadata api

Pull rich metadata for any NPM package via the npm registry API — current version, dependencies, weekly downloads, repo URL, license, keywords, README excerpt, deprecation flag — export to JSON or CSV. Free npm registry + downloads API, no key required.

Pricing

Pay per event

Developer

DevilScrapes

Maintained by Community

🎯 What this scrapes

The NPM registry exposes a JSON endpoint at registry.npmjs.org/<package> for every published package, plus a separate download-count API at api.npmjs.org/downloads/point/last-week/<pkg>. This Actor accepts a list of package names (including scoped packages like @apify/sdk), fans them out in parallel, merges both API responses, and writes one clean structured row per package. Think of it as a production-grade npm metadata api wrapper — no auth tokens, no manual pagination, no stitching two endpoints together yourself.
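
A minimal sketch of the two-endpoint merge described above. It assumes the public response shapes of the registry document (`dist-tags`, `versions`) and of the downloads endpoint (`downloads`). The helper names `registry_url`, `downloads_url`, and `merge_row` are hypothetical and are not the Actor's internals, and the sample data is trimmed for illustration.

```python
from typing import Optional
from urllib.parse import quote


def registry_url(name: str) -> str:
    # Encode the slash in scoped names (@scope/name -> @scope%2Fname) so the
    # whole name stays a single path segment.
    return "https://registry.npmjs.org/" + quote(name, safe="@")


def downloads_url(name: str) -> str:
    # The last-week endpoint named in the text; the name is passed as-is.
    return "https://api.npmjs.org/downloads/point/last-week/" + name


def merge_row(packument: dict, downloads: Optional[dict]) -> dict:
    """Combine a registry document and a downloads response into one row."""
    latest = packument["dist-tags"]["latest"]
    manifest = packument["versions"][latest]
    return {
        "name": packument["name"],
        "version": latest,
        "license": manifest.get("license"),
        "dependencies": manifest.get("dependencies"),
        "deprecated": manifest.get("deprecated"),
        "weekly_downloads": downloads.get("downloads") if downloads else None,
    }


# Trimmed, illustrative responses (not real API output).
packument = {
    "name": "express",
    "dist-tags": {"latest": "4.21.2"},
    "versions": {"4.21.2": {"license": "MIT", "dependencies": {"accepts": "~1.3.8"}}},
}
row = merge_row(packument, {"package": "express", "downloads": 31000000})
```

Passing `None` as the second argument models a run with includeDownloads switched off: the row is still produced, with `weekly_downloads` left empty.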

🔥 Features

What the Actor does:

  • Parallel bulk lookup — configurable concurrency (default 8) across the registry + downloads APIs; process hundreds of packages in a single run.
  • Scoped package support — @scope/name form works natively, no URL-encoding gymnastics required.
  • Weekly download counts — optional merge with the NPM downloads API (one extra request per package, toggleable).
  • Deprecation detection — surfaces the full deprecation message when a version is marked deprecated, so your audit pipeline catches it automatically.
  • Full dependency maps — dependencies, devDependencies, and peerDependencies as structured objects, not raw strings.
  • Structured, validated output — Pydantic-validated rows with ISO-8601 timestamps and stable field names; export as JSON, CSV, or Excel from Apify Console in one click.
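
The bounded parallelism in the first bullet can be sketched with a semaphore. This is an illustration of the general technique, not the Actor's code: `lookup_all` and `fake_fetch` are hypothetical names, and the HTTP calls are replaced by a stand-in coroutine.

```python
import asyncio


async def lookup_all(names, fetch_one, concurrency=8):
    """Run fetch_one(name) for every name with at most `concurrency` in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(name):
        async with semaphore:
            return await fetch_one(name)

    # gather preserves input order, so rows line up with the package list.
    return await asyncio.gather(*(guarded(name) for name in names))


async def fake_fetch(name):
    # Stand-in for the real registry and downloads requests.
    await asyncio.sleep(0)
    return {"name": name}


rows = asyncio.run(lookup_all(["express", "react", "@apify/sdk"], fake_fetch))
```

Raising `concurrency` shortens the run only until the remote side starts rate-limiting, which is why the default stays modest.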

What we handle for you:

  • 🛡️ Browser fingerprint rotation — curl-cffi replays real Chrome / Firefox / Safari TLS handshakes so every request looks like a genuine browser, not a Python script.
  • 🌐 Residential proxy rotation via Apify Proxy — fresh session ID and exit IP on every block so you never burn a single IP.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per package, Retry-After respected.
  • 🧱 Rate-limit-aware pacing — we slow down when the target pushes back instead of getting the run banned.
  • 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable field names, JSON / CSV / Excel export straight from Apify Console.
  • 💰 Pay-Per-Event pricing — you pay only for results that land in your dataset. No data, no charge.
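
For readers curious what the retry bullet involves, here is a rough sketch of exponential backoff that honours Retry-After. It is an assumption-laden illustration, not the Actor's implementation: `fetch` is a hypothetical callable returning `(status, retry_after, body)`, the base delay and cap are made-up values, Retry-After is handled only in its seconds form, and jitter is left out for brevity.

```python
import time

RETRYABLE = {408, 429}


def is_retryable(status: int) -> bool:
    # 408, 429, and any 5xx are worth another attempt.
    return status in RETRYABLE or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after=None, base=1.0, cap=30.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    return min(cap, base * 2 ** (attempt - 1))


def fetch_with_retries(fetch, max_attempts=5, sleep=time.sleep):
    """Call fetch() until the status is not retryable or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, retry_after, body = fetch()
        if not is_retryable(status) or attempt == max_attempts:
            return status, body
        sleep(backoff_delay(attempt, retry_after))
```

Injecting `sleep` keeps the function easy to test without real waiting.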

💡 Use cases

  • Dependency audit — score every package in your package.json for weekly downloads, license compliance, and deprecation status before merging.
  • Vendor benchmarking — compare competing libraries side-by-side on download trends, maintenance activity, and known issues.
  • Supply-chain monitoring — feed the output into Socket, Snyk, or a custom risk dashboard to catch newly-deprecated dependencies before they ship.
  • SDK download leaderboards — track weekly download momentum for your own packages and competitors over time.
  • AI / RAG knowledge graphs — seed an LLM index with structured npm metadata api responses for package-aware code assistants.
  • Hiring intel — find org members listed as maintainers on widely-used packages to inform developer outreach.
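
As an illustration of the dependency-audit use case, the sketch below flags rows from the dataset that are deprecated, fall outside a license allowlist, or have few downloads. The function name `audit`, the threshold, and the second sample row (`old-lib`) are hypothetical; only the field names come from the output schema.

```python
def audit(rows, allowed_licenses, min_weekly_downloads=1000):
    """Return {package name: [issues]} for rows that fail any check."""
    findings = {}
    for row in rows:
        issues = []
        if row.get("deprecated"):
            issues.append("deprecated: " + row["deprecated"])
        if row.get("license") not in allowed_licenses:
            issues.append("license not allowed: " + str(row.get("license")))
        downloads = row.get("weekly_downloads")
        if downloads is not None and downloads < min_weekly_downloads:
            issues.append("low downloads: " + str(downloads))
        if issues:
            findings[row["name"]] = issues
    return findings


rows = [
    {"name": "express", "license": "MIT", "deprecated": None, "weekly_downloads": 31000000},
    # Made-up row, for illustration only.
    {"name": "old-lib", "license": None, "deprecated": "Use new-lib instead", "weekly_downloads": 12},
]
findings = audit(rows, allowed_licenses={"MIT", "Apache-2.0"})
```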

⚙️ How to use it

  1. Click Try for free at the top of the Store page.
  2. Paste your package list into the packages field — one name per line or as a JSON array, scoped packages included.
  3. Toggle includeDownloads on if you want last-week download counts (adds one extra request per package).
  4. Click Start. Output streams into the run's dataset in real time.
  5. Export from Storage → Dataset as JSON, CSV, or Excel — or pull via the Apify API with your token.
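
For step 5, one way to pull the rows programmatically is the Apify API's dataset items endpoint. The sketch below uses only the standard library; the dataset ID and token are placeholders you would copy from your own run, and `dataset_items_request` and `fetch_rows` are hypothetical helper names.

```python
import json
import urllib.request
from urllib.parse import urlencode


def dataset_items_request(dataset_id: str, token: str, fmt: str = "json"):
    """Build a GET request for the items of one dataset."""
    query = urlencode({"format": fmt})
    url = f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
    # The token travels in a header rather than in the URL.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_rows(dataset_id: str, token: str) -> list:
    """Download and decode the dataset as a list of row dicts (JSON format)."""
    with urllib.request.urlopen(dataset_items_request(dataset_id, token)) as resp:
        return json.load(resp)


# Placeholder values; replace with the dataset ID of your run and your token.
request = dataset_items_request("YOUR_DATASET_ID", "YOUR_APIFY_TOKEN")
```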

📥 Input

Field               Type     Required  Default                             Notes
packages            array    yes       ["express", "react", "@apify/sdk"]  Package names to look up. Scoped packages (@scope/name) are supported.
includeDownloads    boolean  no        true                                Merges last-week download count from the NPM downloads API. One extra request per package.
concurrency         integer  no        8                                   Parallel requests to the registry. Raise carefully — aggressive concurrency can trigger rate-limiting.
proxyConfiguration  object   no        {"useApifyProxy": true}             Proxy settings. We recommend leaving this on Apify Proxy — it keeps session state consistent across retries.

Example input

{
  "packages": [
    "express",
    "react",
    "@apify/sdk"
  ],
  "includeDownloads": true,
  "concurrency": 8,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
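
If you assemble the input in code, a small helper can accept either form of package list (one name per line, or a JSON array) and fill in the documented defaults. This is a sketch under assumptions: `parse_packages` and `build_input` are hypothetical names, and dropping blank lines and duplicates is a choice made here, not documented Actor behaviour.

```python
import json

# Defaults copied from the input table above.
DEFAULTS = {
    "includeDownloads": True,
    "concurrency": 8,
    "proxyConfiguration": {"useApifyProxy": True},
}


def parse_packages(raw):
    """Accept a list, a JSON array string, or newline-separated names."""
    if isinstance(raw, str):
        text = raw.strip()
        raw = json.loads(text) if text.startswith("[") else text.splitlines()
    seen, names = set(), []
    for item in raw:
        name = item.strip()
        if name and name not in seen:  # skip blanks and repeats
            seen.add(name)
            names.append(name)
    return names


def build_input(raw_packages, **overrides):
    """Return a run input dict with defaults applied and basic checks done."""
    run_input = {"packages": parse_packages(raw_packages), **DEFAULTS, **overrides}
    if not run_input["packages"]:
        raise ValueError("packages must contain at least one name")
    if run_input["concurrency"] < 1:
        raise ValueError("concurrency must be at least 1")
    return run_input


run_input = build_input("express\nreact\n\n@apify/sdk\nreact", concurrency=4)
```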

📤 Output

One dataset row per package.

Field              Type            Notes
name               string          Package name, including @scope if present.
version            string          Current latest dist-tag version.
description        string | null   Package description string.
homepage           string | null   Project homepage URL.
license            string | null   SPDX license identifier (e.g. MIT, Apache-2.0).
author             string | null   Author display string.
maintainers        array           List of maintainer login names.
keywords           array           Tags from package.json.
repository_url     string | null   Source repo URL (typically git+https://github.com/...).
bugs_url           string | null   Bug-tracker URL if defined.
dist_tarball       string | null   Tarball download URL for the latest version.
engines            object | null   Node/npm version constraints from the engines field.
dependencies       object | null   Runtime dependency map (name → version range).
dev_dependencies   object | null   Dev dependency map.
peer_dependencies  object | null   Peer dependency map.
deprecated         string | null   Full deprecation message when the version is deprecated; null otherwise.
weekly_downloads   integer | null  Downloads in the last 7 days (populated when includeDownloads is true).
package_url        string          Canonical npmjs.com package page URL.
published_at       string | null   ISO-8601 publish timestamp for the latest version.
scraped_at         string          ISO-8601 timestamp when this row was recorded.

Example output

{
  "name": "express",
  "version": "4.21.2",
  "description": "Fast, unopinionated, minimalist web framework",
  "license": "MIT",
  "maintainers": ["wesleytodd", "dougwilson"],
  "keywords": ["express", "framework", "sinatra", "web", "rest", "restful", "router", "app"],
  "repository_url": "git+https://github.com/expressjs/express.git",
  "dependencies": {
    "accepts": "~1.3.8",
    "array-flatten": "1.1.1"
  },
  "deprecated": null,
  "weekly_downloads": 31000000,
  "package_url": "https://www.npmjs.com/package/express",
  "published_at": "2024-03-25T14:09:03.000Z",
  "scraped_at": "2026-06-01T10:00:00.000Z"
}
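
Because both timestamps are ISO-8601 strings, they parse directly into timezone-aware datetimes. The sketch below computes how many whole days passed between publish and scrape for the example row; `parse_timestamp` and `days_since_publish` are hypothetical helpers, and the `Z` suffix is rewritten so the code also works on Python versions before 3.11.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO-8601 string with a trailing Z; pass None through."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def days_since_publish(row):
    """Whole days between published_at and scraped_at, or None if unknown."""
    published = parse_timestamp(row.get("published_at"))
    if published is None:
        return None
    return (parse_timestamp(row["scraped_at"]) - published).days


# Timestamps taken from the example output above.
row = {
    "published_at": "2024-03-25T14:09:03.000Z",
    "scraped_at": "2026-06-01T10:00:00.000Z",
}
age_days = days_since_publish(row)  # 797
```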

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

Event        Cost (USD)  What it covers
actor-start  $0.005      One-off warm-up charge per run
result       $0.0015     Per dataset row written

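Example: 1,000 packages in a single run cost 1,000 × $0.0015 = $1.50 for results, plus the one-off $0.005 actor-start charge, so about $1.51 in total. No subscription, no minimum, and no card to start: every new Apify account gets $5 of free credit.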
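
The arithmetic can be reproduced with the two event prices from the table. This is a back-of-envelope sketch using `Decimal` to avoid floating-point rounding; `run_cost` and `rows_for_budget` are hypothetical helper names, and the calculation assumes every package yields exactly one result row.

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")   # charged once per run
PER_RESULT = Decimal("0.0015")   # charged per dataset row


def run_cost(rows: int, runs: int = 1) -> Decimal:
    """Total cost in USD for `rows` results spread over `runs` runs."""
    return ACTOR_START * runs + PER_RESULT * rows


def rows_for_budget(budget_usd: str) -> int:
    """How many rows a single run can write within the given budget."""
    return int((Decimal(budget_usd) - ACTOR_START) // PER_RESULT)


cost_1000 = run_cost(1000)          # Decimal("1.5050")
free_rows = rows_for_budget("5")    # 3330 rows on the $5 free credit
```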

Compared to calling the npm metadata api yourself: no rate-limit management, no session handling, no merging two separate API endpoints, no retry logic. We charge $1.50 per thousand; you save the engineering hours.

🚧 Limitations

  • latest dist-tag only — per-version lookup (e.g. express@4.18.0) is not yet supported.
  • Pre-release tags excluded — alpha, beta, next dist-tags are not resolved.
  • Tarball content not extracted — we return the tarball URL; we do not download or unpack it.
  • Downloads API cap — the NPM downloads API hard-caps at 128 packages per bulk call; we fan out automatically, but very large batches will take proportionally longer.
  • Private packages — scoped private packages (@org/internal-pkg) return a 404 from the public registry and are skipped with a log warning.
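
To make the 128-package cap concrete, here is a sketch of how a client could split a long list into bulk download requests. It is not a description of the Actor's internals. It assumes the comma-separated bulk form of the last-week endpoint, and it sends scoped packages one at a time on the assumption that the bulk form does not accept them. `bulk_download_urls` is a hypothetical helper.

```python
BASE = "https://api.npmjs.org/downloads/point/last-week/"


def bulk_download_urls(names, chunk_size=128):
    """Return the request URLs needed to cover every package name."""
    unscoped = [n for n in names if not n.startswith("@")]
    scoped = [n for n in names if n.startswith("@")]
    urls = [
        BASE + ",".join(unscoped[i:i + chunk_size])
        for i in range(0, len(unscoped), chunk_size)
    ]
    # Assumption: scoped names are queried individually.
    urls += [BASE + name for name in scoped]
    return urls


# 300 made-up unscoped names plus one scoped package -> 3 bulk calls + 1 single.
names = [f"pkg-{i}" for i in range(300)] + ["@apify/sdk"]
urls = bulk_download_urls(names)
```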

❓ FAQ

What exactly is the npm metadata api?

The NPM registry exposes https://registry.npmjs.org/<package> as a JSON document with full package metadata — versions, maintainers, dependencies, dist tarballs, and more. A separate endpoint at https://api.npmjs.org/downloads/point/last-week/<pkg> returns download counts. This Actor calls both, merges the responses, and delivers clean structured rows. You get a production-grade npm metadata api pipeline without maintaining the plumbing yourself.

Is this an npm registry scraper or an API wrapper?

Both. The registry exposes clean JSON, but stitching two API hosts, handling scoped package names, managing retries, and staying inside rate limits is real engineering work. We do all of that and return validated rows. Think of it as an npm registry scraper that handles the messy bits for you.

Are download counts exact?

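Not exactly. npm counts every tarball download served by the registry, so the numbers include CI servers, mirrors, and other automated traffic, not just installs by developers. As a rule of thumb, allow for a noise band of around 5%. Trust the trend and relative magnitude; don't treat individual numbers as exact.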

Can I look up a specific version instead of latest?

Not yet. Per-version lookup is on the roadmap. For now, if you pass name@version, the version part is ignored and we resolve to latest. Vote or comment on the Actor's Issues tab to prioritise this.
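
If your package list arrives in name@version form (from a lockfile or CLI output, say), a small helper can split the two parts so you can compare the version you asked about with the latest version in the returned row. A sketch; `split_spec` is a hypothetical name, and the only subtlety is that a scoped name starts with its own `@`.

```python
def split_spec(spec: str):
    """Split 'name@version' into (name, version); version is None if absent."""
    index = spec.rfind("@")
    if index > 0:  # index 0 is the scope marker, not a version separator
        return spec[:index], spec[index + 1:]
    return spec, None


specs = ["express@4.18.0", "@apify/sdk@1.0.0", "react", "@apify/sdk"]
parsed = [split_spec(s) for s in specs]
```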

What happens when a package has been unpublished?

We log a 404, skip the row, and continue with the remaining packages. The final dataset will be shorter than your input list; the run log will list every skipped package.
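
Since the dataset can be shorter than the input, you may want to compute the missing names yourself instead of reading the log. A minimal sketch, assuming the `name` field in each row matches the name you submitted; `skipped_packages` is a hypothetical helper and `left-pad-gone` is a made-up package name.

```python
def skipped_packages(requested, rows):
    """Names from the input list that have no row in the dataset."""
    returned = {row["name"] for row in rows}
    return [name for name in requested if name not in returned]


requested = ["express", "left-pad-gone", "react"]
rows = [{"name": "express"}, {"name": "react"}]
missing = skipped_packages(requested, rows)
```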

Why is the repository_url prefixed with git+?

That is npm's canonical format. Strip the git+ prefix if your downstream tool expects a bare HTTPS URL.
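
A short helper for that clean-up might look like the following. `to_https_url` is a hypothetical name, and dropping the trailing `.git` is an optional extra that the text above does not require.

```python
def to_https_url(repository_url, drop_git_suffix=True):
    """Strip the git+ prefix (and optionally .git) from a repository URL."""
    if repository_url is None:
        return None
    url = repository_url
    if url.startswith("git+"):
        url = url[len("git+"):]
    if drop_git_suffix and url.endswith(".git"):
        url = url[:-len(".git")]
    return url


# Input taken from the example output row.
clean = to_https_url("git+https://github.com/expressjs/express.git")
```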

Can I use this alongside the PyPI Package Scraper?

Yes — Devil Scrapes PyPI Package Scraper does the same job for Python packages. Run both and join on package purpose to produce an npm-vs-PyPI ecosystem comparison dataset.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need an additional field? Open an issue on the Actor's Issues tab in Apify Console — we ship fixes weekly and we read every report.