NPM Package Scraper - Search, Maintainers, Downloads, Emails
Pricing
from $1.50 / 1,000 npm package records
Scrape npmjs.org packages with 30+ fields: maintainer emails, weekly/monthly downloads, dependents, scores, repo & homepage. Search, lookup, or by-author modes.
Developer
deusex machine
NPM Package Scraper - Search, Maintainer Emails, Downloads, Scores
Scrape npmjs.org with 30+ fields per package. Use this NPM scraper as a fast, no-auth alternative to the npm Registry API to find packages, build an npm package database, extract maintainer emails, monitor downloads, and feed an ML dataset of the JavaScript ecosystem.
If you have ever opened registry.npmjs.org and tried to bulk-export packages, you already know the problem: the npm registry API works one package at a time, has poorly documented search semantics, and returns nothing about downloads or scores. This actor wraps the public npm registry API and the downloads API at api.npmjs.org together, normalizes the responses into a clean schema, and adds optional web enrichment so you finish in minutes what would otherwise take days.
No API key needed. No rate-limit headaches. No Puppeteer. Just an HTTP scraper that returns clean JSON or CSV.
Looking for npm package data, an npm registry mirror, or an npm package finder? This is the actor. It supports full-text search, lookup by package names, by author / scope, or by keyword (tag), and exports straight to Apify Dataset, CSV, JSON, or Excel.
Why this NPM scraper
- 30+ fields per package: name, version, author, maintainers, license, repository, dependency counts, dependents count, dist-tags, downloads, scores, README
- Maintainer emails included: npm exposes maintainer emails in the public registry response; this actor surfaces them as a clean array
- Four search modes: full-text, by names, by author/scope, by keyword
- Optional web enrichment: for every unique maintainer, search Google for personal website, GitHub, LinkedIn and secondary emails (uses Apify residential proxy)
- Download stats: last day, week, month, year (combined registry + downloads API)
- NPM quality score: final, quality, popularity, maintenance (raw scoring from the public Search API)
- Filters: minimum weekly downloads, minimum score, license substring, max results up to 5,000
- Outputs: Apify Dataset, exportable to CSV, JSON, Excel, XML, RSS
Built for B2B prospectors, dev-tools founders, supply-chain security teams, recruiters sourcing JavaScript talent, and ML researchers building open-source ecosystem datasets.
What this NPM Package Scraper extracts
| Field | Description |
|---|---|
| name | Package name (e.g. express, @nestjs/core) |
| scope | Scope without @ (e.g. nestjs); null for unscoped packages |
| version | Latest published version |
| description | Short description from package.json |
| keywords | Array of npm keywords / tags |
| license | SPDX license string (e.g. MIT, Apache-2.0) |
| author | Object with name, email, url |
| maintainers | Array of { name, email }; emails included |
| maintainerEmails | Deduped flat array of all maintainer emails |
| contributors | Array of { name, email, url } |
| publisher | Username of the last npm publish |
| repository | Git URL of the source repository |
| githubOwner | Parsed GitHub owner from the repository URL |
| githubRepo | Parsed GitHub repo name |
| homepage | Project homepage URL |
| bugs | Bug-tracker URL |
| funding | Funding URL or array |
| engines | Node / npm engine constraints |
| dependencies | Count of runtime dependencies |
| devDependencies | Count of dev dependencies |
| peerDependencies | Count of peer dependencies |
| dependentsCount | Number of packages on npm depending on this one |
| score | Combined npm score 0..1 |
| scoreQuality | Quality sub-score 0..1 |
| scorePopularity | Popularity sub-score 0..1 |
| scoreMaintenance | Maintenance sub-score 0..1 |
| downloadsLastDay | Downloads in the last 24 hours |
| downloadsLastWeek | Downloads in the last 7 days |
| downloadsLastMonth | Downloads in the last 30 days |
| downloadsLastYear | Downloads in the last 365 days |
| versionsCount | Total versions ever published |
| firstPublishedAt | ISO timestamp of the first publish |
| lastModified | ISO timestamp of the most recent change |
| lastPublishedAt | ISO timestamp of the most recent version publish |
| tarballUrl | Direct URL to the latest tarball |
| unpackedSize | Tarball unpacked size in bytes |
| fileCount | Number of files in the tarball |
| readme | README markdown (optional, truncated to 5,000 chars) |
| enrichment.website | Maintainer website (optional) |
| enrichment.linkedin | Maintainer LinkedIn (optional) |
| enrichment.github | Maintainer GitHub (optional) |
| enrichment.emails | Secondary emails found via SERP (optional) |
| url | Canonical npmjs.com/package/... URL |
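As an illustration of how githubOwner and githubRepo could be derived from the repository field, here is a minimal sketch. The helper name parse_github_repo and its regular expression are hypothetical; the actor's actual parsing rules are not documented here.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: matches https, git+https and SSH-style GitHub URLs,
# and drops a trailing ".git", slash, or "#fragment".
_GITHUB_RE = re.compile(
    r"github\.com[:/]+([^/\s]+)/([^/\s#]+?)(?:\.git)?/?(?:#.*)?$"
)


def parse_github_repo(repository: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (owner, repo) for a GitHub repository URL, else (None, None)."""
    if not repository:
        return (None, None)
    match = _GITHUB_RE.search(repository.strip())
    if match is None:
        return (None, None)
    return (match.group(1), match.group(2))


owner, repo = parse_github_repo("git+https://github.com/nextauthjs/next-auth.git")
print(owner, repo)
```

Repositories hosted elsewhere (GitLab, Bitbucket, self-hosted) simply yield no owner or repo in this sketch.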
Search modes
Pick the mode that matches your goal. All modes share the same filters and output schema.
1. search: Full-text npm package search
Query npm by free text across name, description and keywords. This is the same engine that powers npm search and the npmjs.com search bar, but it returns clean JSON for thousands of packages at once.
{
  "searchType": "search",
  "searchQuery": "openai client",
  "minDownloadsLastWeek": 500,
  "maxResults": 100,
  "includeDownloadsStats": true
}
Common queries: react, typescript, vite plugin, tailwind, nestjs, eslint config, ai sdk, aws lambda, next-auth.
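One plausible way to collect thousands of results is to page through the public registry search endpoint. The sketch below only builds the page URLs. The endpoint path and the 250-results-per-page cap follow the public npm registry API documentation; the helper name search_page_urls is hypothetical, and whether the actor pages exactly like this is an assumption.

```python
from typing import List
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://registry.npmjs.org/-/v1/search"
PAGE_SIZE = 250  # documented maximum for the `size` query parameter


def search_page_urls(query: str, max_results: int) -> List[str]:
    """Build one search URL per page until max_results is covered."""
    urls = []
    for offset in range(0, max_results, PAGE_SIZE):
        size = min(PAGE_SIZE, max_results - offset)
        params = urlencode({"text": query, "size": size, "from": offset})
        urls.append(f"{SEARCH_ENDPOINT}?{params}")
    return urls


for url in search_page_urls("openai client", 600):
    print(url)
```

Each URL can then be fetched with any HTTP client; the response carries an objects array with the package metadata and score details.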
2. byNames: Bulk npm package lookup
Give a list of package names and get the full record for each. Perfect for auditing a package.json, enriching a dependency graph, or building a curated dataset.
{
  "searchType": "byNames",
  "names": ["express", "fastify", "koa", "hono", "@nestjs/core", "@hono/node-server"],
  "includeDownloadsStats": true,
  "includeReadme": true
}
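For the package.json audit case, the names list can be generated instead of typed by hand. This is a minimal sketch: the helper name by_names_input is hypothetical, and including dev, peer, and optional dependencies is a choice made here, not something the actor requires.

```python
import json
from typing import Any, Dict

DEP_SECTIONS = (
    "dependencies",
    "devDependencies",
    "peerDependencies",
    "optionalDependencies",
)


def by_names_input(package_json_text: str) -> Dict[str, Any]:
    """Turn the text of a package.json into a byNames actor input."""
    manifest = json.loads(package_json_text)
    names = set()
    for section in DEP_SECTIONS:
        # A missing or null section contributes nothing.
        names.update((manifest.get(section) or {}).keys())
    return {
        "searchType": "byNames",
        "names": sorted(names),
        "includeDownloadsStats": True,
    }


sample = '{"dependencies": {"express": "^4.19.0"}, "devDependencies": {"@nestjs/core": "^10.0.0"}}'
print(json.dumps(by_names_input(sample), indent=2))
```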
3. byAuthor: All npm packages by author or scope
Pull every package published by a specific maintainer or under an @scope. Great for competitive intel ("what does Vercel maintain on npm?") or for outreach to prolific authors.
{
  "searchType": "byAuthor",
  "author": "sindresorhus",
  "maxResults": 500,
  "includeDownloadsStats": true,
  "enrichWithGoogle": true,
  "enrichLimit": 1
}
Also accepts scopes: @vercel, @nestjs, @tanstack, @shadcn, @vueuse, @radix-ui, @shopify, @apify.
4. byKeyword: npm packages by keyword / tag
npm packages can declare keywords ("keywords": ["cli", "react", "typescript"]). This mode finds every package that tags itself with a given keyword. Ideal for category mapping or building a "best of X" list.
{
  "searchType": "byKeyword",
  "keyword": "react",
  "minDownloadsLastWeek": 10000,
  "minScore": 0.5,
  "maxResults": 200,
  "license": "MIT"
}
Hot keywords on npm right now: react, nextjs, vite, typescript, cli, ai, llm, openai, langchain, mcp, astro, nuxt, solidjs, tailwindcss, playwright.
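The filters shared by all modes can be pictured as a post-filter over output records. The sketch below uses the output field names from the schema; the function name apply_filters is hypothetical, and the case-insensitive license match is an assumption about how the substring filter behaves.

```python
from typing import Any, Dict, Iterable, List, Optional


def apply_filters(
    records: Iterable[Dict[str, Any]],
    min_downloads_last_week: Optional[int] = None,
    min_score: Optional[float] = None,
    license_substring: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Keep only records that pass every filter that was supplied."""
    kept = []
    for record in records:
        downloads = record.get("downloadsLastWeek") or 0
        score = record.get("score") or 0.0
        license_name = record.get("license") or ""
        if min_downloads_last_week is not None and downloads < min_downloads_last_week:
            continue
        if min_score is not None and score < min_score:
            continue
        # Assumption: substring match ignores case.
        if license_substring and license_substring.lower() not in license_name.lower():
            continue
        kept.append(record)
    return kept


packages = [
    {"name": "a", "downloadsLastWeek": 50000, "score": 0.8, "license": "MIT"},
    {"name": "b", "downloadsLastWeek": 200, "score": 0.9, "license": "MIT"},
    {"name": "c", "downloadsLastWeek": 90000, "score": 0.7, "license": "Apache-2.0"},
]
print([p["name"] for p in apply_filters(packages, 10000, 0.5, "MIT")])
```

The same function is handy for re-filtering an exported dataset locally without paying for a second run.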
Use cases
This npm package scraper is designed for lead generation, market research, supply-chain security, and ML data engineering on top of the npm ecosystem.
- DevRel & developer outreach: find every package maintainer in a niche (React, Vite, AI, MCP) and email them about your SDK, beta program, or sponsorship
- Dev-tools SaaS sales pipeline: pull packages with high weekly downloads in your target stack, then enrich each maintainer's email and LinkedIn for outbound. Comparable to Apollo / Hunter but with package context as the signal
- Recruiter tech sourcing: surface active open-source JavaScript and TypeScript maintainers with verified emails. Cheaper, safer, and more accurate than scraping LinkedIn
- Supply-chain security & SBOM: run a bulk lookup of an entire package.json or package-lock.json to audit dependentsCount, lastPublishedAt, license drift, and abandoned packages
- VC and ecosystem analysis: map fast-growing categories (AI SDK clients, MCP servers, React Native libraries) by combining keyword search + downloads stats over time
- ML dataset of npm: export the full npm registry as a clean JSON / CSV / Parquet dataset for LLM training, code intelligence, or benchmark suites
- Newsletter automation: feed a daily / weekly "new on npm" digest by filtering on firstPublishedAt within the last 7 days
- Competitive intelligence on dev-tools: see which dependencies your competitor's product imports by scraping their published package
- Brand monitoring on npm: find every package mentioning your company name in description or keywords (typo-squats, integrations, plugins)
- Trending npm packages dashboard: combine downloads delta + score deltas to build a "Hacker News for npm" feed
Example output
A single record from a byNames: ["next-auth"] run looks like this (truncated for brevity):
{
  "name": "next-auth",
  "scope": null,
  "version": "5.0.0-beta.20",
  "description": "Authentication for the Web.",
  "keywords": ["authentication", "nextjs", "oauth", "jwt"],
  "license": "ISC",
  "author": { "name": "Iain Collins", "email": "me@iaincollins.com" },
  "maintainers": [
    { "name": "balazsorban", "email": "balazs@authjs.dev" },
    { "name": "iaincollins", "email": "me@iaincollins.com" }
  ],
  "maintainerEmails": ["balazs@authjs.dev", "me@iaincollins.com"],
  "publisher": "balazsorban",
  "repository": "git+https://github.com/nextauthjs/next-auth.git",
  "githubOwner": "nextauthjs",
  "githubRepo": "next-auth",
  "homepage": "https://authjs.dev",
  "engines": { "node": "^18.17.0 || ^19.8.0 || >= 20.0.0" },
  "dependencies": 5,
  "devDependencies": 12,
  "peerDependencies": 1,
  "dependentsCount": 2317,
  "score": 0.71,
  "scoreQuality": 0.84,
  "scorePopularity": 0.91,
  "scoreMaintenance": 0.55,
  "downloadsLastDay": 124310,
  "downloadsLastWeek": 1843207,
  "downloadsLastMonth": 7521094,
  "downloadsLastYear": 78911234,
  "versionsCount": 423,
  "firstPublishedAt": "2018-08-30T14:25:01.123Z",
  "lastPublishedAt": "2026-05-15T09:12:44.901Z",
  "tarballUrl": "https://registry.npmjs.org/next-auth/-/next-auth-5.0.0-beta.20.tgz",
  "unpackedSize": 5824113,
  "fileCount": 412,
  "url": "https://www.npmjs.com/package/next-auth"
}
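The relationship between maintainers and maintainerEmails in the record above can be sketched as a small flatten-and-dedupe step. The helper name flatten_maintainer_emails is hypothetical, and both the first-seen ordering and the case-insensitive comparison are assumptions rather than documented behavior.

```python
from typing import Dict, Iterable, List, Optional


def flatten_maintainer_emails(maintainers: Iterable[Dict[str, Optional[str]]]) -> List[str]:
    """Collect maintainer emails in first-seen order, skipping blanks and repeats."""
    seen = set()
    emails = []
    for maintainer in maintainers:
        email = (maintainer.get("email") or "").strip()
        key = email.lower()  # assumption: addresses are compared case-insensitively
        if email and key not in seen:
            seen.add(key)
            emails.append(email)
    return emails


maintainers = [
    {"name": "balazsorban", "email": "balazs@authjs.dev"},
    {"name": "iaincollins", "email": "me@iaincollins.com"},
]
print(flatten_maintainer_emails(maintainers))
```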
Compared to alternatives
| Tool | Maintainer emails | Downloads stats | Bulk lookup | Search by keyword | Web enrichment | Cost |
|---|---|---|---|---|---|---|
| NPM Package Scraper (this actor) | ✅ Included | ✅ Day / Week / Month / Year | ✅ Up to 5,000 | ✅ Full | ✅ Optional | Pay-per-event |
| npm CLI (npm search, npm view) | ❌ | ❌ (no downloads in CLI) | ⚠️ 1 at a time | ⚠️ Limited | ❌ | Free, painful |
| registry.npmjs.org REST | ⚠️ Partial | ❌ Separate API | ⚠️ 1 at a time | ⚠️ Limited | ❌ | Free |
| npms.io API | โ | โ ๏ธ Aggregated | โ | โ | โ | Free, often down |
| libraries.io API | โ | โ ๏ธ Slow | โ | โ | โ | Free tier limited |
| Snyk Advisor | โ | โ | โ | โ | โ | Subscription |
If you only need one package, npm view is fine. For anything at scale (outreach, ML datasets, supply-chain audits), running this actor saves hours and gives you a unified schema.
Input parameters reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| searchType | string enum | search | search / byNames / byAuthor / byKeyword |
| searchQuery | string | react | Used with search. Free-text npm search |
| names | string[] | - | Used with byNames. List of package names |
| author | string | - | Used with byAuthor. Maintainer username or @scope |
| keyword | string | - | Used with byKeyword. npm keyword / tag |
| minDownloadsLastWeek | integer | - | Drop packages below this weekly downloads count |
| minScore | number | - | Drop packages below this combined score 0..1 |
| license | string | - | SPDX substring filter (e.g. MIT, Apache) |
| maxResults | integer | 100 | Hard cap (1-5,000) |
| includeDownloadsStats | boolean | true | Fetch day / week / month / year downloads |
| includeReadme | boolean | false | Include README markdown (truncated to 5,000 chars) |
| enrichWithGoogle | boolean | false | Find maintainer website + LinkedIn + secondary emails |
| enrichLimit | integer | 50 | Max unique maintainers to enrich (1-1,000) |
| proxyConfig | proxy | residential | Proxy used for enrichment requests only |
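Before starting a run, it can help to check an input object against the table above. The following is a minimal sketch: validate_input is a hypothetical helper that encodes only the mode-specific fields and numeric ranges listed in the table, and it does not reproduce the actor's own validation.

```python
from typing import Any, Dict, List

# Field each mode needs, per the parameter table. `search` falls back to the
# documented default query ("react"), so it has no hard requirement here.
REQUIRED_BY_MODE = {
    "search": None,
    "byNames": "names",
    "byAuthor": "author",
    "byKeyword": "keyword",
}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks fine."""
    problems: List[str] = []
    mode = run_input.get("searchType", "search")
    if mode not in REQUIRED_BY_MODE:
        problems.append(f"unknown searchType: {mode!r}")
    else:
        required = REQUIRED_BY_MODE[mode]
        if required and not run_input.get(required):
            problems.append(f"{mode} needs a non-empty {required!r}")
    for key, low, high in (("maxResults", 1, 5000), ("enrichLimit", 1, 1000)):
        value = run_input.get(key)
        if value is not None and not (low <= value <= high):
            problems.append(f"{key} must be between {low} and {high}")
    return problems


print(validate_input({"searchType": "byKeyword", "keyword": "react", "maxResults": 200}))
print(validate_input({"searchType": "byNames", "maxResults": 6000}))
```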
Pricing & cost
Pay-per-event:
- Per package returned: small fee, scales linearly
- Per enriched maintainer: only when enrichWithGoogle: true
A typical run of 1,000 packages without enrichment costs less than a single coffee: about $1.50 at the listed starting rate of $1.50 per 1,000 package records. A bulk lookup of an entire package.json (50-200 packages) comes to roughly $0.08 to $0.30 at that rate.
The actor emits a billing event only when a real record is delivered to the Dataset. Failed retries, redirects and rate-limit backoffs are not charged.
Frequently asked questions
Is this an official npm API client?
No. The actor calls public npm endpoints (registry.npmjs.org and the downloads API at api.npmjs.org). No login, no .npmrc, no tokens.
Do you respect npm's terms of service?
Yes. The npm registry is explicitly public and designed for read-heavy traffic. We add polite delays and exponential backoff on 429s.
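To make "exponential backoff on 429s" concrete, here is a minimal sketch of a capped doubling schedule. The base delay, the cap, and the function name backoff_delays are illustrative assumptions; the actor's real retry settings are not published here.

```python
from typing import List


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> List[float]:
    """Seconds to wait before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


# A caller would sleep for delays[i] after the i-th HTTP 429 response.
print(backoff_delays(5))
```

Production clients usually add random jitter to each delay so that parallel workers do not retry in lockstep.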
Are the maintainer emails real?
Yes. npm requires maintainers to publish a verified email when they create the account. Those emails are part of the public package metadata returned by registry.npmjs.org/<package>.
Can I scrape the full npm registry (3M+ packages)?
Technically yes, but you almost never want to. Most users filter by keyword + minimum downloads + score to get a workable shortlist. If you really need the full mirror, contact us and we will guide you.
How fresh is the data?
Live. Every request hits npm in real time. No stale cache.
Can I get GitHub stars / issues for each package?
The actor returns the parsed githubOwner / githubRepo. Pair it with a GitHub scraper actor to merge in stars, issues and contributor counts.
What is the difference between this and npm search?
npm search returns 20 results, no downloads, no scores, no maintainer emails, and is throttled. This actor returns up to 5,000 results with the full record per package.
How do I find npm packages by GitHub owner?
Use searchType: "byAuthor" with the GitHub org name. Most major OSS orgs (vercel, nestjs, tanstack) publish under matching npm scopes.
Can I use this as an npm registry mirror?
Not exactly: we do not store the tarballs. But for metadata, downloads, and scores, the output dataset is functionally equivalent to a queryable npm registry mirror.
How does the enrichment work?
For every unique maintainer email, the actor runs a small SERP query (Google + Bing) to find the personal website, GitHub, LinkedIn, and any secondary emails published on those pages. This is the same approach Apollo and Hunter use, applied to OSS maintainers.
Does enrichment increase the cost a lot?
Only the maintainers you enrich are billed (max controlled by enrichLimit). Enriching 50 unique maintainers in a 1,000-package run is the standard sweet spot.
Can I run this on a schedule?
Yes. Apify Schedules supports cron expressions. A daily run that filters firstPublishedAt >= now - 24h gives you a "new on npm" feed.
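The "new on npm" filter mentioned above can be applied to exported records with a few lines of code. This is a sketch under assumptions: published_since is a hypothetical helper, and it expects firstPublishedAt in the ISO format shown in the example output (UTC with a trailing Z).

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Optional


def parse_iso(timestamp: str) -> datetime:
    """Parse an ISO timestamp such as 2018-08-30T14:25:01.123Z as aware UTC."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def published_since(
    records: Iterable[Dict[str, Any]],
    hours: int = 24,
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Keep records whose firstPublishedAt falls within the last `hours`."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=hours)
    return [
        record
        for record in records
        if record.get("firstPublishedAt")
        and parse_iso(record["firstPublishedAt"]) >= cutoff
    ]


fixed_now = datetime(2026, 5, 16, 12, 0, tzinfo=timezone.utc)
sample = [
    {"name": "fresh-pkg", "firstPublishedAt": "2026-05-16T01:00:00.000Z"},
    {"name": "next-auth", "firstPublishedAt": "2018-08-30T14:25:01.123Z"},
]
print([r["name"] for r in published_since(sample, now=fixed_now)])
```

Passing hours=168 gives the weekly variant of the same digest.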
Does it export to CSV / Excel?
Apify Dataset can export to CSV, JSON, Excel, XML, RSS, Markdown table, and HTML. Use the API or download from the Apify console.
What about private npm registries (GitHub Packages, Verdaccio, JFrog, Azure Artifacts)?
This actor targets the public registry.npmjs.org. Private registries are not supported.
How does it compare to npms.io and libraries.io?
This actor is faster, has more fields per package (especially maintainer emails + downloads stats), and is actively maintained. npms.io is frequently down; libraries.io has a slow API.
Can I integrate this with Claude, Cursor, or other AI agents?
Yes. Use Apify's MCP server wrapper, or call the actor via the Apify API from your agent. We also publish dedicated MCP server actors (see below).
Other actors by makework36
Useful companions for npm + JavaScript + lead-gen workflows:
- Lovable Sites Scraper: discover .lovable.app AI-built apps + custom domains
- StackOverflow Scraper: questions, answers and tags
- Website Email & Contact Finder: extract emails + social links from any URL
- Reddit MCP Server: Reddit access for Claude, Cursor, ChatGPT
- Reddit SaaS Leads Scraper: find startup pain points and early adopters
- GitHub-style scraping: full portfolio of HTTP-only scrapers
- Substack Scraper: newsletter posts and authors
- Goodreads Scraper: books, authors, ratings, ISBN
- Shopify Products Scraper: any Shopify store catalog + variants
- Skyscanner MCP Server: flight prices for AI agents
Changelog
- v0.1: Initial release. Four search modes, downloads stats, npm score, optional maintainer enrichment via SERP.
Support
Found a missing field, a bug, or a use case the actor doesn't cover? Open an issue or message me directly from the Apify Console. I respond fast and ship fixes within hours for paying users.