NPM Package Scraper - Search, Maintainers, Downloads, Emails

Pricing

from $1.50 / 1,000 npm package records

Scrape npmjs.org packages with 30+ fields: maintainer emails, weekly/monthly downloads, dependents, scores, repo & homepage. Search, lookup, or by-author modes.

Developer

deusex machine

Maintained by Community

NPM Package Scraper - Search, Maintainer Emails, Downloads, Scores

Scrape npmjs.org with 30+ fields per package. Use this NPM scraper as a fast, no-auth alternative to the npm Registry API to find packages, build an npm package database, extract maintainer emails, monitor downloads, and feed an ML dataset of the JavaScript ecosystem.

If you have ever opened registry.npmjs.org and tried to bulk-export packages, you already know the problem: the npm registry API works one package at a time, has poorly documented search semantics, and returns nothing about downloads or scores. This actor wraps the public npm registry API and the downloads API at api.npmjs.org together, normalizes the responses into a clean schema, and adds optional web enrichment so you finish in minutes what would otherwise take days.

No API key needed. No rate-limit headaches. No Puppeteer. Just an HTTP scraper that returns clean JSON or CSV.

💡 Looking for npm package data, an npm registry mirror, or an npm package finder? This is the actor. It supports full-text search, lookup by package names, by author / scope, or by keyword (tag), and exports straight to Apify Dataset, CSV, JSON, or Excel.


🚀 Why this NPM scraper

  • 30+ fields per package: name, version, author, maintainers, license, repository, dependency counts, dependents count, dist-tags, downloads, scores, README
  • Maintainer emails included: npm exposes maintainer emails in the public registry response; this actor surfaces them as a clean array
  • Four search modes: full-text, by names, by author/scope, by keyword
  • Optional web enrichment: for every unique maintainer, search Google for personal website, GitHub, LinkedIn and secondary emails (uses Apify residential proxy)
  • Downloads stats: last day, week, month, year (combined registry + downloads API)
  • NPM quality score: final, quality, popularity, maintenance (raw scoring from the public Search API)
  • Filters: minimum weekly downloads, minimum score, license substring, max results up to 5,000
  • Outputs: Apify Dataset → CSV, JSON, Excel, XML, RSS

Built for B2B prospectors, dev-tools founders, supply-chain security teams, recruiters sourcing JavaScript talent, and ML researchers building open-source ecosystem datasets.


📊 What this NPM Package Scraper extracts

| Field | Description |
| --- | --- |
| name | Package name (e.g. express, @nestjs/core) |
| scope | Scope without @ (e.g. nestjs); null for unscoped |
| version | Latest published version |
| description | Short description from package.json |
| keywords | Array of npm keywords / tags |
| license | SPDX license string (e.g. MIT, Apache-2.0) |
| author | Object with name, email, url |
| maintainers | Array of { name, email }; emails included |
| maintainerEmails | Deduped flat array of all maintainer emails |
| contributors | Array of { name, email, url } |
| publisher | Username of the last npm publish |
| repository | Git URL of the source repository |
| githubOwner | Parsed GitHub owner from the repository URL |
| githubRepo | Parsed GitHub repo name |
| homepage | Project homepage URL |
| bugs | Bug-tracker URL |
| funding | Funding URL or array |
| engines | Node / npm engine constraints |
| dependencies | Count of runtime dependencies |
| devDependencies | Count of dev dependencies |
| peerDependencies | Count of peer dependencies |
| dependentsCount | Number of packages on npm depending on this one |
| score | Combined npm score 0..1 |
| scoreQuality | Quality sub-score 0..1 |
| scorePopularity | Popularity sub-score 0..1 |
| scoreMaintenance | Maintenance sub-score 0..1 |
| downloadsLastDay | Downloads in the last 24 hours |
| downloadsLastWeek | Downloads in the last 7 days |
| downloadsLastMonth | Downloads in the last 30 days |
| downloadsLastYear | Downloads in the last 365 days |
| versionsCount | Total versions ever published |
| firstPublishedAt | ISO timestamp of the first publish |
| lastModified | ISO timestamp of the most recent change |
| lastPublishedAt | ISO timestamp of the most recent version publish |
| tarballUrl | Direct URL to the latest tarball |
| unpackedSize | Tarball unpacked size in bytes |
| fileCount | Number of files in the tarball |
| readme | README markdown (optional, truncated to 5,000 chars) |
| enrichment.website | Maintainer website (optional) |
| enrichment.linkedin | Maintainer LinkedIn (optional) |
| enrichment.github | Maintainer GitHub (optional) |
| enrichment.emails | Secondary emails found via SERP (optional) |
| url | Canonical npmjs.com/package/... URL |
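
Several of the fields above (scope, githubOwner, githubRepo, url) are derived from the package name and repository URL rather than read directly from npm. The following is a minimal sketch of how such derivation could work. The helper names `split_scope`, `parse_github` and `package_url` are hypothetical and this is not the actor's actual code.

```python
import re
from typing import Optional, Tuple

# Matches the owner/repo part of common GitHub URL shapes, e.g.
# git+https://github.com/owner/repo.git or git@github.com:owner/repo.git
_GITHUB_RE = re.compile(r"github\.com[/:]([^/]+)/([^/#?]+?)(?:\.git)?(?:[#?].*)?$")


def split_scope(name: str) -> Optional[str]:
    """Return the scope without '@', or None for unscoped packages."""
    if name.startswith("@") and "/" in name:
        return name[1:].split("/", 1)[0]
    return None


def parse_github(repository: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (owner, repo) for GitHub URLs, or (None, None) otherwise."""
    if not repository:
        return None, None
    match = _GITHUB_RE.search(repository)
    if not match:
        return None, None
    return match.group(1), match.group(2)


def package_url(name: str) -> str:
    """Build the npmjs.com page URL for a package name."""
    return f"https://www.npmjs.com/package/{name}"


print(split_scope("@nestjs/core"))
print(parse_github("git+https://github.com/nextauthjs/next-auth.git"))
```

Repository URLs that point at a subdirectory or a non-GitHub host fall through to (None, None) in this sketch.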

🎯 Search modes

Pick the mode that matches your goal. All modes share the same filters and output schema.

1. search - Full-text npm search

Query npm by free text across name, description and keywords. This is the same engine that powers npm search and the npmjs.com search bar, but it returns clean JSON for thousands of packages at once.

{
  "searchType": "search",
  "searchQuery": "openai client",
  "minDownloadsLastWeek": 500,
  "maxResults": 100,
  "includeDownloadsStats": true
}

Common queries: react, typescript, vite plugin, tailwind, nestjs, eslint config, ai sdk, aws lambda, next-auth.

2. byNames - Bulk npm package lookup

Give a list of package names and get the full record for each. Perfect for auditing a package.json, enriching a dependency graph, or building a curated dataset.

{
  "searchType": "byNames",
  "names": ["express", "fastify", "koa", "hono", "@nestjs/core", "@hono/node-server"],
  "includeDownloadsStats": true,
  "includeReadme": true
}
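
For the package.json audit case, the names list can be generated rather than typed by hand. This is a rough sketch that assumes a standard package.json layout; the function names are illustrative, and only the input keys (searchType, names, includeDownloadsStats) come from this page.

```python
import json
from typing import Any, Dict, List

# Sections of package.json that list dependencies by name.
DEP_SECTIONS = ("dependencies", "devDependencies", "peerDependencies", "optionalDependencies")


def names_from_package_json(manifest: Dict[str, Any]) -> List[str]:
    """Collect unique dependency names from a parsed package.json, sorted."""
    names = set()
    for section in DEP_SECTIONS:
        names.update(manifest.get(section) or {})
    return sorted(names)


def build_by_names_input(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Build a byNames actor input from a parsed package.json."""
    return {
        "searchType": "byNames",
        "names": names_from_package_json(manifest),
        "includeDownloadsStats": True,
    }


manifest = json.loads(
    '{"dependencies": {"express": "^4.19.0"}, "devDependencies": {"eslint": "^9.0.0"}}'
)
run_input = build_by_names_input(manifest)
print(json.dumps(run_input, indent=2))
```

In practice the manifest would be loaded from disk with `json.load(open("package.json"))` instead of the inline string used here.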

3. byAuthor - All npm packages by author or scope

Pull every package published by a specific maintainer or under an @scope. Great for competitive intel ("what does Vercel maintain on npm?") or for outreach to prolific authors.

{
  "searchType": "byAuthor",
  "author": "sindresorhus",
  "maxResults": 500,
  "includeDownloadsStats": true,
  "enrichWithGoogle": true,
  "enrichLimit": 1
}

Also accepts scopes: @vercel, @nestjs, @tanstack, @shadcn, @vueuse, @radix-ui, @shopify, @apify.

4. byKeyword - npm packages by keyword / tag

npm packages can declare keywords ("keywords": ["cli", "react", "typescript"]). This mode finds every package that tags itself with a given keyword. Ideal for category mapping or building a "best of X" list.

{
  "searchType": "byKeyword",
  "keyword": "react",
  "minDownloadsLastWeek": 10000,
  "minScore": 0.5,
  "maxResults": 200,
  "license": "MIT"
}

Hot keywords on npm right now: react, nextjs, vite, typescript, cli, ai, llm, openai, langchain, mcp, astro, nuxt, solidjs, tailwindcss, playwright.
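
To give a feel for what the four modes could translate to on the wire, here is a hedged sketch that builds URLs for the public registry search endpoint (`/-/v1/search`, with its `keywords:`, `maintainer:` and `scope:` qualifiers) and for the per-package document. Whether the actor maps modes to requests exactly this way is an assumption; `search_url` and `package_doc_url` are hypothetical names.

```python
from urllib.parse import quote, urlencode

REGISTRY = "https://registry.npmjs.org"


def search_url(search_type: str, value: str, size: int = 250, offset: int = 0) -> str:
    """Build a registry search URL for the search, byKeyword and byAuthor modes."""
    if search_type == "search":
        text = value
    elif search_type == "byKeyword":
        text = f"keywords:{value}"
    elif search_type == "byAuthor":
        # Assumption: '@scope' values map to scope:, plain usernames to maintainer:
        if value.startswith("@"):
            text = f"scope:{value.lstrip('@')}"
        else:
            text = f"maintainer:{value}"
    else:
        raise ValueError(f"{search_type!r} does not use the search endpoint")
    return f"{REGISTRY}/-/v1/search?" + urlencode({"text": text, "size": size, "from": offset})


def package_doc_url(name: str) -> str:
    """URL of the full registry document, as a byNames lookup would need."""
    # Scoped names keep '@' but encode the slash: @nestjs/core -> @nestjs%2Fcore
    return f"{REGISTRY}/{quote(name, safe='@')}"


print(search_url("byKeyword", "react", size=100))
print(package_doc_url("@nestjs/core"))
```

Paging through results would mean repeating the search call with an increasing `offset` until maxResults is reached.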


💡 Use cases

This npm package scraper is designed for lead generation, market research, supply-chain security, and ML data engineering on top of the npm ecosystem.

  • DevRel & developer outreach: find every package maintainer in a niche (React, Vite, AI, MCP) and email them about your SDK, beta program, or sponsorship
  • Dev-tools SaaS sales pipeline: pull packages with high weekly downloads in your target stack, then enrich each maintainer's email and LinkedIn for outbound. Comparable to Apollo / Hunter but with package context as the signal
  • Recruiter tech sourcing: surface active open-source JavaScript and TypeScript maintainers with the emails they publish on npm. A cheaper, safer alternative to scraping LinkedIn
  • Supply-chain security & SBOM: bulk lookup an entire package.json or package-lock.json to audit dependentsCount, lastPublishedAt, license drift, and abandoned packages
  • VC and ecosystem analysis: map fast-growing categories (AI SDK clients, MCP servers, React Native libraries) by combining keyword search + downloads stats over time
  • ML dataset of npm: export npm registry metadata as a clean JSON / CSV / Parquet dataset for LLM training, code intelligence, or benchmark suites
  • Newsletter automation: feed a daily / weekly "new on npm" digest by filtering on firstPublishedAt within the last 7 days
  • Competitive intelligence on dev-tools: see which dependencies your competitor's product imports by scraping their published package
  • Brand monitoring on npm: find every package mentioning your company name in description or keywords (typo-squats, integrations, plugins)
  • Trending npm packages dashboard: combine downloads delta + score deltas to build a "Hacker News for npm" feed
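
As an illustration of the trending use case, the sketch below ranks packages by week-over-week download growth between two dataset exports. It assumes each export is a list of records with the name and downloadsLastWeek fields; the `trending` function itself is hypothetical and not part of the actor.

```python
from typing import Any, Dict, List, Tuple


def trending(previous: List[Dict[str, Any]],
             current: List[Dict[str, Any]],
             top: int = 10) -> List[Tuple[str, float]]:
    """Rank packages by relative growth in downloadsLastWeek between two runs."""
    before = {rec["name"]: rec.get("downloadsLastWeek", 0) for rec in previous}
    rows = []
    for rec in current:
        old = before.get(rec["name"])
        new = rec.get("downloadsLastWeek", 0)
        if not old:
            # New package or zero baseline: a growth rate is undefined, so skip it.
            continue
        rows.append((rec["name"], (new - old) / old))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


last_week = [{"name": "a", "downloadsLastWeek": 100}, {"name": "b", "downloadsLastWeek": 1000}]
this_week = [
    {"name": "a", "downloadsLastWeek": 150},
    {"name": "b", "downloadsLastWeek": 1100},
    {"name": "c", "downloadsLastWeek": 50},
]
print(trending(last_week, this_week))
```

Relative growth favors small packages, so a real dashboard would probably pair it with a minDownloadsLastWeek floor.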

🧾 Example output

A single record from a byNames run with "names": ["next-auth"] looks like this (truncated for brevity):

{
  "name": "next-auth",
  "scope": null,
  "version": "5.0.0-beta.20",
  "description": "Authentication for the Web.",
  "keywords": ["authentication", "nextjs", "oauth", "jwt"],
  "license": "ISC",
  "author": { "name": "Iain Collins", "email": "me@iaincollins.com" },
  "maintainers": [
    { "name": "balazsorban", "email": "balazs@authjs.dev" },
    { "name": "iaincollins", "email": "me@iaincollins.com" }
  ],
  "maintainerEmails": ["balazs@authjs.dev", "me@iaincollins.com"],
  "publisher": "balazsorban",
  "repository": "git+https://github.com/nextauthjs/next-auth.git",
  "githubOwner": "nextauthjs",
  "githubRepo": "next-auth",
  "homepage": "https://authjs.dev",
  "engines": { "node": "^18.17.0 || ^19.8.0 || >= 20.0.0" },
  "dependencies": 5,
  "devDependencies": 12,
  "peerDependencies": 1,
  "dependentsCount": 2317,
  "score": 0.71,
  "scoreQuality": 0.84,
  "scorePopularity": 0.91,
  "scoreMaintenance": 0.55,
  "downloadsLastDay": 124310,
  "downloadsLastWeek": 1843207,
  "downloadsLastMonth": 7521094,
  "downloadsLastYear": 78911234,
  "versionsCount": 423,
  "firstPublishedAt": "2018-08-30T14:25:01.123Z",
  "lastPublishedAt": "2026-05-15T09:12:44.901Z",
  "tarballUrl": "https://registry.npmjs.org/next-auth/-/next-auth-5.0.0-beta.20.tgz",
  "unpackedSize": 5824113,
  "fileCount": 412,
  "url": "https://www.npmjs.com/package/next-auth"
}
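
Records in this shape are easy to post-process. As one possible example, the sketch below flattens the maintainers array into one CSV row per unique email with the package as context. The column choice and the function names `outreach_rows` and `to_csv` are assumptions for illustration.

```python
import csv
import io
from typing import Any, Dict, Iterable, Iterator

COLUMNS = ["email", "maintainer", "package", "downloadsLastWeek"]


def outreach_rows(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield one row per maintainer email, keeping only the first occurrence."""
    seen = set()
    for rec in records:
        for maintainer in rec.get("maintainers", []):
            email = (maintainer.get("email") or "").lower()
            if not email or email in seen:
                continue
            seen.add(email)
            yield {
                "email": email,
                "maintainer": maintainer.get("name", ""),
                "package": rec["name"],
                "downloadsLastWeek": rec.get("downloadsLastWeek", ""),
            }


def to_csv(records: Iterable[Dict[str, Any]]) -> str:
    """Render the outreach rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(outreach_rows(records))
    return buf.getvalue()


record = {
    "name": "next-auth",
    "maintainers": [
        {"name": "balazsorban", "email": "balazs@authjs.dev"},
        {"name": "iaincollins", "email": "me@iaincollins.com"},
    ],
    "downloadsLastWeek": 1843207,
}
print(to_csv([record]))
```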

🆚 Compared to alternatives

| Tool | Maintainer emails | Downloads stats | Bulk lookup | Search by keyword | Web enrichment | Cost |
| --- | --- | --- | --- | --- | --- | --- |
| NPM Package Scraper (this actor) | ✅ Included | ✅ Day / Week / Month / Year | ✅ Up to 5,000 | ✅ Full | ✅ Optional | Pay-per-event |
| npm CLI (npm search, npm view) | ❌ | ❌ (no downloads in CLI) | ⚠️ 1 at a time | ⚠️ Limited | ❌ | Free, painful |
| registry.npmjs.org REST | ⚠️ Partial | ❌ Separate API | ⚠️ 1 at a time | ⚠️ Limited | ❌ | Free |
| npms.io API | ❌ | ⚠️ Aggregated | ✅ | ✅ | ❌ | Free, often down |
| libraries.io API | ❌ | ⚠️ Slow | ✅ | ✅ | ❌ | Free tier limited |
| Snyk Advisor | ❌ | ✅ | ❌ | ❌ | ❌ | Subscription |

If you only need one package, npm view is fine. For anything at scale (outreach, ML datasets, supply-chain audits), running this actor saves hours and gives you a unified schema.


โš™๏ธ Input parameters reference

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| searchType | string enum | search | search / byNames / byAuthor / byKeyword |
| searchQuery | string | react | Used with search. Free-text npm search |
| names | string[] | - | Used with byNames. List of package names |
| author | string | - | Used with byAuthor. Maintainer username or @scope |
| keyword | string | - | Used with byKeyword. npm keyword / tag |
| minDownloadsLastWeek | integer | - | Drop packages below this weekly downloads count |
| minScore | number | - | Drop packages below this combined score 0..1 |
| license | string | - | SPDX substring filter (e.g. MIT, Apache) |
| maxResults | integer | 100 | Hard cap (1–5,000) |
| includeDownloadsStats | boolean | true | Fetch day / week / month / year downloads |
| includeReadme | boolean | false | Include README markdown (truncated to 5,000 chars) |
| enrichWithGoogle | boolean | false | Find maintainer website + LinkedIn + secondary emails |
| enrichLimit | integer | 50 | Max unique maintainers to enrich (1–1,000) |
| proxyConfig | proxy | residential | Proxy used for enrichment requests only |
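
Before starting a run, it can help to check an input object against the table above. The following is a small client-side sketch built only from the documented names, defaults and ranges. The `validate_input` helper is hypothetical, and the actor's own validation may differ.

```python
from typing import Any, Dict, List

# Which parameter each mode relies on, per the reference table.
REQUIRED_BY_MODE = {
    "search": "searchQuery",
    "byNames": "names",
    "byAuthor": "author",
    "byKeyword": "keyword",
}


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks consistent."""
    mode = run_input.get("searchType", "search")
    if mode not in REQUIRED_BY_MODE:
        return [f"unknown searchType: {mode!r}"]
    problems = []
    required = REQUIRED_BY_MODE[mode]
    # searchQuery has a documented default ("react"); the other three do not.
    if required != "searchQuery" and not run_input.get(required):
        problems.append(f"{mode} requires '{required}'")
    if not 1 <= run_input.get("maxResults", 100) <= 5000:
        problems.append("maxResults must be between 1 and 5000")
    if not 1 <= run_input.get("enrichLimit", 50) <= 1000:
        problems.append("enrichLimit must be between 1 and 1000")
    min_score = run_input.get("minScore")
    if min_score is not None and not 0 <= min_score <= 1:
        problems.append("minScore must be between 0 and 1")
    return problems


print(validate_input({"searchType": "byNames"}))
```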

💰 Pricing & cost

Pay-per-event:

  • Per package returned: small fee, scales linearly
  • Per enriched maintainer: only when enrichWithGoogle: true

At the listed starting rate of $1.50 per 1,000 package records, a typical run of 1,000 packages without enrichment costs roughly $1.50. A bulk lookup of an entire package.json (50–200 packages) comes to about $0.08 to $0.30.

The actor emits a billing event only when a real record is delivered to the Dataset. Failed retries, redirects and rate-limit backoffs are not charged.
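
The arithmetic is simple enough to sketch. The estimate below uses the listed "from $1.50 / 1,000 npm package records" rate as the default. The per-enrichment price is not stated on this page, so it is left as a parameter that defaults to zero and must be supplied by the caller; `estimate_cost` is an illustrative name.

```python
def estimate_cost(packages: int,
                  enriched_maintainers: int = 0,
                  usd_per_1000_packages: float = 1.50,
                  usd_per_enrichment: float = 0.0) -> float:
    """Estimate a run's cost in USD under pay-per-event pricing."""
    package_cost = packages * usd_per_1000_packages / 1000
    # usd_per_enrichment is a placeholder: check the actor's pricing tab for the real value.
    enrichment_cost = enriched_maintainers * usd_per_enrichment
    return round(package_cost + enrichment_cost, 4)


print(estimate_cost(1000))  # 1.5
print(estimate_cost(200))   # 0.3
```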


โ“ Frequently asked questions

Is this an official npm API client? No. The actor calls the same public npm endpoints that the npm CLI and npm website rely on (registry.npmjs.org and the downloads API at api.npmjs.org). No login, no .npmrc, no tokens.

Do you respect npm's terms of service? Yes. The npm registry is explicitly public and designed for read-heavy traffic. We add polite delays and exponential backoff on 429s.
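
For readers unfamiliar with the term, exponential backoff means doubling the wait after each failed attempt up to some cap. The sketch below computes such a delay schedule. The base, factor, cap and jitter values are assumptions chosen for illustration, not the actor's actual settings.

```python
import random
from typing import List, Optional


def backoff_delays(retries: int,
                   base: float = 1.0,
                   factor: float = 2.0,
                   cap: float = 60.0,
                   jitter: float = 0.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Return the wait (in seconds) before each retry after a 429 response."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * factor ** attempt)
        # Optional jitter adds up to `jitter * delay` extra seconds so that
        # parallel workers do not all retry at the same instant.
        delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

A client would sleep for each delay in turn and stop retrying once the list is exhausted.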

Are the maintainer emails real? Yes. npm requires maintainers to publish a verified email when they create the account. Those emails are part of the public package metadata returned by registry.npmjs.org/<package>.

Can I scrape the full npm registry (3M+ packages)? Technically yes, but you almost never want to. Most users filter by keyword + minimum downloads + score to get a workable shortlist. If you really need the full mirror, contact us and we will guide you.

How fresh is the data? Live. Every request hits npm in real time. No stale cache.

Can I get GitHub stars / issues for each package? The actor returns the parsed githubOwner / githubRepo. Pair it with a GitHub scraper actor to merge in stars, issues and contributor counts.

What is the difference between this and npm search? npm search returns 20 results, no downloads, no scores, no maintainer emails, and is throttled. This actor returns up to 5,000 results with the full record per package.

How do I find npm packages by GitHub owner? Use searchType: "byAuthor" with the GitHub org name. Most major OSS orgs (vercel, nestjs, tanstack) publish under matching npm scopes.

Can I use this as an npm registry mirror? Not exactly: we do not store the tarballs. But for metadata, downloads, and scores, the output dataset is functionally equivalent to a queryable npm registry mirror.

How does the enrichment work? For every unique maintainer email, the actor runs a small SERP query (Google + Bing) to find the personal website, GitHub, LinkedIn, and any secondary emails published on those pages. This is the same approach Apollo and Hunter use, applied to OSS maintainers.

Does enrichment increase the cost a lot? Only the maintainers you enrich are billed (max controlled by enrichLimit). Enriching 50 unique maintainers in a 1,000-package run is the standard sweet spot.

Can I run this on a schedule? Yes. Apify Schedules supports cron expressions. A daily run that filters firstPublishedAt >= now - 24h gives you a "new on npm" feed.
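
The firstPublishedAt filter mentioned above can be applied to the exported records after each scheduled run. This is a minimal sketch assuming records carry ISO timestamps with a trailing Z, as in the example output; `published_within` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def published_within(records: List[Dict[str, Any]],
                     now: datetime,
                     hours: int = 24) -> List[Dict[str, Any]]:
    """Keep records whose firstPublishedAt falls in the last `hours` hours."""
    cutoff = now - timedelta(hours=hours)
    fresh = []
    for rec in records:
        first = rec.get("firstPublishedAt")
        if not first:
            continue
        # Replace the trailing 'Z' so older Python versions can parse the offset.
        published = datetime.fromisoformat(first.replace("Z", "+00:00"))
        if cutoff <= published <= now:
            fresh.append(rec)
    return fresh


now = datetime(2026, 5, 16, 9, 0, tzinfo=timezone.utc)
records = [
    {"name": "brand-new", "firstPublishedAt": "2026-05-15T12:00:00.000Z"},
    {"name": "next-auth", "firstPublishedAt": "2018-08-30T14:25:01.123Z"},
]
print([rec["name"] for rec in published_within(records, now)])
```

In a scheduled job, `now` would be `datetime.now(timezone.utc)` rather than the fixed value used here.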

Does it export to CSV / Excel? Apify Dataset can export to CSV, JSON, Excel, XML, RSS, Markdown table, and HTML. Use the API or download from the Apify console.

What about private npm registries (GitHub Packages, Verdaccio, JFrog, Azure Artifacts)? This actor targets the public registry.npmjs.org. Private registries are not supported.

How does it compare to npms.io and libraries.io? This actor is faster, has more fields per package (especially maintainer emails + downloads stats), and is actively maintained. npms.io is frequently down; libraries.io has a slow API.

Can I integrate this with Claude, Cursor, or other AI agents? Yes. Use Apify's MCP server wrapper, or call the actor via the Apify API from your agent. We also publish dedicated MCP server actors (see below).
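
As a rough illustration of the API route, the sketch below prepares a request to Apify's "run synchronously and get dataset items" endpoint using only the standard library. The actor ID and token are placeholders (this page does not state the actor's ID), and the request is built but not sent.

```python
import json
import urllib.request
from typing import Any, Dict

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str,
                      run_input: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST that runs an actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholders: substitute the real "username~actor-name" ID and your own token.
request = build_run_request(
    "USERNAME~ACTOR-NAME",
    "YOUR_APIFY_TOKEN",
    {"searchType": "byKeyword", "keyword": "mcp", "maxResults": 50},
)
print(request.get_method(), request.full_url)
# To execute the run: items = json.load(urllib.request.urlopen(request))
```

The synchronous endpoint suits short runs; for large runs, starting the actor asynchronously and reading the dataset afterwards is likely the safer pattern.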


🔗 Other actors by makework36

Useful companions for npm + JavaScript + lead-gen workflows:


๐Ÿ“ Changelog

  • v0.1: Initial release. Four search modes, downloads stats, npm score, optional maintainer enrichment via SERP.

๐Ÿ› ๏ธ Support

Found a missing field, a bug, or a use case the actor doesn't cover? Open an issue or message me directly from the Apify Console. I respond fast and ship fixes within hours for paying users.