🐍 PyPI Scraper - Python Package Data

Extract Python package data from PyPI: download stats, dependencies, version history & maintainers. Build Python ecosystem analytics, dependency audits & monitoring dashboards. Pay per package.

Pricing: from $3.00 / 1,000 results
Developer: Stephan Corbeil (Maintained by Community)

🐍 PyPI Package Scraper: Python Package Metadata, Versions & Downloads

Bulk-extract Python package metadata from the PyPI registry: name, summary, latest version, version history, dependencies (with extras), classifiers, project URLs, license, maintainers, and download stats from the BigQuery public dataset. A pay-per-result alternative to the libraries.io API, Snyk Advisor, PyPiStats, and OSS Insight, designed for Python tooling founders, ML-platform teams sizing model adoption, security teams auditing the supply chain, and DevRel measuring library traction.

Why PyPI Package Scraper Beats libraries.io, Snyk Advisor, PyPiStats & OSS Insight

| Feature | NexGenData PyPI Scraper | libraries.io API | Snyk Advisor | PyPiStats | OSS Insight |
|---|---|---|---|---|---|
| Cost | $2 per 1K packages, pay-per-event | Free (heavy throttle) | $25-99 / dev / month | Free (HTML only) | $$ / month |
| Version history with dates | Yes | Yes | Limited | No | Limited |
| Classifiers + extras | Yes | Partial | No | No | No |
| Download counts (monthly) | Yes (via BQ public dataset) | Yes | No | Yes | Yes |
| Project URLs (homepage / repo / docs) | Yes | Yes | Yes | No | Yes |
| Bulk export | JSON / CSV / Excel | Plan-gated | CSV | HTML scrape only | Plan-gated |
| Auth | Apify token | API key | Snyk account | None | OSS Insight account |
| Monthly minimum | None | None | $25+ | None | Plan-based |

Most Python-ecosystem teams pick this actor instead of hand-rolling a PyPI JSON-API harvester because it is a drop-in alternative to libraries.io with no rate-limit pain, is cheaper than Snyk Advisor for non-vuln workflows, and packages download stats together with metadata in a single dataset row, saving you a separate PyPiStats lookup per package.

What You Get Per Package

Each dataset item is a flat record:

  • name, summary, description
  • latest_version, latest_release_date
  • versions[] - every release with {version, published_at, yanked}
  • requires_dist[] - declared dependencies, including extras
  • requires_python - version constraint
  • classifiers[] - Trove classifiers (e.g. "Programming Language :: Python :: 3.12")
  • project_urls - {Homepage, Documentation, Source, Bug Tracker, ...}
  • home_page, license
  • author, author_email, maintainer, maintainer_email
  • keywords
  • downloads - {last_day, last_week, last_month}
  • wheel_count, sdist_available
  • vulnerabilities[] - known PyPI advisories, if any
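Given those field definitions, a single dataset item can be pictured as the following sketch (all values are illustrative, not real PyPI data):

```python
# Illustrative shape of one dataset item; values are made up for the example.
record = {
    "name": "fastapi",
    "summary": "FastAPI framework, high performance, easy to learn",
    "latest_version": "0.110.0",
    "latest_release_date": "2024-02-25",
    "versions": [
        {"version": "0.110.0", "published_at": "2024-02-25", "yanked": False},
        {"version": "0.109.2", "published_at": "2024-02-04", "yanked": False},
    ],
    "requires_dist": ["starlette>=0.36.3,<0.37.0", "pydantic>=1.7.4"],
    "requires_python": ">=3.8",
    "classifiers": ["Programming Language :: Python :: 3.12"],
    "project_urls": {"Homepage": "https://github.com/tiangolo/fastapi"},
    "downloads": {"last_day": 1, "last_week": 7, "last_month": 30},
}

# Records are flat, so downstream code can index fields directly:
stable_versions = [v["version"] for v in record["versions"] if not v["yanked"]]
print(record["name"], record["downloads"]["last_month"], len(stable_versions))
```

Because each item is one flat record, a whole run exports cleanly to CSV or Excel without any post-processing joins.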

Use Cases

  • ML-platform founders - measure the adoption of mlflow vs wandb vs clearml by sorting on downloads.last_month
  • Internal-tools teams - audit a private requirements.txt against PyPI to flag yanked, abandoned, or vulnerable packages
  • Data-engineering managers - generate a license-compliance report (license field) across every package the team uses
  • VC analysts - find breakout Python data tools by ranking on month-over-month download growth
  • Open-source maintainers - track downstream requires_dist references to your library to size the ecosystem
  • Security teams - pull vulnerabilities[] for every package in a Software Bill of Materials (SBOM)
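The requirements.txt audit above reduces to a scan over dataset items. A minimal sketch, assuming items shaped like the fields listed earlier; the 730-day staleness threshold and the package name "leftpad-py" are arbitrary choices for illustration:

```python
from datetime import datetime, timedelta, timezone

def audit(items, stale_after_days=730):
    """Flag packages that look risky: a yanked latest release, known
    advisories, or no release within stale_after_days."""
    now = datetime.now(timezone.utc)
    flags = {}
    for item in items:
        reasons = []
        versions = item.get("versions") or []
        if versions and versions[0].get("yanked"):
            reasons.append("latest release yanked")
        if item.get("vulnerabilities"):
            reasons.append("known advisories")
        released = datetime.fromisoformat(item["latest_release_date"]).replace(tzinfo=timezone.utc)
        if now - released > timedelta(days=stale_after_days):
            reasons.append(f"no release in {stale_after_days}+ days")
        if reasons:
            flags[item["name"]] = reasons
    return flags

# Tiny demo: "leftpad-py" is a made-up package name for illustration.
recent = (datetime.now(timezone.utc) - timedelta(days=30)).date().isoformat()
items = [
    {"name": "leftpad-py", "versions": [{"version": "0.1", "yanked": True}],
     "vulnerabilities": [], "latest_release_date": "2016-03-01"},
    {"name": "requests", "versions": [{"version": "2.32.3", "yanked": False}],
     "vulnerabilities": [], "latest_release_date": recent},
]
print(audit(items))
```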

Quick Start

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("nexgendata/pypi-scraper").call(run_input={
    "packages": ["requests", "pandas", "fastapi", "pydantic", "polars"],
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["name"], item["latest_version"], item["downloads"]["last_month"])
```

Pricing

Pay-per-event:

  • Actor Start: small fixed charge per run (memory-scaled)
  • Per package: $2 per 1,000 packages returned

No subscription, no minimum.
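At that rate, run cost is simple arithmetic. The Actor Start fee scales with memory and is not specified here, so it is left as a parameter in this sketch:

```python
def estimate_cost(n_packages, per_1k=2.00, start_fee=0.0):
    """Rough cost estimate: $2 per 1,000 returned packages plus a small,
    memory-scaled Actor Start charge (pass it in if known)."""
    return start_fee + (n_packages / 1000) * per_1k

print(estimate_cost(25_000))  # 25K packages at $2/1K -> 50.0, plus any start fee
```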

Related Actors

| Use case | Actor |
|---|---|
| npm package metadata scraper | npm-scraper |
| PyPI package download statistics | pypi-package-stats |
| npm package download statistics | npm-package-stats |
| GitHub trending repositories | github-trending-repos |
| GitHub repository deep-stats | github-repo-stats |
| Stack Overflow Q&A scraper | stackoverflow-questions |
| Developer tools intelligence MCP | developer-tools-mcp-server |
| Hacker News scraper | hacker-news-scraper |

FAQ

How are download counts calculated? PyPI publishes anonymized download logs to a BigQuery public dataset. We aggregate the last 1 / 7 / 30 days per package, which matches pypistats exactly.

Are pre-release versions included? Yes: the full versions[] array includes alphas, betas, and release candidates. Filter on the version suffix in your code if you only want stable releases.
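One way to do that filtering is packaging.version.Version.is_prerelease; a dependency-free heuristic on the version string looks like the sketch below (it covers the common PEP 440 pre/dev suffixes but not every edge case):

```python
import re

# PEP 440 pre/dev markers: 1.0a1, 1.0b2, 1.0rc1, 1.0.dev3 (heuristic, not exhaustive)
PRERELEASE = re.compile(r"(a|b|c|rc|alpha|beta|pre|preview|dev)\d*$", re.IGNORECASE)

def stable_only(versions):
    """Keep only non-yanked, non-pre-release entries from a versions[] array."""
    return [
        v for v in versions
        if not v["yanked"] and not PRERELEASE.search(v["version"].replace(".", ""))
    ]

versions = [
    {"version": "2.0.0", "yanked": False},
    {"version": "2.1.0rc1", "yanked": False},
    {"version": "2.1.0.dev3", "yanked": False},
    {"version": "1.9.0", "yanked": True},
]
print([v["version"] for v in stable_only(versions)])
```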

Does the actor follow PyPI Warehouse JSON API? Yes, it uses the official pypi.org/pypi/<pkg>/json endpoint plus the simple-index for completeness.
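You can cross-check any record against that Warehouse endpoint yourself. A minimal standard-library sketch (the live fetch is commented out because it needs network access):

```python
import json
from urllib.request import urlopen

def pypi_json_url(package: str) -> str:
    """Warehouse JSON API endpoint for a single package."""
    return f"https://pypi.org/pypi/{package}/json"

def fetch_metadata(package: str) -> dict:
    # Same endpoint the actor reads; returns {"info": {...}, "releases": {...}, ...}
    with urlopen(pypi_json_url(package)) as resp:
        return json.load(resp)

# Example (requires network access):
# meta = fetch_metadata("requests")
# print(meta["info"]["version"], meta["info"]["requires_python"])
```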

Output formats? JSON, CSV, Excel, and the Apify dataset API.

Is this legal? Yes: PyPI metadata is public.

About NexGenData

NexGenData publishes 260+ buyer-intent actors covering SEC filings, YC alumni, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, and more. All pay-per-result. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b


How NexGenData Pricing Works

Every NexGenData actor uses pay-per-event pricing: you only pay for results that actually land in your dataset. No monthly minimum, no seat fees, no surprise overage bills.

  • Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
  • Result / item: charged per item written to the default dataset
  • No charge for retries, internal proxy rotation, or failed sub-requests: those are absorbed by the platform

Apify Platform Bonus

New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.

Integration Surface

Every actor in the NexGenData catalog can be triggered from:

  • Apify console - point-and-click runs
  • Apify API - REST + webhooks
  • Apify Python / JS SDKs - programmatic batch runs
  • Zapier, Make.com, n8n - official integrations
  • MCP - many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
  • Schedules - built-in cron for daily / weekly / monthly runs
  • Webhooks - POST results to any HTTPS endpoint on dataset write
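For the REST path, Apify's API starts a run with a POST to the acts endpoint, addressing the actor as "username~actor-name". A sketch that only builds the request (the token is a placeholder and the request is never sent here):

```python
import json
from urllib.request import Request

APIFY_API = "https://api.apify.com/v2"

def build_run_request(actor_id: str, token: str, run_input: dict) -> Request:
    """Build (but do not send) a POST that starts an actor run.
    Apify URLs use 'username~actor-name' instead of 'username/actor-name'."""
    url = f"{APIFY_API}/acts/{actor_id.replace('/', '~')}/runs?token={token}"
    return Request(
        url,
        data=json.dumps(run_input).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_run_request("nexgendata/pypi-scraper", "YOUR_APIFY_TOKEN",
                        {"packages": ["requests"]})
print(req.full_url)
```

Sending the request with urllib or requests returns a run object whose defaultDatasetId can then be polled for results, as in the Quick Start above.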

Support

NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome β€” high-demand features ship in the next version.

Home: thenextgennexus.com
Full catalog: apify.com/nexgendata