PyPI Scraper: Python Package Data
Pricing
from $3.00 / 1,000 results
Extract Python package data from PyPI: download stats, dependencies, version history & maintainers. Build Python ecosystem analytics, dependency audits & monitoring dashboards. Pay per package.
Developer: Stephan Corbeil
Maintained by Community
PyPI Package Scraper: Python Package Metadata, Versions & Downloads
Bulk-extract Python package metadata from the PyPI registry: name, summary, latest version, version history, dependencies (with extras), classifiers, project URLs, license, maintainers, and download stats from the BigQuery public dataset. A pay-per-result alternative to libraries.io API, Snyk Advisor, PyPiStats, and OSS Insight, designed for Python tooling founders, ML-platform teams sizing model adoption, security teams auditing supply chain, and DevRel measuring library traction.
Why PyPI Package Scraper Beats libraries.io, Snyk Advisor, PyPiStats & OSS Insight
| Feature | NexGenData PyPI Scraper | libraries.io API | Snyk Advisor | PyPiStats | OSS Insight |
|---|---|---|---|---|---|
| Cost | $2 per 1K packages, pay-per-event | Free (heavy throttle) | $25-99 / dev / month | Free (HTML only) | $$ / month |
| Version history with dates | Yes | Yes | Limited | No | Limited |
| Classifiers + extras | Yes | Partial | No | No | No |
| Download counts (monthly) | Yes, via BigQuery public dataset | Yes | No | Yes | Yes |
| Project URLs (homepage / repo / docs) | Yes | Yes | Yes | No | Yes |
| Bulk export | JSON / CSV / Excel | Plan-gated | CSV | HTML scrape only | Plan-gated |
| Auth | Apify token | API key | Snyk account | None | OSS Insight account |
| Monthly minimum | None | None | $25+ | None | Plan-based |
Most Python-ecosystem teams pick this actor instead of hand-rolling a PyPI JSON-API harvester for three reasons: it is a drop-in alternative to libraries.io with no rate-limit pain, it is cheaper than Snyk Advisor for non-vulnerability workflows, and it packages download stats together with metadata in a single dataset row, saving you a separate PyPiStats lookup per package.
What You Get Per Package
Each dataset item is a flat record:
- `name`, `summary`, `description`
- `latest_version`, `latest_release_date`
- `versions[]`: every release with `{version, published_at, yanked}`
- `requires_dist[]`: declared dependencies including extras
- `requires_python`: version constraint
- `classifiers[]`: Trove classifiers (e.g. "Programming Language :: Python :: 3.12")
- `project_urls`: `{Homepage, Documentation, Source, Bug Tracker, ...}`
- `home_page`, `license`
- `author`, `author_email`, `maintainer`, `maintainer_email`
- `keywords`
- `downloads`: `{last_day, last_week, last_month}`
- `wheel_count`, `sdist_available`
- `vulnerabilities[]`: known PyPI advisories if any
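For typed code, the record can be modelled roughly as below. This is only a sketch: the field names come from the list above, but the Python types (for example `published_at` as a string, download counts as integers) are assumptions, and `active_release_count` is a hypothetical helper added for illustration.

```python
from typing import TypedDict


class Release(TypedDict):
    version: str
    published_at: str  # assumption: timestamp delivered as a string
    yanked: bool


class Downloads(TypedDict):
    last_day: int  # assumption: plain integer counts
    last_week: int
    last_month: int


class PackageRecord(TypedDict, total=False):
    # Subset of the documented fields; every key is optional here
    # because the source does not say which ones are always present.
    name: str
    summary: str
    latest_version: str
    versions: list[Release]
    requires_dist: list[str]
    requires_python: str
    classifiers: list[str]
    project_urls: dict[str, str]
    license: str
    downloads: Downloads
    vulnerabilities: list[dict]


def active_release_count(record: PackageRecord) -> int:
    """Count the releases in a record that have not been yanked."""
    return sum(1 for r in record.get("versions", []) if not r["yanked"])
```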
Use Cases
- ML-platform founders: measure the adoption of `mlflow` vs `wandb` vs `clearml` by sorting on `downloads.last_month`
- Internal-tools teams: audit a private requirements.txt against PyPI to flag yanked, abandoned, or vulnerable packages
- Data-engineering managers: generate a license-compliance report (`license` field) across every package the team uses
- VC analysts: find breakout Python data tools by ranking on month-over-month download growth
- Open-source maintainers: track downstream `requires_dist` references to your library to size the ecosystem
- Security teams: pull `vulnerabilities[]` for every package in a Software Bill of Materials (SBOM)
Quick Start
    from apify_client import ApifyClient

    client = ApifyClient("YOUR_APIFY_TOKEN")

    # Start the actor and wait for the run to finish.
    run = client.actor("nexgendata/pypi-scraper").call(
        run_input={"packages": ["requests", "pandas", "fastapi", "pydantic", "polars"]}
    )

    # Each dataset item is one package record.
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(item["name"], item["latest_version"], item["downloads"]["last_month"])
Pricing
Pay-per-event:
- Actor Start: small fixed charge per run (memory-scaled)
- Per package: $2 per 1,000 packages returned
No subscription, no minimum.
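As a rough budgeting aid, the per-package charge can be estimated as below. The $2 per 1,000 rate is taken from the list above; the Actor Start fee is not stated here, so `start_fee` is a hypothetical parameter that defaults to zero and should be replaced with the value shown on your run.

```python
def estimate_cost(packages: int, rate_per_1000: float = 2.00, start_fee: float = 0.0) -> float:
    """Estimate the charge in USD for one run.

    start_fee is an assumption: the Actor Start charge is memory-scaled
    and its amount is not given in this document.
    """
    return round(start_fee + packages / 1000 * rate_per_1000, 4)


print(estimate_cost(25_000))  # 25,000 packages at $2 per 1,000
print(estimate_cost(500))     # a small audit of 500 packages
```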
Related NexGenData Actors
| Use case | Actor |
|---|---|
| npm package metadata scraper | npm-scraper |
| PyPI package download statistics | pypi-package-stats |
| npm package download statistics | npm-package-stats |
| GitHub trending repositories | github-trending-repos |
| GitHub repository deep-stats | github-repo-stats |
| Stack Overflow Q&A scraper | stackoverflow-questions |
| Developer tools intelligence MCP | developer-tools-mcp-server |
| Hacker News scraper | hacker-news-scraper |
FAQ
How are download counts calculated?
PyPI publishes anonymized download logs to a BigQuery public dataset. We aggregate the last 1 / 7 / 30 days per package, which matches pypistats exactly.
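The 1 / 7 / 30 day aggregation can be illustrated with a small sketch over per-day counts. This is not the actor's actual query: the input shape (a dict of date to count) and the choice to include the `as_of` day in each window are assumptions made for the example.

```python
from datetime import date, timedelta


def window_totals(daily_counts: dict[date, int], as_of: date) -> dict[str, int]:
    """Sum downloads over the 1, 7 and 30 days ending on as_of (inclusive)."""

    def total(days: int) -> int:
        start = as_of - timedelta(days=days - 1)
        return sum(n for d, n in daily_counts.items() if start <= d <= as_of)

    return {
        "last_day": total(1),
        "last_week": total(7),
        "last_month": total(30),
    }
```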
Are pre-release versions included?
Yes. The full `versions[]` array includes alphas, betas, and release candidates. Filter on the suffix in your code if you only want stable.
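One way to do that filtering is sketched below with a deliberately simple pattern: a version counts as stable only if it is dotted numbers with an optional `.postN` suffix. This is an approximation of PEP 440 (it rejects epochs such as `1!2.0`, for instance); if the `packaging` library is available, `packaging.version.Version(v).is_prerelease` is the more robust check. The helper name `stable_versions` is hypothetical.

```python
import re

# Dotted numeric release, optionally followed by a post-release marker.
_STABLE = re.compile(r"^\d+(\.\d+)*(\.post\d+)?$")


def stable_versions(versions):
    """Keep only the versions[] entries whose version string looks stable."""
    return [v for v in versions if _STABLE.match(v["version"])]
```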
Does the actor use the PyPI Warehouse JSON API?
Yes, it uses the official pypi.org/pypi/<pkg>/json endpoint plus the simple-index for completeness.
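For reference, a single lookup against that endpoint looks roughly like the sketch below. The `info` and `releases` keys are part of the public PyPI JSON response; the mapping onto this actor's field names in `summarize` is an assumption about how the record might be built, not the actor's real code.

```python
import json
from urllib.request import urlopen


def json_url(package: str) -> str:
    """Build the PyPI JSON API URL for one package."""
    return f"https://pypi.org/pypi/{package}/json"


def fetch(package: str) -> dict:
    """Download and decode the JSON document for one package."""
    with urlopen(json_url(package), timeout=30) as resp:
        return json.load(resp)


def summarize(payload: dict) -> dict:
    """Reduce a PyPI JSON payload to a few record-style fields."""
    info = payload["info"]
    versions = []
    for version, files in payload.get("releases", {}).items():
        versions.append({
            "version": version,
            # Earliest file upload is taken as the publish time.
            "published_at": min((f["upload_time_iso_8601"] for f in files), default=None),
            # Treat a release as yanked only if every file is yanked.
            "yanked": bool(files) and all(f.get("yanked", False) for f in files),
        })
    return {
        "name": info["name"],
        "latest_version": info["version"],
        "requires_dist": info.get("requires_dist") or [],
        "versions": versions,
    }
```

Calling `summarize(fetch("requests"))` performs one network request; for bulk work the actor handles the batching for you.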
Output formats?
JSON, CSV, Excel, and the Apify dataset API.
Is this legal?
Yes. PyPI metadata is public.
About NexGenData
NexGenData publishes 260+ buyer-intent actors covering SEC filings, YC alumni, lead generation, competitive intelligence, stock fundamentals across 30+ exchanges, and more. All pay-per-result. Browse the full catalog at https://apify.com/nexgendata?fpr=2ayu9b
How NexGenData Pricing Works
Every NexGenData actor uses pay-per-event pricing: you only pay for results that actually land in your dataset. No monthly minimum, no seat fees, no surprise overage bills.
- Actor Start: a single-event charge each time you spin the actor up (scaled to memory size)
- Result / item: charged per item written to the default dataset
- No charge for retries, internal proxy rotation, or failed sub-requests; those are absorbed by the platform
Apify Platform Bonus
New to Apify? Sign up with the NexGenData referral link: you get free platform credits on signup (enough for several thousand free results) and you help fund the maintenance of this actor fleet.
Integration Surface
Every actor in the NexGenData catalog can be triggered from:
- Apify console: point-and-click run
- Apify API: REST + webhooks
- Apify Python / JS SDKs: programmatic batch
- Zapier, Make.com, n8n: official integrations
- MCP: many actors are exposed as MCP tools for Claude / ChatGPT / Cursor agents
- Schedules: built-in cron for daily / weekly / monthly runs
- Webhooks: POST results to any HTTPS endpoint on dataset write
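For the REST route, a run can be triggered without the SDK. The sketch below only builds the request. It assumes Apify's synchronous `run-sync-get-dataset-items` endpoint, the `username~actor-name` form of the actor ID in URLs, and the same `packages` input used in the Quick Start; check the Apify API reference before relying on it. `build_run_request` is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_run_request(token: str, packages: list[str],
                      actor_id: str = "nexgendata~pypi-scraper") -> Request:
    """Prepare a POST that runs the actor and returns its dataset items."""
    url = f"https://api.apify.com/v2/acts/{actor_id}/run-sync-get-dataset-items"
    body = json.dumps({"packages": packages}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

Passing the result to `urllib.request.urlopen` and decoding the response with `json.load` should yield the same package records the SDK example iterates over.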
Support
NexGenData maintains 260+ Apify actors and ships updates regularly. Bug reports via the Apify console issues tab get a response within 24 hours. Roadmap requests are welcome; high-demand features ship in the next version.
Home: thenextgennexus.com
Full catalog: apify.com/nexgendata
