
PyPI Vulnerability Scraper

Pricing: from $8.00 / 1,000 results


Extract Python package metadata from PyPI and enrich it with OSV database advisories. Monitor your dependencies for new releases and known vulnerabilities, including CVE identifiers.


Rating: 0.0 (0 ratings)

Developer: 太郎 山田

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago


PyPI Package Intelligence API | Releases, Dependencies & OSV Signals

Securing your software supply chain requires continuous visibility into the packages your applications depend on. This actor automates Python package due diligence. It looks up each package you list through the official PyPI JSON API and can optionally enrich the result with advisories from the Open Source Vulnerability (OSV) database. A scheduled run therefore acts as an early warning for new releases and known vulnerabilities in your dependencies.

AppSec engineers, DevOps teams, and engineering managers can schedule daily or weekly runs to surface new releases and newly published advisories before an affected version reaches production. Each run produces one structured dataset, so you do not need to maintain in-house scripts or check security pages by hand. Outputs include OSV advisory summaries (which may reference CVE identifiers and affected version ranges), raw requires_dist dependency declarations, license and maintainer metadata, and the full release history with upload dates. Comparing these records across runs helps you spot unexpected releases, yanked versions, and new advisories with little manual effort. OSV only covers published advisories, so treat the results as input to a review, not as a guarantee that nothing malicious slipped through.

Store Quickstart

  • Start with 2–5 exact package names in packages.
  • Keep includeDownloadStats and includeVulnerabilities off for the fastest first run, then enable them for the packages you shortlist.
  • Use dryRun: true when you only want to validate the payload shape or delivery settings.
  • After the first useful run, switch to the recurring watchlist template for repeat package checks, then use the webhook handoff template for release or OSV alerts.

Status

V1 (live). Scaffolded as part of Wave 6 Batch H; the live collection logic is implemented.

Data sources

| Source | URL | Notes |
|---|---|---|
| Package metadata | https://pypi.org/pypi/{package}/json | Full metadata, all releases, latest files |
| Download stats (optional) | https://pypistats.org/api/packages/{package}/recent | Third-party; off by default |
| Vulnerability summary (optional) | POST https://api.osv.dev/v1/query | OSV advisory lookup; off by default |
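To see what the three sources return, you can query them by hand. The sketch below is a hypothetical client, not the actor's implementation. The helper names are illustrative, and only query_osv touches the network. It uses OSV's query format, where a package is identified by its name plus the "PyPI" ecosystem string and the version is optional.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Endpoints as listed in the Data sources table.
PYPI_JSON_URL = "https://pypi.org/pypi/{package}/json"
PYPISTATS_RECENT_URL = "https://pypistats.org/api/packages/{package}/recent"
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def pypi_metadata_url(package: str) -> str:
    """PyPI JSON metadata endpoint for one package."""
    return PYPI_JSON_URL.format(package=package)


def pypistats_recent_url(package: str) -> str:
    """pypistats.org recent-downloads endpoint (third-party service)."""
    return PYPISTATS_RECENT_URL.format(package=package)


def osv_query_body(package: str, version: Optional[str] = None) -> Dict[str, Any]:
    """Body for POST /v1/query; omitting the version asks about all versions."""
    body: Dict[str, Any] = {"package": {"name": package, "ecosystem": "PyPI"}}
    if version is not None:
        body["version"] = version
    return body


def query_osv(package: str, version: Optional[str] = None, timeout_s: float = 15.0) -> Dict[str, Any]:
    """Send the OSV query and return the decoded JSON (matches, if any, under "vulns").

    timeout_s mirrors the actor's documented timeoutMs default of 15000 ms.
    """
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(osv_query_body(package, version)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Calling query_osv("jinja2") performs a live request. The pure helpers can be used offline to inspect the exact URLs and payloads.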

Use Cases

| Who | Why |
|---|---|
| OSS program offices | Audit release cadence, maintainers, and license signals before approving dependencies |
| Security teams | Add optional OSV summaries to triage risky packages faster |
| Developer platform teams | Compare PyPI libraries before standardizing on one package |
| Analysts / investors | Track package maturity and ecosystem traction from public signals |

Input

| Field | Type | Default | Description |
|---|---|---|---|
| packages | string[] | | Required. PyPI package names (e.g. requests). Max 100. |
| includeReleaseHistory | boolean | true | Full release version history with upload dates |
| includeDownloadStats | boolean | false | Recent download counts from pypistats.org |
| includeVulnerabilities | boolean | false | OSV vulnerability advisory summary |
| concurrency | integer | 5 | Parallel package fetch limit (1–10) |
| timeoutMs | integer | 15000 | Per-request timeout in ms |
| delivery | string | dataset | dataset or webhook |
| webhookUrl | string | "" | Webhook URL, used when delivery=webhook |
| dryRun | boolean | false | Skip dataset push and webhook delivery |
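You can check a payload against the limits in the table before starting a run. This is a minimal client-side sketch that assumes the documented ranges are enforced exactly as listed. validate_input is a hypothetical helper. Treating webhookUrl as mandatory for webhook delivery is an inference from its description, not a documented rule.

```python
from typing import Any, Dict, List

# Limits taken from the Input table above.
MAX_PACKAGES = 100
MIN_CONCURRENCY, MAX_CONCURRENCY = 1, 10


def validate_input(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with an input payload; an empty list means it looks valid."""
    problems: List[str] = []

    packages = payload.get("packages")
    if not isinstance(packages, list) or not packages:
        problems.append("packages is required and must be a non-empty list of names")
    elif len(packages) > MAX_PACKAGES:
        problems.append(f"packages has {len(packages)} entries; max is {MAX_PACKAGES}")

    concurrency = payload.get("concurrency", 5)  # documented default
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(concurrency, bool) or not isinstance(concurrency, int) or not (
        MIN_CONCURRENCY <= concurrency <= MAX_CONCURRENCY
    ):
        problems.append(f"concurrency must be an integer from {MIN_CONCURRENCY} to {MAX_CONCURRENCY}")

    delivery = payload.get("delivery", "dataset")  # documented default
    if delivery not in ("dataset", "webhook"):
        problems.append("delivery must be 'dataset' or 'webhook'")
    elif delivery == "webhook" and not payload.get("webhookUrl"):
        # Assumption: a webhook run without a URL has nowhere to deliver to.
        problems.append("webhookUrl is required when delivery is 'webhook'")

    return problems
```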

Input Examples

Example: Single-package audit

{
  "packages": ["requests"],
  "includeVulnerabilities": true
}

Example: Bulk portfolio

{
  "packages": ["requests", "numpy", "pandas"],
  "includeDownloadStats": true,
  "includeVulnerabilities": true,
  "concurrency": 5
}

Example: Recurring watch with webhook delivery

{
  "packages": ["requests"],
  "includeVulnerabilities": true,
  "delivery": "webhook",
  "webhookUrl": "https://example.com/hooks/pypi-watch"
}

Output

Each package record contains:

  • name, requestedName, status, version, summary, description
  • license, requiresPython, keywords, classifiers, requiresDist
  • author, authorEmail, maintainer, maintainerEmail
  • homePage, projectUrl, projectUrls, packageUrl
  • releaseCount, firstRelease, latestRelease
  • latestFiles — distribution files for the latest version (filename, url, size, sha256, packageType)
  • releaseHistory — all version upload dates (when includeReleaseHistory=true)
  • downloadStats — lastDay / lastWeek / lastMonth (when includeDownloadStats=true)
  • vulnerabilities — vulnCount + OSV advisory list (when includeVulnerabilities=true)
  • warnings — per-package issues (yanked versions, missing fields, enrichment failures)
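The release fields above can be derived from the releases mapping in PyPI's JSON response. That mapping links each version string to a list of uploaded files, and each file carries upload_time_iso_8601 and yanked. The sketch below shows one plausible derivation and rests on several assumptions. summarize_releases is hypothetical, and the actor may date versions or count releases differently.

```python
from typing import Any, Dict, List, Optional


def summarize_releases(releases: Dict[str, List[Dict[str, Any]]]) -> Dict[str, Any]:
    """Derive release fields from the "releases" object of https://pypi.org/pypi/{package}/json."""
    history: Dict[str, Optional[str]] = {}
    warnings: List[str] = []
    for version, files in releases.items():
        # Assumption: a version is dated by its earliest file upload.
        # ISO-8601 UTC strings in one fixed format sort chronologically as text.
        times = [f["upload_time_iso_8601"] for f in files if f.get("upload_time_iso_8601")]
        history[version] = min(times) if times else None
        # PyPI marks yanks per file; treat a version as yanked when every file is.
        if files and all(f.get("yanked") for f in files):
            warnings.append(f"version {version} is yanked")
    dated = sorted(t for t in history.values() if t is not None)
    return {
        "releaseCount": len(releases),
        "firstRelease": dated[0] if dated else None,
        "latestRelease": dated[-1] if dated else None,
        "releaseHistory": history,
        "warnings": warnings,
    }
```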

Output Example

{
  "name": "requests",
  "status": "ok",
  "version": "2.32.3",
  "license": "Apache-2.0",
  "requiresPython": ">=3.8",
  "releaseCount": 180,
  "latestRelease": "2024-05-29T00:00:00.000Z",
  "downloadStats": { "lastDay": 1234567, "lastWeek": 8456789, "lastMonth": 34567890 },
  "vulnerabilities": { "vulnCount": 0, "vulns": [] },
  "warnings": []
}
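Records shaped like this are straightforward to triage downstream. The sketch below assumes you have loaded the records into Python, for instance by exporting the run's dataset as JSON. The triage helper and its bucket names are hypothetical, and it reads only the fields listed in the Output section.

```python
from typing import Any, Dict, Iterable, List


def triage(records: Iterable[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group package records into buckets that usually need a follow-up."""
    buckets: Dict[str, List[str]] = {
        "vulnerable": [],    # vulnerabilities.vulnCount > 0
        "incomplete": [],    # status "partial": an optional enrichment failed
        "missing": [],       # status "not_found"
        "fetch_failed": [],  # status "rate_limited", "blocked" or "error"
    }
    for rec in records:
        # not_found records may lack a canonical name, so fall back to the input name
        name = rec.get("name") or rec.get("requestedName", "?")
        status = rec.get("status")
        if status == "partial":
            buckets["incomplete"].append(name)
        elif status == "not_found":
            buckets["missing"].append(name)
        elif status in ("rate_limited", "blocked", "error"):
            buckets["fetch_failed"].append(name)
        # vulnerabilities is only present when includeVulnerabilities=true
        vulns = rec.get("vulnerabilities") or {}
        if vulns.get("vulnCount", 0) > 0:
            buckets["vulnerable"].append(name)
    return buckets
```

Packages in fetch_failed are natural candidates for a smaller follow-up run with lower concurrency.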

Status codes

| Status | Meaning |
|---|---|
| ok | All requested data fetched successfully |
| partial | Metadata fetched but one or more optional enrichments failed |
| not_found | Package not found on PyPI (HTTP 404) |
| rate_limited | PyPI returned HTTP 429 after retries |
| blocked | PyPI returned HTTP 403 |
| error | Unexpected network or parse error |
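The status table amounts to a mapping from the PyPI metadata response to a status code. The following is a hypothetical reconstruction for illustration only, not the actor's code. It assumes retries on HTTP 429 have already been exhausted and that enrichment failures are tracked separately.

```python
from typing import Optional


def classify_status(http_status: Optional[int], enrichment_failed: bool = False) -> str:
    """Map the PyPI metadata response to one of the documented status codes.

    http_status is None when no response arrived (network or parse failure).
    """
    if http_status is None:
        return "error"
    if http_status == 404:
        return "not_found"
    if http_status == 429:
        return "rate_limited"  # assumed: only reported after retries are used up
    if http_status == 403:
        return "blocked"
    if 200 <= http_status < 300:
        # Metadata succeeded; optional enrichments decide ok vs partial.
        return "partial" if enrichment_failed else "ok"
    return "error"
```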

Known limitations

  • HTML keyword search (pypi.org/search/?q=...) is JavaScript-rendered and out of scope. V1 is direct-lookup only.
  • requires_dist strings are raw PEP 508 specifiers; environment markers are not parsed.
  • Yanked releases are flagged with a per-version warning, not silently skipped.
  • pypistats.org is a third-party service; treat download counts as approximate and emit warnings when unavailable.
  • OSV vulnerability results are advisory summaries only — not a substitute for a full security audit.
  • Package names are normalized per PEP 503 (lowercased, with runs of -, _ and . collapsed to a single hyphen); a warning is emitted when the canonical name differs from the requested name.
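PEP 503 normalization comes down to a single regular expression. This sketch implements the rule as written in PEP 503. name_warning and its message text are hypothetical and only show when a warning would apply.

```python
import re
from typing import Optional


def canonicalize(name: str) -> str:
    """PEP 503: collapse runs of '-', '_' and '.' into one '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


def name_warning(requested: str) -> Optional[str]:
    """Hypothetical warning text for a requested name that is not already canonical."""
    canonical = canonicalize(requested)
    if canonical != requested:
        return f"requested name {requested!r} normalizes to {canonical!r}"
    return None
```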

Local run

npm test
npm start

Uses input.json for local testing. Set dryRun: true to skip dataset/webhook delivery.
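For local runs, you can generate the input file instead of editing it by hand. This sketch assumes input.json lives in the directory where you run npm start. build_local_input and write_local_input are hypothetical helpers, and they default dryRun to true so that nothing is delivered.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def build_local_input(packages: List[str], **options: Any) -> Dict[str, Any]:
    """Input payload for a local run; explicit options override the dryRun default."""
    payload: Dict[str, Any] = {"packages": packages, "dryRun": True}
    payload.update(options)
    return payload


def write_local_input(packages: List[str], path: str = "input.json", **options: Any) -> Path:
    """Write the payload as pretty-printed JSON and return the file path."""
    target = Path(path)
    target.write_text(json.dumps(build_local_input(packages, **options), indent=2) + "\n", encoding="utf-8")
    return target
```

For example, write_local_input(["requests", "urllib3"], includeVulnerabilities=True) followed by npm start exercises the OSV path without pushing to a dataset or webhook.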


Pricing & Cost Control

Apify Store pricing is usage-based, so cost mainly follows how many packages you analyze plus any optional enrichments. Check the Store pricing card for the current per-event rates.

  • Start with a shortlist of exact packages.
  • Keep includeDownloadStats and includeVulnerabilities off for the fastest first pass.
  • Use dryRun: true before longer shortlists or scheduled runs.
  • Prefer dataset delivery while you validate downstream mappings.
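The headline rate converts directly into a rough budget. This sketch assumes each package record is billed as one result at $8.00 per 1,000 and ignores any separate charges for optional enrichments. Check the pricing card before relying on the estimate.

```python
from decimal import Decimal

# Assumption: one package record = one billable result at the headline
# "from $8.00 / 1,000 results" rate; enrichment charges, if any, are ignored.
PRICE_PER_1000_RESULTS = Decimal("8.00")


def estimate_cost(packages: int, runs: int, price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Rough USD cost for checking `packages` packages on each of `runs` runs."""
    results = packages * runs
    return (Decimal(results) * price_per_1000 / 1000).quantize(Decimal("0.01"))
```

For example, a 50-package watchlist checked once a day for 30 days produces 1,500 results. That comes to about $12.00 under this assumption.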

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.