# 🛡️ PyPI Vulnerability Scraper (`taroyamada/pypi-package-intelligence`) Actor

Extract Python package metadata from PyPI and enrich it with OSV database alerts. Monitor dependencies for new version releases and critical CVE identifiers.

- **URL**: https://apify.com/taroyamada/pypi-package-intelligence.md
- **Developed by:** [太郎 山田](https://apify.com/taroyamada) (community)
- **Categories:** Developer tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $8.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) or as [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## PyPI Package Intelligence API | Releases, Dependencies & OSV Signals

Securing your software supply chain takes more than occasional manual checks; it requires continuous visibility into the code you rely on. This PyPI Vulnerability Scraper automates package due diligence by monitoring your Python dependencies for new version releases and known security risks. It queries the official PyPI JSON endpoint and can enrich that data with advisories from the Open Source Vulnerability (OSV) database, so scheduled runs work as an early warning system for your Python dependencies.

AppSec engineers and engineering managers use this monitor to catch compromised modules before they hit production environments. You can easily schedule daily or weekly runs to ensure no malicious update or critical CVE slips through the cracks. Instead of building complex internal web scraping tools or manually searching security pages, this scraper efficiently gathers all the necessary details into one structured dataset. Concrete outputs include specific CVE identifiers, affected version ranges, exact dependency declarations, and the latest secure release histories. Whether you are tracking a single critical framework or auditing hundreds of Python packages across multiple microservices, this data empowers you to quickly patch vulnerabilities and maintain rigorous compliance standards. Run the extraction to instantly identify which of your modules require immediate upgrades and keep your infrastructure safe from emerging threats.

### Store Quickstart

- Start with 2–5 exact package names in `packages`.
- Keep `includeDownloadStats` and `includeVulnerabilities` off for the fastest first success path, then enable them for shortlisted packages.
- Use `dryRun: true` when you only want to validate the payload shape or delivery settings.
- After the first useful run, switch to the recurring watchlist template for repeat package checks, then use the webhook handoff template for release or OSV alerts.
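
For repeat package checks, one way to spot new releases is to compare the `version` field of each record between two runs. The sketch below is a hypothetical helper, not part of the Actor; it assumes you have already loaded the dataset items of a previous and a current run as lists of dicts with the `name`, `status`, and `version` fields described in the Output section.

```python
def detect_new_releases(previous, current):
    """Return (name, old_version, new_version) for packages whose version changed.

    Only records with status "ok" or "partial" are compared, because other
    statuses carry no metadata to compare.
    """
    usable = {"ok", "partial"}
    old_versions = {
        record["name"]: record.get("version")
        for record in previous
        if record.get("status") in usable
    }
    changes = []
    for record in current:
        if record.get("status") not in usable:
            continue
        old = old_versions.get(record["name"])
        new = record.get("version")
        # Packages that are new to the watchlist have no baseline and are skipped.
        if old is not None and new != old:
            changes.append((record["name"], old, new))
    return changes
```

A package that appears only in the current run is ignored here; whether a first sighting should count as a change is a design choice left to the caller.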

### Status

**V1 — Live implementation.** Scaffolded as part of Wave 6 Batch H; live collection logic implemented.

### Data sources

| Source | URL | Notes |
|---|---|---|
| Package metadata | `https://pypi.org/pypi/{package}/json` | Full metadata, all releases, latest files |
| Download stats (optional) | `https://pypistats.org/api/packages/{package}/recent` | Third-party; off by default |
| Vulnerability summary (optional) | `POST https://api.osv.dev/v1/query` | OSV advisory lookup; off by default |
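
To make the table concrete, here is a minimal sketch of how requests to two of these sources can be formed with the Python standard library. It is an illustration only: how the Actor itself builds its requests is not documented here, and the helper names (`pypi_metadata_url`, `osv_query_body`, `query_osv`) are hypothetical. The request body follows the public OSV query format of a package name, an ecosystem, and a version.

```python
import json
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/{package}/json"
OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def pypi_metadata_url(package):
    """Fill a package name into the PyPI JSON endpoint from the table above."""
    return PYPI_JSON_URL.format(package=package)


def osv_query_body(package, version):
    """Build the JSON body for an OSV query about one PyPI package version."""
    return {"version": version, "package": {"name": package, "ecosystem": "PyPI"}}


def query_osv(package, version, timeout=15.0):
    """POST the query to OSV and return the list of advisories (may be empty)."""
    request = urllib.request.Request(
        OSV_QUERY_URL,
        data=json.dumps(osv_query_body(package, version)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # OSV omits the "vulns" key when nothing matches, hence the default.
        return json.load(response).get("vulns", [])
```

Calling `query_osv("requests", "2.32.3")` performs a live network request, so the result depends on the OSV database at the time of the call.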

### Use Cases

| Who | Why |
|---|---|
| **OSS program offices** | Audit release cadence, maintainers, and license signals before approving dependencies |
| **Security teams** | Add optional OSV summaries to triage risky packages faster |
| **Developer platform teams** | Compare PyPI libraries before standardizing on one package |
| **Analysts / investors** | Track package maturity and ecosystem traction from public signals |

### Input

| Field | Type | Default | Description |
|---|---|---|---|
| `packages` | string[] | — | **Required.** PyPI package names (e.g. `requests`). Max 100. |
| `includeReleaseHistory` | boolean | `true` | Full release version history with upload dates |
| `includeDownloadStats` | boolean | `false` | Recent download counts from pypistats.org |
| `includeVulnerabilities` | boolean | `false` | OSV vulnerability advisory summary |
| `concurrency` | integer | `5` | Parallel package fetch limit (1–10) |
| `timeoutMs` | integer | `15000` | Per-request timeout in ms |
| `delivery` | string | `dataset` | `dataset` or `webhook` |
| `webhookUrl` | string | `""` | Webhook URL when `delivery=webhook` |
| `dryRun` | boolean | `false` | Skip dataset push and webhook delivery |

### Output

Each package record contains:
- `name`, `requestedName`, `status`, `version`, `summary`, `description`
- `license`, `requiresPython`, `keywords`, `classifiers`, `requiresDist`
- `author`, `authorEmail`, `maintainer`, `maintainerEmail`
- `homePage`, `projectUrl`, `projectUrls`, `packageUrl`
- `releaseCount`, `firstRelease`, `latestRelease`
- `latestFiles` — distribution files for the latest version (filename, url, size, sha256, packageType)
- `releaseHistory` — all version upload dates (when `includeReleaseHistory=true`)
- `downloadStats` — lastDay / lastWeek / lastMonth (when `includeDownloadStats=true`)
- `vulnerabilities` — vulnCount + OSV advisory list (when `includeVulnerabilities=true`)
- `warnings` — per-package issues (yanked versions, missing fields, enrichment failures)

#### Output Example

```json
{
  "name": "requests",
  "status": "ok",
  "version": "2.32.3",
  "license": "Apache-2.0",
  "requiresPython": ">=3.8",
  "releaseCount": 180,
  "latestRelease": "2024-05-29T00:00:00.000Z",
  "downloadStats": { "lastDay": 1234567, "lastWeek": 8456789, "lastMonth": 34567890 },
  "vulnerabilities": { "vulnCount": 0, "vulns": [] },
  "warnings": []
}
````
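
A record shaped like the example above can be screened for advisories with a few lines of Python. This is a sketch under the assumption that `vulnerabilities` is absent or empty when `includeVulnerabilities` is off; `packages_with_advisories` is a hypothetical helper name.

```python
def packages_with_advisories(records):
    """Return (name, vulnCount) pairs for records with at least one advisory,
    sorted with the highest count first."""
    flagged = []
    for record in records:
        # The field is only populated when includeVulnerabilities=true.
        vulnerabilities = record.get("vulnerabilities") or {}
        count = vulnerabilities.get("vulnCount", 0)
        if count > 0:
            flagged.append((record["name"], count))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

An empty result only means no advisories were reported for the records that included the enrichment; it does not distinguish between a clean package and one that was never checked.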

#### Status codes

| Status | Meaning |
|---|---|
| `ok` | All requested data fetched successfully |
| `partial` | Metadata fetched but one or more optional enrichments failed |
| `not_found` | Package not found on PyPI (HTTP 404) |
| `rate_limited` | PyPI returned HTTP 429 after retries |
| `blocked` | PyPI returned HTTP 403 |
| `error` | Unexpected network or parse error |
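
Downstream code usually needs to decide what to do with each status. The sketch below groups records into three buckets. Treating `rate_limited` and `error` as worth retrying is an assumption of this example, not documented Actor behavior, and `triage` is a hypothetical helper.

```python
from collections import defaultdict

USABLE = {"ok", "partial"}          # metadata is present
RETRYABLE = {"rate_limited", "error"}  # assumption: may succeed on a later run


def triage(records):
    """Group package names into 'usable', 'retry', and 'investigate' buckets."""
    buckets = defaultdict(list)
    for record in records:
        status = record.get("status", "error")
        name = record.get("requestedName") or record.get("name")
        if status in USABLE:
            buckets["usable"].append(name)
        elif status in RETRYABLE:
            buckets["retry"].append(name)
        else:
            # not_found and blocked usually need a human look (typo, access issue).
            buckets["investigate"].append(name)
    return dict(buckets)
```

Records with status `partial` land in the usable bucket; check their `warnings` field to see which enrichment failed.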

### Known limitations

- **HTML keyword search** (`pypi.org/search/?q=...`) is JavaScript-rendered and out of scope. V1 is direct-lookup only.
- `requires_dist` strings are raw PEP 508 specifiers; environment markers are not parsed.
- Yanked releases are flagged with a per-version warning, not silently skipped.
- pypistats.org is a third-party service; treat download counts as approximate. The Actor emits a warning when the service is unavailable.
- OSV vulnerability results are advisory summaries only — not a substitute for a full security audit.
- Package name normalization follows PEP 503 (lowercase, hyphens); a warning is emitted when the canonical name differs from the requested name.
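
The PEP 503 rule mentioned in the last item is short enough to reproduce. The regular expression below is the one given in PEP 503 itself; the function names are hypothetical, and whether the Actor uses this exact code is an assumption.

```python
import re


def normalize_name(name):
    """Normalize a project name per PEP 503: collapse runs of '-', '_' and '.'
    into a single '-' and lowercase the result."""
    return re.sub(r"[-_.]+", "-", name).lower()


def name_differs(requested):
    """True when the canonical name differs from what was requested,
    which is the case where a warning would be expected."""
    return normalize_name(requested) != requested
```

For example, `Flask_SQLAlchemy` and `flask-sqlalchemy` refer to the same project, so you can pre-normalize a watchlist to avoid duplicate lookups.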

### Local run

```bash
npm test
npm start
```

Uses `input.json` for local testing. Set `dryRun: true` to skip dataset/webhook delivery.

### Related Actors

Pair this Actor with other flagship intelligence APIs in the same portfolio:

- [NPM Package Intelligence API](https://apify.com/taroyamada/npm-package-intelligence) — compare JavaScript dependency signals with the same normalized metadata approach.
- [Docker Hub Image Intelligence API](https://apify.com/taroyamada/dockerhub-image-intelligence) — add container repository context when a package also ships in public images.
- [Shopify Store Intelligence API](https://apify.com/taroyamada/shopify-store-intelligence) — connect package research to live storefront and catalog signals for platform intelligence.

### Pricing & Cost Control

Apify Store pricing is usage-based, so cost mainly follows how many packages you analyze plus any optional enrichments. Check the Store pricing card for the current per-event rates.

- Start with a shortlist of exact `packages`.
- Keep `includeDownloadStats` and `includeVulnerabilities` off for the fastest first pass.
- Use `dryRun: true` before longer shortlists or scheduled runs.
- Prefer dataset delivery while you validate downstream mappings.
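
For a rough budget check, the listed starting rate of $8.00 per 1,000 results can be turned into a per-run estimate. This sketch assumes one billable result per package and ignores any other billable events, both of which are assumptions; the Store pricing card remains the authoritative source.

```python
from decimal import Decimal

# Starting rate listed in the Pricing section; actual per-event rates may differ.
LISTED_PRICE_PER_THOUSAND = Decimal("8.00")


def estimate_cost_usd(package_count, price_per_thousand=LISTED_PRICE_PER_THOUSAND):
    """Estimate the cost of one run, assuming one billable result per package."""
    if package_count < 0:
        raise ValueError("package_count must not be negative")
    return Decimal(package_count) * price_per_thousand / Decimal(1000)
```

With these assumptions, a daily run over the maximum of 100 packages works out to `estimate_cost_usd(100)` per day, which you can multiply by the number of scheduled runs.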

### ⭐ Was this helpful?

If this Actor saved you time, please [**leave a ★ rating**](https://apify.com/taroyamada/pypi-package-intelligence/reviews) on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the [Issues tab](https://apify.com/taroyamada/pypi-package-intelligence/issues) of this Actor.

# Actor input Schema

## `packages` (type: `array`):

PyPI package names to fetch (e.g. 'requests', 'numpy'). Names are normalised to lowercase per PEP 503. Max 100 per run.

## `includeReleaseHistory` (type: `boolean`):

When true, includes the full release history (all version upload dates) in the output.

## `includeDownloadStats` (type: `boolean`):

When true, fetches recent download counts from pypistats.org (a third-party service). Emits a warning when unavailable.

## `includeVulnerabilities` (type: `boolean`):

When true, queries the OSV API (api.osv.dev) for known vulnerability advisories for each package. Off by default — treat results as advisory summaries only.

## `concurrency` (type: `integer`):

Number of packages to fetch in parallel. Default 5.

## `timeoutMs` (type: `integer`):

Per-request timeout in milliseconds.

## `delivery` (type: `string`):

Where to send results: dataset or webhook.

## `webhookUrl` (type: `string`):

Webhook URL to POST results to when delivery=webhook.

## `dryRun` (type: `boolean`):

Run without saving results to the dataset.

## Actor input object example

```json
{
  "packages": [
    "requests",
    "numpy"
  ],
  "includeReleaseHistory": true,
  "includeDownloadStats": false,
  "includeVulnerabilities": false,
  "concurrency": 5,
  "timeoutMs": 15000,
  "delivery": "dataset",
  "dryRun": false
}
```
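
Before starting a run, a payload like the one above can be checked locally against the documented limits. The bounds come from this document (at most 100 packages, `concurrency` 1 to 10, `timeoutMs` 1000 to 30000 per the OpenAPI specification below). Rejecting an empty `packages` list and requiring `webhookUrl` for webhook delivery are assumptions of this sketch, and `validate_input` is a hypothetical helper.

```python
def _is_int(value):
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_input(payload):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    packages = payload.get("packages")
    if not isinstance(packages, list) or not packages:
        problems.append("packages must be a non-empty list")  # assumption: empty is invalid
    elif len(packages) > 100:
        problems.append("packages allows at most 100 names per run")
    elif not all(isinstance(p, str) and p.strip() for p in packages):
        problems.append("every package name must be a non-empty string")

    concurrency = payload.get("concurrency", 5)
    if not _is_int(concurrency) or not 1 <= concurrency <= 10:
        problems.append("concurrency must be an integer from 1 to 10")

    timeout_ms = payload.get("timeoutMs", 15000)
    if not _is_int(timeout_ms) or not 1000 <= timeout_ms <= 30000:
        problems.append("timeoutMs must be an integer from 1000 to 30000")

    delivery = payload.get("delivery", "dataset")
    if delivery not in {"dataset", "webhook"}:
        problems.append("delivery must be 'dataset' or 'webhook'")
    elif delivery == "webhook" and not payload.get("webhookUrl"):
        problems.append("webhookUrl is needed when delivery is 'webhook'")  # assumption

    return problems
```

Running this check first is cheaper than a failed run, and it pairs well with `dryRun: true` when you also want the platform to validate the payload.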

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "packages": [
        "requests",
        "numpy"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("taroyamada/pypi-package-intelligence").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "packages": [
        "requests",
        "numpy",
    ],
}

# Run the Actor and wait for it to finish
run = client.actor("taroyamada/pypi-package-intelligence").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "packages": [
    "requests",
    "numpy"
  ]
}' |
apify call taroyamada/pypi-package-intelligence --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=taroyamada/pypi-package-intelligence",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "🛡️ PyPI Vulnerability Scraper",
        "description": "Extract Python package metadata from PyPI and enrich it with OSV database alerts. Monitor dependencies for new version releases and critical CVE identifiers.",
        "version": "0.1",
        "x-build-id": "Bg1ncz0NNBxFtGY6h"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/taroyamada~pypi-package-intelligence/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-taroyamada-pypi-package-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/taroyamada~pypi-package-intelligence/runs": {
            "post": {
                "operationId": "runs-sync-taroyamada-pypi-package-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/taroyamada~pypi-package-intelligence/run-sync": {
            "post": {
                "operationId": "run-sync-taroyamada-pypi-package-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "packages"
                ],
                "properties": {
                    "packages": {
                        "title": "Package Names",
                        "type": "array",
                        "description": "PyPI package names to fetch (e.g. 'requests', 'numpy'). Names are normalised to lowercase per PEP 503. Max 100 per run.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "includeReleaseHistory": {
                        "title": "Include Release History",
                        "type": "boolean",
                        "description": "When true, includes the full release history (all version upload dates) in the output.",
                        "default": true
                    },
                    "includeDownloadStats": {
                        "title": "Include Download Stats (pypistats.org)",
                        "type": "boolean",
                        "description": "When true, fetches recent download counts from pypistats.org (a third-party service). Emits a warning when unavailable.",
                        "default": false
                    },
                    "includeVulnerabilities": {
                        "title": "Include OSV Vulnerability Summary",
                        "type": "boolean",
                        "description": "When true, queries the OSV API (api.osv.dev) for known vulnerability advisories for each package. Off by default — treat results as advisory summaries only.",
                        "default": false
                    },
                    "concurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Number of packages to fetch in parallel. Default 5.",
                        "default": 5
                    },
                    "timeoutMs": {
                        "title": "Request Timeout (ms)",
                        "minimum": 1000,
                        "maximum": 30000,
                        "type": "integer",
                        "description": "Per-request timeout in milliseconds.",
                        "default": 15000
                    },
                    "delivery": {
                        "title": "Delivery",
                        "enum": [
                            "dataset",
                            "webhook"
                        ],
                        "type": "string",
                        "description": "Where to send results: dataset or webhook.",
                        "default": "dataset"
                    },
                    "webhookUrl": {
                        "title": "Webhook URL",
                        "type": "string",
                        "description": "Webhook URL to POST results to when delivery=webhook."
                    },
                    "dryRun": {
                        "title": "Dry Run",
                        "type": "boolean",
                        "description": "Run without saving results to the dataset.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
