PyPI Scraper
Pricing
from $1.00 / 1,000 results
Scrape Python package metadata from PyPI: exact-name lookup, newly-added packages, and recently-updated packages. Pulls version, license, classifiers, dependencies, project URLs, and maintainer info.
Rating: 5.0 (10)
Developer: Crawler Bros
Maintained by Community
Actor stats
Bookmarked: 10
Total users: 2
Monthly active users: 1
Last modified: a day ago
Scrape Python package metadata from PyPI (the Python Package Index): exact-name lookup, newly-added packages, and recently-updated packages. Pulls version, license, classifiers, dependencies, project URLs, author/maintainer info, and latest artifact details. HTTP-only via PyPI's public JSON and RSS endpoints. No auth, no proxy.
What this actor does
- Three modes: `lookup` (exact package names), `newest` (RSS feed of newly-added packages), `updates` (RSS feed of recently-updated package versions)
- Rich metadata: name, latest version, summary, full description (markdown), license, classifiers, keywords, requires_python, project_urls (Documentation / Source / Issues / etc.)
- Artifacts of latest release: filename, package type (wheel / sdist), python version, URL, size, upload time
- Filters: classifier any-of, license substring, minimum supported Python version
- Optional: `includeReleases` (full version history), `includeUrls` (project_urls map)
- Empty fields are omitted: no nulls or blank strings reach the dataset
Output per package
- `name`, `latestVersion`, `summary`, `description`, `descriptionContentType`
- `license`: prefers `license_expression` (SPDX), then falls back to `license`
- `homePage`, `downloadUrl`, `docsUrl`, `bugTrackUrl`
- `requiresPython` (e.g. `>=3.8,<4`)
- `keywords[]`: auto-detects comma vs space separator
- `classifiers[]`: full list of PyPI classifiers
- `author`: `{name, email}`
- `maintainer`: `{name, email}` (when present)
- `projectUrls`: `{Documentation, Source, Issues, ...}` map (when `includeUrls=true`)
- `requiresDist[]`: runtime dependency specifiers
- `latestArtifacts[]`: `[{filename, packageType, pythonVersion, url, size, uploadTime}, ...]`
- `versions[]`: sorted (reverse) list of release versions when `includeReleases=true`
- `vulnerabilityCount`: number of known vulnerabilities (PyPI's reported list)
- `pypiUrl`, `pypiJsonUrl`
- `recordType: "package"`, `scrapedAt`
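The license fallback, keyword splitting, and empty-field omission described above can be sketched roughly as follows. This is an illustrative sketch, not the actor's code: the helper names `pick_license`, `split_keywords`, and `drop_empty` are hypothetical, and the exact rules the actor applies (for example, whether a zero count is treated as empty) are assumptions.

```python
from __future__ import annotations


def pick_license(info: dict) -> str | None:
    """Prefer the SPDX license_expression, else the free-form license field."""
    value = info.get("license_expression") or info.get("license") or ""
    return value.strip() or None


def split_keywords(raw: str | None) -> list[str]:
    """Split on commas when any are present, otherwise on whitespace."""
    if not raw:
        return []
    parts = raw.split(",") if "," in raw else raw.split()
    return [p.strip() for p in parts if p.strip()]


def drop_empty(record: dict) -> dict:
    """Drop None, blank strings, and empty containers.

    Assumption: numeric zero and False are kept, since they carry information.
    """
    return {k: v for k, v in record.items() if v not in (None, "", [], {})}
```

For instance, `drop_empty({"summary": "", "keywords": split_keywords("http, client")})` would leave only the `keywords` entry.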
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | newest | lookup / newest / updates |
| `packageNames` | array | – | Required for mode=lookup (e.g. ["requests", "numpy"]) |
| `classifierAnyOf` | array | [] | Only emit packages with at least one of these classifiers |
| `licenseContains` | string | – | Only emit packages whose license contains this substring (case-insensitive) |
| `minPythonVersion` | string | – | e.g. 3.10: only packages whose requires_python allows this version |
| `includeReleases` | bool | false | Emit full version history |
| `includeUrls` | bool | true | Emit project_urls map |
| `maxItems` | int | 50 | Hard cap (1–1000) |
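As a rough illustration of how the `classifierAnyOf` and `licenseContains` filters might behave, here is a minimal sketch. The function name `passes_filters` is hypothetical. The README does not say whether classifiers are matched exactly or by prefix; this sketch assumes prefix matching, so that `Topic :: Database` would also match `Topic :: Database :: Front-Ends`.

```python
def passes_filters(record, classifier_any_of=(), license_contains=None):
    """Return True when the record satisfies every filter that is set."""
    if classifier_any_of:
        classifiers = record.get("classifiers", [])
        # Assumption: a wanted classifier matches any classifier it prefixes.
        if not any(c.startswith(w) for c in classifiers for w in classifier_any_of):
            return False
    if license_contains:
        # Case-insensitive substring check, as the input table describes.
        if license_contains.lower() not in record.get("license", "").lower():
            return False
    return True
```

A record with no `license` field fails any `licenseContains` filter in this sketch, because the empty string contains no non-empty substring.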
Example: lookup specific packages
{"mode": "lookup","packageNames": ["requests", "numpy", "fastapi"]}
Example: newest packages on PyPI
{"mode": "newest","maxItems": 30}
Example: recently updated packages, MIT-licensed only
{"mode": "updates","licenseContains": "MIT","minPythonVersion": "3.10","maxItems": 50}
Example: tracking SQL ORM packages
{"mode": "lookup","packageNames": ["sqlalchemy", "tortoise-orm", "peewee", "pony"],"classifierAnyOf": ["Topic :: Database"],"includeReleases": true}
Use cases
- Open-source intelligence: track adoption / version cadence of Python packages
- Security teams: track maintainer churn, monitor `vulnerabilityCount`, audit licenses
- DevRel & growth: find similar / competing packages, monitor share of voice
- Compliance: bulk-fetch SPDX license expressions across an entire dependency tree
- Package discovery: find newly-published packages in your domain
- Release monitoring: wire up the `updates` feed to alert on new releases of watched packages
FAQ
Why no search mode? PyPI offers no programmatic search endpoint that returns structured JSON (the legacy XML-RPC search method is disabled). Use lookup for known names, newest / updates for new-package discovery, and narrow the feed results with the classifier / license / Python-version filters.
Why are RSS modes so much faster than search? Each RSS feed call returns up to 40 items in one request. The actor then fetches each package's pypi.org/pypi/<name>/json to enrich. So newest mode = 1 RSS call + N package calls.
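The "1 RSS call + N package calls" flow can be sketched offline like this. It is an illustration only: `SAMPLE_RSS` is made-up data, the assumption that each feed item links to `https://pypi.org/project/<name>/...` is mine, and `package_names` / `json_urls` are hypothetical helpers. No network request is made here; the sketch only derives the per-package JSON URLs that would be fetched next.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Made-up feed body; the item shape (title + project link) is an assumption.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>PyPI newest packages</title>
    <item>
      <title>example-pkg added to PyPI</title>
      <link>https://pypi.org/project/example-pkg/</link>
    </item>
    <item>
      <title>other-pkg 1.2.0</title>
      <link>https://pypi.org/project/other-pkg/1.2.0/</link>
    </item>
  </channel>
</rss>"""


def package_names(rss_text):
    """Extract unique package names from item links, preserving feed order."""
    root = ET.fromstring(rss_text)
    names = []
    for link in root.iterfind("./channel/item/link"):
        parts = [p for p in urlparse(link.text or "").path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "project" and parts[1] not in names:
            names.append(parts[1])
    return names


def json_urls(names):
    """Build the pypi.org/pypi/<name>/json URL used for enrichment."""
    return [f"https://pypi.org/pypi/{name}/json" for name in names]
```

Taking the name from the link path rather than the title avoids depending on the title wording, which differs between the two sample items.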
What's license vs license_expression? license is free-form (often MIT License, Apache 2.0). license_expression is the SPDX identifier (e.g. Apache-2.0). The actor prefers license_expression if present.
How does minPythonVersion work? It parses requires_python (e.g. >=3.8,<4), extracts the lowest required version, and checks if your threshold is >= that. So minPythonVersion: "3.10" keeps packages that support Python 3.10 (i.e. requires_python lower bound ≤ 3.10).
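A minimal sketch of that check, under stated assumptions: the helper names are hypothetical, only `>=`, `>`, `~=`, and `==` clauses are treated as lower bounds, `>` is treated like `>=` for simplicity, and a missing or unbounded `requires_python` is assumed to pass. The actor's real parser may differ.

```python
import re

# Clauses that imply a lower bound; "<", "<=", and "!=" are ignored.
_LOWER = re.compile(r"(>=|~=|==|>)\s*(\d+(?:\.\d+)*)")


def _as_tuple(version):
    """Turn "3.10" into (3, 10) so that 3.10 compares greater than 3.8."""
    return tuple(int(part) for part in version.split("."))


def lowest_required(requires_python):
    """Return the smallest lower bound found, or None when there is none."""
    bounds = [_as_tuple(v) for _, v in _LOWER.findall(requires_python or "")]
    return min(bounds) if bounds else None


def supports(requires_python, min_python_version):
    """Keep the package when the threshold is >= its lowest required version."""
    lower = lowest_required(requires_python)
    if lower is None:
        return True  # assumption: no lower bound means the package is kept
    return _as_tuple(min_python_version) >= lower
```

Comparing integer tuples rather than strings matters here: as strings, `"3.10"` sorts before `"3.8"`. Like the description above, this sketch does not look at upper bounds such as `<4`.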
What does vulnerabilityCount track? PyPI exposes a vulnerabilities array on each package's JSON payload (sourced from OSV.dev). We count entries — a non-zero count is a signal to dig deeper.
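Counting the entries could look like the sketch below. `SAMPLE_PAYLOAD` is made-up data shaped like a package JSON response with a top-level `vulnerabilities` array, and `vulnerability_count` is a hypothetical helper name. Treating a missing or malformed field as zero is an assumption.

```python
import json

# Made-up payload; the identifiers are placeholders, not real advisories.
SAMPLE_PAYLOAD = json.loads("""
{
  "info": {"name": "example-pkg", "version": "1.0.0"},
  "vulnerabilities": [
    {"id": "EXAMPLE-0001", "aliases": [], "fixed_in": ["1.0.1"]},
    {"id": "EXAMPLE-0002", "aliases": [], "fixed_in": []}
  ]
}
""")


def vulnerability_count(payload):
    """Count entries in the vulnerabilities array; 0 when absent or not a list."""
    vulns = payload.get("vulnerabilities")
    return len(vulns) if isinstance(vulns, list) else 0
```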
Are dependencies fully resolved? No — requiresDist returns the raw specifiers (e.g. "urllib3<3,>=1.21.1"). For full resolution, feed the package into pip-compile or uv lock downstream.
How fresh is the data? Real-time. PyPI's JSON is served from the same backend that serves the website; RSS feeds update every few minutes.