PyPI Package Scraper
Pull rich metadata for any PyPI package — current version, dependencies, classifiers, author, license, home page, download URLs, release history. Free PyPI JSON API, no key required.
Developer: DevilScrapes (maintained by Community)
🎯 What this scrapes
PyPI serves a JSON document at `https://pypi.org/pypi/<package>/json` for every package. This Actor takes a list of package names, requests them in parallel, and writes one row per package containing the currently published version, the dependency list, classifiers (Python versions, OS, licenses, frameworks), author, project URLs, and the 10 most recent release timestamps.
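As a rough illustration of the endpoint described above, the sketch below downloads one package's JSON and flattens a few `info` fields into a row. It is not the Actor's own code: the function names `fetch_package_json` and `to_row` are hypothetical, and only a subset of the output fields is mapped.

```python
import json
import urllib.request
from typing import Any

# Endpoint pattern taken from the text above.
PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def fetch_package_json(name: str, timeout: float = 10.0) -> dict[str, Any]:
    """Download the raw JSON document for one package (performs a network call)."""
    url = PYPI_JSON_URL.format(name=name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def to_row(payload: dict[str, Any]) -> dict[str, Any]:
    """Flatten part of a PyPI JSON payload into a dataset-style row (hypothetical helper)."""
    info = payload.get("info", {})
    return {
        "name": info.get("name"),
        "version": info.get("version"),
        "summary": info.get("summary"),
        "license": info.get("license"),
        "requires_python": info.get("requires_python"),
        # PyPI returns null when a package declares no dependencies or classifiers.
        "requires_dist": info.get("requires_dist") or [],
        "classifiers": info.get("classifiers") or [],
        "project_urls": info.get("project_urls"),
    }
```

Calling `to_row(fetch_package_json("httpx"))` would produce one such row; the Actor's real rows carry more fields, as listed in the Output section.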
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: apart from the small per-run start fee, you only pay for results that hit your dataset.
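The retry behaviour listed above (exponential backoff on 408 / 429 / 5xx, up to 5 attempts, `Retry-After` honoured) can be sketched as follows. This is an illustration under assumptions, not the Actor's implementation: the `send` callable, the base delay of 0.5 s, and the 30 s cap are all hypothetical, and jitter is omitted to keep the example deterministic.

```python
import time
from typing import Callable, Optional, Tuple

# Status codes the list above names as retryable: 408, 429 and any 5xx.
RETRYABLE = {408, 429} | set(range(500, 600))

# A response is modelled as (status, retry_after_seconds_or_None, body).
Response = Tuple[int, Optional[float], bytes]


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        return min(retry_after, cap)  # a server-provided hint takes priority
    return min(cap, base * 2 ** (attempt - 1))  # 0.5, 1, 2, 4, ...


def fetch_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-retryable status or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, retry_after, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, retry_after))
    raise RuntimeError(f"gave up after {max_attempts} attempts (last status {status})")
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting; production code would normally add random jitter to the computed delay.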
💡 Use cases
- Dependency intel — feed your dependency tree to surface outdated/yanked packages.
- Compliance audit — pull license + classifier for every direct dependency.
- Maintainer hunting — map your stack to package authors / maintainers.
- Release watching — daily run on a watch list to alert on new releases.
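To make the compliance-audit use case concrete, here is a minimal sketch that builds a name-to-license map from exported dataset rows. It assumes rows shaped like the Output table (`name`, `license`, `classifiers`); the helper name `license_report` and the `"UNKNOWN"` fallback are illustrative choices, not part of the Actor.

```python
from typing import Any, Iterable


def license_report(rows: Iterable[dict[str, Any]]) -> dict[str, str]:
    """Map each package name to its license, falling back to License classifiers."""
    report: dict[str, str] = {}
    for row in rows:
        # Classifiers look like "License :: OSI Approved :: BSD License".
        from_classifiers = [
            c.split(" :: ")[-1]
            for c in (row.get("classifiers") or [])
            if c.startswith("License ::")
        ]
        report[row["name"]] = (
            row.get("license") or ", ".join(from_classifiers) or "UNKNOWN"
        )
    return report
```

Rows that end up as `"UNKNOWN"` are the ones worth checking by hand, since neither the license field nor the classifiers carried usable information.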
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
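For the "fetch via the API" route, a sketch using the `apify-client` Python package might look like the following. The Actor ID is a placeholder (copy the real one from the Actor's page), the helper names are hypothetical, and the client calls reflect the `apify-client` interface as commonly documented, so check them against the version you install.

```python
from typing import Any, Iterator

# Placeholder: replace with the ID shown on the Actor's page.
ACTOR_ID = "<username>/<actor-name>"


def build_input(packages: list[str], include_releases: bool = True,
                concurrency: int = 8) -> dict[str, Any]:
    """Assemble a run input matching the fields in the Input table."""
    return {
        "packages": packages,
        "includeReleases": include_releases,
        "concurrency": concurrency,
        "proxyConfiguration": {"useApifyProxy": False},
    }


def run_and_iterate(token: str, run_input: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Start a run, wait for it to finish, then yield its dataset items."""
    # Imported here so the module loads even without the package installed
    # (pip install apify-client).
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    yield from client.dataset(run["defaultDatasetId"]).iterate_items()
```

A typical call would be `for row in run_and_iterate(token, build_input(["httpx"])): ...`, with the token taken from your Apify account settings.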
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `packages` | array | yes | `["requests", "httpx", "selectolax"]` | List of PyPI package names. Case-insensitive: `Requests` and `requests` are equivalent on PyPI. |
| `includeReleases` | boolean | no | `true` | When true, include the latest 10 release versions and dates per package. Adds no extra API call (the data is in the same JSON). |
| `concurrency` | integer | no | `8` | Number of parallel API requests. |
| `proxyConfiguration` | object | no | `{"useApifyProxy": false}` | PyPI is generous to public clients, so a proxy is optional. |
Example input
{"packages": ["httpx"], "includeReleases": true, "concurrency": 4, "proxyConfiguration": {"useApifyProxy": false}}
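The `concurrency` setting caps how many requests are in flight at once. One common way to get that behaviour, shown here as a sketch and not necessarily how the Actor does it, is an `asyncio.Semaphore` around each fetch. The `fetch` coroutine is a stand-in supplied by the caller.

```python
import asyncio
from typing import Any, Awaitable, Callable


async def fan_out(names: list[str],
                  fetch: Callable[[str], Awaitable[dict[str, Any]]],
                  concurrency: int = 8) -> list[dict[str, Any]]:
    """Fetch every name, never running more than `concurrency` fetches at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(name: str) -> dict[str, Any]:
        async with semaphore:  # waits here while `concurrency` fetches are active
            return await fetch(name)

    # gather preserves input order, so results line up with `names`.
    return list(await asyncio.gather(*(bounded(n) for n in names)))
```

With `concurrency=4`, as in the example input, a list of 100 packages would be processed four at a time.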
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `name` | string | Canonical package name (PyPI returns the canonical casing). |
| `version` | string | Currently published version. |
| `summary` | string or null | One-line description. |
| `description` | string or null | Long description (README or similar). May be lengthy. |
| `description_content_type` | string or null | Markup format of the long description. |
| `author` | string or null | Author name. |
| `author_email` | string or null | Author email. |
| `maintainer` | string or null | Maintainer name. |
| `license` | string or null | License classifier or raw text. |
| `home_page` | string or null | Project home page. |
| `project_url` | string or null | PyPI project URL. |
| `project_urls` | object or null | Map of label → URL for additional project links. |
| `requires_python` | string or null | PEP 440 version specifier (e.g. `>=3.9`). |
| `requires_dist` | array | Declared dependencies as requirement strings (the equivalent of `install_requires`). |
| `classifiers` | array | PyPI classifiers (Programming Language, Topic, License, OS). |
| `keywords` | string or null | Comma-separated keyword string from the project. |
| `yanked` | boolean | `true` if the current release was yanked from the index. |
| `release_history` | array | Recent releases (version and upload time), present when `includeReleases=true`. |
| `package_url` | string | Canonical pypi.org URL. |
| `scraped_at` | string | When this row was recorded. |
Example output
{"name": "httpx", "version": "0.27.0", "summary": "The next generation HTTP client.", "requires_python": ">=3.9", "license": "BSD-3-Clause", "home_page": null, "project_url": "https://pypi.org/project/httpx/"}
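The `release_history` field can be derived from the `releases` map in the same PyPI JSON document, which is why it costs no extra request. The sketch below shows one plausible derivation; the output keys `version` and `uploaded_at` are assumptions, since the exact item shape is not documented here, and it relies on PyPI's `upload_time_iso_8601` strings sharing one format so that they sort lexicographically.

```python
from typing import Any


def latest_releases(releases: dict[str, list[dict[str, Any]]],
                    limit: int = 10) -> list[dict[str, str]]:
    """Return the `limit` most recently uploaded versions, newest first."""
    dated = []
    for version, files in releases.items():
        uploads = [f["upload_time_iso_8601"] for f in files
                   if f.get("upload_time_iso_8601")]
        if uploads:  # versions without files carry no upload time, so skip them
            dated.append({"version": version, "uploaded_at": min(uploads)})
    dated.sort(key=lambda r: r["uploaded_at"], reverse=True)
    return dated[:limit]
```

Using the earliest file upload per version treats the first artifact as the release moment; taking the latest upload instead would be an equally defensible choice.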
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.0015 | Per dataset item |
Example: a single run that returns 1,000 results costs 1,000 × $0.0015 + $0.005 = $1.505, or about $1.50. There is no subscription and no minimum, and no card is needed to start: Apify gives every new account $5 of free credit.
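The arithmetic behind that estimate is simple enough to script. The sketch below uses the two event prices from the table; the function name `estimate_cost` is illustrative, and `Decimal` is used only to avoid floating-point rounding in the result.

```python
from decimal import Decimal

# Event prices from the pricing table above.
ACTOR_START_USD = Decimal("0.005")
RESULT_USD = Decimal("0.0015")


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated charge in USD for `results` dataset items spread over `runs` runs."""
    return runs * ACTOR_START_USD + results * RESULT_USD
```

For instance, a daily watch-list run of 200 packages over 30 days would be `estimate_cost(200 * 30, runs=30)`.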
🚧 Limitations
We hit PyPI's /pypi/<pkg>/json endpoint only. Download counts, vulnerability data, and reverse-dependency graphs require other sources.
❓ FAQ
Are downloads counted?
PyPI's JSON API doesn't expose download counts. For those, use the public PyPI dataset on Google BigQuery (`bigquery-public-data.pypi.file_downloads`) or Snyk's stats API.
What if a package doesn't exist?
The Actor logs a 404 and skips the row. The dataset still contains every package that did resolve.
Why is description huge?
Many packages bundle their README into the JSON. If you don't need it, post-process to drop it.
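A post-processing step of that kind can be very small. The sketch below, with the hypothetical helper name `drop_fields`, strips `description` (or any other listed field) from rows you have already exported.

```python
from typing import Any, Iterable


def drop_fields(rows: Iterable[dict[str, Any]],
                fields: tuple[str, ...] = ("description",)) -> list[dict[str, Any]]:
    """Return copies of the rows without the named fields; inputs are not mutated."""
    return [{k: v for k, v in row.items() if k not in fields} for row in rows]
```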
Can I get vulnerabilities?
Out of scope — use OSV or Snyk dedicated APIs.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.