PyPI Package Scraper

Pricing

Pay per event

Pull rich metadata for any PyPI package — current version, dependencies, classifiers, author, license, home page, download URLs, release history. Free PyPI JSON API, no key required.

Rating: 0.0 (0 ratings)

Developer: DevilScrapes (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago

🎯 What this scrapes

PyPI publishes a JSON document at pypi.org/pypi/<package>/json for every package. This Actor takes a list of package names, fetches them in parallel, and writes one row per package with the currently published version, dependency list, classifiers (Python versions, OS, licenses, frameworks), author, project URLs, and, when includeReleases is on, the 10 most recent release timestamps.
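To make the data flow concrete, here is a minimal sketch of fetching one package's JSON document and flattening its `info` block into a row, using only the Python standard library. It is not the Actor's own code (which uses curl-cffi and Pydantic, per the list below); `fetch_package`, `to_row`, and the `INFO_FIELDS` selection are hypothetical names, and the sketch assumes the `info` keys PyPI returns today.

```python
import json
import urllib.request
from datetime import datetime, timezone

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"

# Hypothetical selection of scalar keys copied from PyPI's "info" block.
INFO_FIELDS = (
    "name", "version", "summary", "author", "license",
    "home_page", "requires_python", "project_url", "package_url",
)

def fetch_package(name: str, timeout: float = 10.0) -> dict:
    """Download the PyPI JSON document for one package (raises HTTPError on 404)."""
    url = PYPI_JSON_URL.format(name=name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)

def to_row(payload: dict) -> dict:
    """Flatten the document's "info" block into one dataset-style row."""
    info = payload["info"]
    row = {field: info.get(field) for field in INFO_FIELDS}
    # PyPI can return null for list fields; normalise them to empty lists.
    row["requires_dist"] = info.get("requires_dist") or []
    row["classifiers"] = info.get("classifiers") or []
    row["yanked"] = bool(info.get("yanked", False))
    # ISO-8601 UTC timestamp, e.g. 2024-01-01T12:00:00.123456+00:00
    row["scraped_at"] = datetime.now(timezone.utc).isoformat()
    return row
```

`to_row(fetch_package("httpx"))` performs one live request and returns a single row.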

🔥 What we handle for you

  • 🛡️ Browser fingerprint rotation: curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
  • 🌐 Residential proxy rotation via Apify Proxy (when enabled in proxyConfiguration): fresh session and exit IP on every block.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx: up to 5 attempts per request, Retry-After honoured.
  • 🧱 Rate-limit-aware pacing — when the target pushes back, we slow down instead of getting banned.
  • 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
  • 💰 Pay-Per-Event pricing: apart from a small per-run start fee, you only pay for results that reach your dataset.
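The retry policy in the list above can be sketched with the standard library as follows. The 5-attempt limit and the 408 / 429 / 5xx set come from the list; the 0.5 s base delay, the 30 s cap, the jitter range, and all function names are assumptions, not the Actor's actual settings.

```python
import random
import time
import urllib.error
import urllib.request
from typing import Optional

MAX_ATTEMPTS = 5    # from the list above
BASE_DELAY = 0.5    # assumption: seconds before the first retry
MAX_DELAY = 30.0    # assumption: ceiling for computed delays

def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx are worth retrying; other errors are not."""
    return status in (408, 429) or 500 <= status <= 599

def backoff_delay(attempt: int, retry_after: Optional[str] = None) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        try:
            # Honour a numeric Retry-After header as-is.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form: fall back to computed backoff
    delay = min(MAX_DELAY, BASE_DELAY * 2 ** (attempt - 1))
    return delay * random.uniform(0.5, 1.0)  # jitter spreads out retries

def get_with_retries(url: str, timeout: float = 10.0) -> bytes:
    """GET `url`, retrying retryable HTTP errors up to MAX_ATTEMPTS times."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if attempt == MAX_ATTEMPTS or not is_retryable(err.code):
                raise
            time.sleep(backoff_delay(attempt, err.headers.get("Retry-After")))
    raise AssertionError("unreachable")
```

Doubling the delay on each failure while adding jitter keeps many parallel workers from retrying in lockstep against the same rate limit.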

💡 Use cases

  • Dependency intel — feed your dependency tree to surface outdated/yanked packages.
  • Compliance audit — pull license + classifier for every direct dependency.
  • Maintainer hunting — map your stack to package authors / maintainers.
  • Release watching — daily run on a watch list to alert on new releases.
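For the dependency-intel and compliance use cases, a post-processing step over exported rows might look like this sketch. It assumes rows shaped like the Output table (`name`, `version`, `license`, `yanked`) and a hypothetical `pinned` mapping of lowercase package names to the versions you pin. Two caveats: `yanked` describes the latest release, not your pinned one, and plain string comparison flags any mismatch; use a real version parser (such as the `packaging` library) if you need ordering.

```python
from typing import Dict, Iterable, List

def audit(rows: Iterable[dict], pinned: Dict[str, str]) -> List[dict]:
    """Compare exported rows against pinned versions (keys: lowercase names)."""
    findings = []
    for row in rows:
        pin = pinned.get(row["name"].lower())
        findings.append({
            "name": row["name"],
            "pinned": pin,
            "latest": row["version"],
            # Any difference counts as outdated; no version ordering is applied.
            "outdated": pin is not None and pin != row["version"],
            # Refers to the latest release, not to the pinned one.
            "latest_yanked": bool(row.get("yanked", False)),
            "license": row.get("license") or "UNKNOWN",
        })
    return findings
```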

⚙️ How to use it

  1. Click Try for free at the top of the page.
  2. Fill in the input form — most fields have sensible defaults.
  3. Click Start. Output streams into the run's dataset.
  4. Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
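To fetch results via the API instead of the Console, Apify's `run-sync-get-dataset-items` endpoint starts a run, waits for it to finish, and returns the dataset items in one call. A minimal sketch follows. The actor ID in the usage line is a placeholder (copy the real ID from the Actor's API tab), and the synchronous endpoint has a time limit, so very long runs are better started asynchronously.

```python
import json
import urllib.parse
import urllib.request

RUN_SYNC_URL = "https://api.apify.com/v2/acts/{actor}/run-sync-get-dataset-items"

def build_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """POST request that runs the Actor with `run_input` and returns its items."""
    url = RUN_SYNC_URL.format(actor=urllib.parse.quote(actor_id, safe="~"))
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # your Apify API token
        },
        method="POST",
    )

def run_actor(actor_id: str, token: str, run_input: dict, timeout: float = 300.0) -> list:
    """Execute the request and parse the JSON array of dataset items."""
    request = build_request(actor_id, token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Usage (placeholder ID and token): `items = run_actor("someuser~pypi-package-scraper", "<APIFY_TOKEN>", {"packages": ["httpx"]})`.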

📥 Input

| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| packages | array | yes | ["requests", "httpx", "selectolax"] | List of PyPI package names. Case-insensitive: Requests and requests are equivalent on PyPI. |
| includeReleases | boolean | no | true | When true, include the latest 10 release versions and dates per package. Adds no extra API call (the data is in the same JSON document). |
| concurrency | integer | no | 8 | Number of parallel API requests. |
| proxyConfiguration | object | no | {"useApifyProxy": false} | PyPI is generous to public clients, so a proxy is optional. |
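PyPI's name matching goes slightly beyond case: under PEP 503 normalization, runs of `-`, `_`, and `.` are also equivalent, so `Zope.Interface` and `zope-interface` name the same project. If you assemble the `packages` list from several sources, a sketch like this (hypothetical helpers, not part of the Actor) removes duplicates before a run, so you do not request, and potentially pay for, the same package twice.

```python
import re
from typing import Iterable, List

def normalize(name: str) -> str:
    """PEP 503 normalization: collapse runs of '-', '_', '.' to '-', then lowercase."""
    return re.sub(r"[-_.]+", "-", name).lower()

def dedupe_packages(names: Iterable[str]) -> List[str]:
    """Normalized, order-preserving, duplicate-free package list."""
    seen = set()
    out = []
    for raw in names:
        key = normalize(raw.strip())
        if key and key not in seen:  # skip blanks and repeats
            seen.add(key)
            out.append(key)
    return out
```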

Example input

{
  "packages": [
    "httpx"
  ],
  "includeReleases": true,
  "concurrency": 4,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

📤 Output

Every row is one dataset item.

| Field | Type | Notes |
|---|---|---|
| name | string | Canonical package name (PyPI returns the canonical casing). |
| version | string | Currently published version. |
| summary | string or null | One-line description. |
| description | string or null | Long description (README or similar). May be lengthy. |
| description_content_type | string or null | Markup format of the long description. |
| author | string or null | Author name. |
| author_email | string or null | Author email. |
| maintainer | string or null | Maintainer name. |
| license | string or null | License classifier or raw text. |
| home_page | string or null | Project home page. |
| project_url | string or null | PyPI project URL. |
| project_urls | object or null | Map of label → URL for additional project links. |
| requires_python | string or null | PEP 440 version specifier (e.g. >=3.9). |
| requires_dist | array | Declared dependencies (Requires-Dist) as PEP 508 requirement strings, including optional extras with their markers. |
| classifiers | array | PyPI classifiers (Programming Language, Topic, License, OS). |
| keywords | string or null | Keyword string as supplied by the project. |
| yanked | boolean | True if the current release was yanked from the index. |
| release_history | array | Recent releases (version and upload time); populated when includeReleases=true. |
| package_url | string | Canonical pypi.org URL. |
| scraped_at | string | ISO-8601 timestamp of when this row was recorded. |
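`release_history` comes from the same JSON document: PyPI's top-level `releases` object maps each version string to its uploaded files, each carrying an `upload_time_iso_8601` timestamp. A sketch of picking the 10 most recent releases follows. Treating each version's earliest file upload as its release time, and naming the output keys `version` and `upload_time`, are assumptions about how the Actor shapes this field.

```python
from datetime import datetime
from typing import Dict, List

def _parse(ts: str) -> datetime:
    # PyPI writes a trailing "Z"; fromisoformat() before Python 3.11 rejects it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def release_history(releases: Dict[str, List[dict]], limit: int = 10) -> List[dict]:
    """Newest-first list of {"version", "upload_time"} from PyPI's "releases" map."""
    history = []
    for version, files in releases.items():
        stamps = [f["upload_time_iso_8601"] for f in files if f.get("upload_time_iso_8601")]
        if not stamps:
            continue  # version registered without any uploaded files
        first = min(stamps, key=_parse)  # earliest file upload = release time
        history.append({"version": version, "upload_time": first})
    history.sort(key=lambda r: _parse(r["upload_time"]), reverse=True)
    return history[:limit]
```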

Example output (abridged: a full row includes every field in the table above)

{
  "name": "httpx",
  "version": "0.27.0",
  "summary": "The next generation HTTP client.",
  "requires_python": ">=3.9",
  "license": "BSD-3-Clause",
  "home_page": null,
  "project_url": "https://pypi.org/project/httpx/"
}

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
|---|---|---|
| actor-start | $0.005 | One-off warm-up charge per run |
| result | $0.0015 | Per dataset item |

Example: a single run producing 1,000 results costs 1,000 × $0.0015 = $1.50 for results plus the $0.005 start fee, or $1.505 in total. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
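The arithmetic can be checked with a tiny helper that applies the two event prices from the table; `Decimal` avoids floating-point rounding on fractions of a cent. The helper name is illustrative, and it assumes these two events are the only charges.

```python
from decimal import Decimal

# Event prices from the table above, in USD.
ACTOR_START_USD = Decimal("0.005")
RESULT_USD = Decimal("0.0015")

def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Total USD for `runs` runs producing `results` dataset items overall."""
    return runs * ACTOR_START_USD + results * RESULT_USD
```

For the example above, `estimate_cost(1000)` returns `Decimal('1.5050')`.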

🚧 Limitations

We hit PyPI's /pypi/<pkg>/json endpoint only. Download counts, vulnerability data, and reverse-dependency graphs require other sources.

❓ FAQ

Are downloads counted?

PyPI's JSON API doesn't expose download counts. For those, use the PyPI downloads dataset on Google BigQuery (bigquery-public-data.pypi) or Snyk's stats API.

What if a package doesn't exist?

The Actor logs a 404 and skips the row. The dataset still contains every package that did resolve.

Why is description huge?

Many packages bundle their README into the JSON. If you don't need it, post-process to drop it.
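Dropping the long description after export is a one-line filter per row. This sketch assumes rows as Python dicts (for example, the list returned by the API or loaded from a JSON export); `slim` and the choice to drop `description_content_type` as well are illustrative. Apify's dataset items API may also let you omit fields server-side via an `omit` parameter; check the API reference.

```python
from typing import Collection, Iterable, Iterator

# Fields to strip; description_content_type only matters alongside description.
HEAVY_FIELDS = frozenset({"description", "description_content_type"})

def slim(rows: Iterable[dict], drop: Collection[str] = HEAVY_FIELDS) -> Iterator[dict]:
    """Yield copies of each row without the fields listed in `drop`."""
    for row in rows:
        yield {key: value for key, value in row.items() if key not in drop}
```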

Can I get vulnerabilities?

Out of scope. Use a dedicated vulnerability API such as OSV or Snyk.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.