Pricing

Pay per event

PyPI Metadata API Scraper

Pull rich metadata for any PyPI package via the PyPI JSON API (current version, dependencies, classifiers, author, license, home page, download URLs, release history) and export it to JSON or CSV. Free PyPI API, no key required.

Developer

DevilScrapes

Last modified

2 months ago

🎯 What this scrapes

PyPI's metadata API (pypi.org/pypi/<package>/json) returns a full metadata record for every published package. This Actor accepts a list of package names, fans the requests out in parallel, and writes one typed dataset row per package — version, requires_dist, classifiers (Python versions, OS, license, framework), author, project URLs, and the latest 10 release timestamps. Bulk PyPI data export in a single run.
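As a rough illustration of the mapping described above, the sketch below flattens one JSON API response into a row. The names `json_url` and `to_row`, and the exact row shape, are assumptions made for this example and are not the Actor's actual code. The keys read from `info` and `releases` follow the public PyPI JSON API.

```python
from urllib.parse import quote


def json_url(package: str) -> str:
    """Build the PyPI JSON API URL for a package name."""
    return f"https://pypi.org/pypi/{quote(package.strip(), safe='')}/json"


def to_row(payload: dict, max_releases: int = 10) -> dict:
    """Flatten one JSON API response into a single row (hypothetical shape)."""
    info = payload.get("info", {})
    # Each release maps a version string to a list of uploaded files.
    # Use the earliest upload time of a version as its release timestamp.
    history = []
    for version, files in payload.get("releases", {}).items():
        times = [f["upload_time_iso_8601"] for f in files if f.get("upload_time_iso_8601")]
        if times:
            history.append({"version": version, "uploaded_at": min(times)})
    # ISO-8601 UTC strings sort chronologically, so newest first is a plain sort.
    history.sort(key=lambda r: r["uploaded_at"], reverse=True)
    return {
        "name": info.get("name"),
        "version": info.get("version"),
        "requires_python": info.get("requires_python"),
        "requires_dist": info.get("requires_dist") or [],
        "classifiers": info.get("classifiers") or [],
        "release_history": history[:max_releases],
    }
```

Versions with no uploaded files are left out of the history here because they carry no timestamp; the Actor may treat them differently.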

🔥 Features

🛡️ Browser fingerprint rotation — curl-cffi impersonates real Chrome, Firefox, and Safari TLS handshakes so the endpoint sees a browser, not a Python script. Profiles rotate across requests.
🌐 Residential proxy rotation via Apify Proxy (off by default): fresh session and exit IP on every block or rate-limit response.
🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per package, Retry-After respected.
🧱 Rate-limit-aware pacing — when the API pushes back we throttle automatically; you never get silently blocked or handed an empty dataset.
🧊 Clean, typed rows — Pydantic-validated, ISO-8601 timestamps, stable field names. Export to JSON, CSV, or Excel straight from the Apify Console.
💰 Pay-Per-Event pricing: a small start charge per run, then you pay only for rows that reach your dataset.
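The retry behaviour listed above (408 / 429 / 5xx, exponential backoff, Retry-After respected) can be sketched as follows. This is a minimal illustration, not the Actor's implementation: the function names, the 1-second base, the 60-second cap, and the 10% jitter are all assumptions. Only the numeric form of `Retry-After` is handled; the HTTP-date form falls back to the computed delay.

```python
import random
from typing import Optional

MAX_ATTEMPTS = 5  # the feature list above mentions up to 5 attempts per package


def is_retryable(status: int) -> bool:
    """408, 429 and any 5xx response are worth retrying."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    # Honour a numeric Retry-After header when the server sends one.
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after.strip()), cap)
    # Otherwise double the delay on each attempt and add up to 10% jitter.
    delay = min(base * 2 ** (attempt - 1), cap)
    return delay + random.uniform(0, delay * 0.1)
```

The jitter spreads out retries from parallel workers so they do not all hit the endpoint at the same instant.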

💡 Use cases

Dependency intel — feed your dependency tree to surface outdated or yanked packages across a full monorepo.
Compliance audit — pull license classifier and Python-version bounds for every direct dependency in one bulk PyPI data export.
Maintainer mapping — correlate your stack to authors and maintainers for supply-chain analysis.
Release monitoring — schedule a daily run on a watch list; alert your team when a new version lands.
RAG corpus — pull PyPI package READMEs in bulk for AI retrieval pipelines or language-model fine-tuning datasets.
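For the dependency and compliance use cases, a post-processing pass over exported rows might look like the sketch below. The function `audit_rows` and the allowlist are hypothetical; only the `name`, `license`, and `yanked` fields come from the output schema. Because `license` can hold raw license text rather than a short identifier, the exact-match check is deliberately simplistic.

```python
def audit_rows(rows, allowed_licenses):
    """Return one finding per row that is yanked or has an unapproved license."""
    findings = []
    for row in rows:
        reasons = []
        if row.get("yanked"):
            reasons.append("current release is yanked")
        license_name = (row.get("license") or "").strip()
        if not license_name:
            reasons.append("no license declared")
        elif license_name not in allowed_licenses:
            reasons.append(f"license not on allowlist: {license_name}")
        if reasons:
            findings.append({"name": row.get("name"), "reasons": reasons})
    return findings
```

Calling `audit_rows(rows, {"MIT", "BSD-3-Clause", "Apache-2.0"})` on a dataset export returns only the packages that need a closer look.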

⚙️ How to use it

Click Try for free at the top of the page.
Paste your package names into the Package names field — one per line.
Toggle Include release history if you want the latest 10 version timestamps per package.
Click Start. Output streams into the run's dataset as each package resolves.
Export from Storage → Dataset as JSON, CSV, or Excel — or pull results via the Apify API.
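For the last step, results can also be fetched over HTTP from the Apify API's dataset items endpoint. The helper below only builds the URL; `dataset_items_url` is a name invented for this sketch, and the dataset ID is whatever your run reports. Private datasets additionally need your API token, which is not shown here.

```python
from urllib.parse import urlencode


def dataset_items_url(dataset_id: str, fmt: str = "json", fields=()) -> str:
    """Build the Apify API URL that returns a dataset's items."""
    params = {"format": fmt, "clean": "true"}
    if fields:
        # Restrict the export to the listed fields (comma-separated).
        params["fields"] = ",".join(fields)
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

For example, `dataset_items_url("<datasetId>", "csv")` yields a link that downloads the same rows as the CSV export in the Console.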

📥 Input

Field	Type	Required	Default	Notes
`packages`	`array`	yes	`["requests", "httpx", "selectolax"]`	PyPI package names. Case-insensitive: `Requests` and `requests` resolve to the same record.
`includeReleases`	`boolean`	no	`true`	When true, appends the latest 10 release versions + upload dates per package. No extra API call needed.
`concurrency`	`integer`	no	`8`	Parallel API requests. Raise for large lists; lower if you see 429s on a shared IP.
`proxyConfiguration`	`object`	no	`{"useApifyProxy": false}`	Proxy settings. Leave off for small lists; enable residential proxies for bulk runs.
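To show what the `concurrency` setting controls, here is a minimal sketch of bounded parallel fetching with `asyncio`. It assumes a caller-supplied coroutine `fetch_one(name)`; both that name and `fetch_all` are hypothetical, and the Actor's internals may differ.

```python
import asyncio


async def fetch_all(packages, fetch_one, concurrency: int = 8):
    """Run `fetch_one(name)` for every package, at most `concurrency` at a time."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(name):
        # The semaphore blocks here once `concurrency` requests are in flight.
        async with semaphore:
            return await fetch_one(name)

    # gather preserves input order, so results line up with `packages`.
    return await asyncio.gather(*(guarded(name) for name in packages))
```

Lowering `concurrency` reduces the number of simultaneous requests from one IP, which is why it helps when 429 responses appear.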

Example input

{
  "packages": [
    "httpx"
  ],
  "includeReleases": true,
  "concurrency": 4,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

📤 Output

Every row is one dataset item corresponding to one PyPI package.

Field	Type	Notes
`name`	`string`	Canonical package name (PyPI returns the canonical casing).
`version`	`string`	Currently-published version.
`summary`	`string \| null`	One-line description.
`description`	`string \| null`	Long description (README or similar). May be large.
`description_content_type`	`string \| null`	Markup format of the long description.
`author`	`string \| null`	Author name.
`author_email`	`string \| null`	Author email.
`maintainer`	`string \| null`	Maintainer name.
`license`	`string \| null`	License classifier or raw license text.
`home_page`	`string \| null`	Project home page URL.
`project_url`	`string \| null`	PyPI project URL.
`project_urls`	`object \| null`	Map of label → URL for additional project links (Source, Bug Tracker, etc.).
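`requires_python`	`string \| null`	PEP 440 version specifier, e.g. `>=3.9`.
`requires_dist`	`array`	Dependency requirement strings from the package's `Requires-Dist` metadata (what setuptools calls `install_requires`, plus any extras).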
`classifiers`	`array`	PyPI classifiers — Programming Language, Topic, License, OS.
`keywords`	`string \| null`	Comma-separated keyword string from the project metadata.
`yanked`	`boolean`	`true` if the current release was yanked from the index.
`release_history`	`array`	Recent releases — version + upload timestamp — when `includeReleases` is `true`.
`package_url`	`string`	Canonical `pypi.org` URL for the package.
`scraped_at`	`string`	ISO-8601 timestamp of when this row was recorded.

Example output

{
  "name": "httpx",
  "version": "0.27.0",
  "summary": "The next generation HTTP client.",
  "requires_python": ">=3.9",
  "license": "BSD-3-Clause",
  "home_page": null,
  "project_url": "https://pypi.org/project/httpx/"
}

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

Event	USD	What it is
`actor-start`	$0.005	One-off warm-up charge per run
`result`	$0.0015	Per dataset row written

At these rates, 1 000 results cost 1 000 × $0.0015 = $1.50, plus the $0.005 start charge: about $1.51 per run. No subscription, no minimum commitment, no card required to start. Every new Apify account gets $5 of free credit.
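The arithmetic behind the event table can be checked with a small calculator. The two rates are taken from the table above; the function name `run_cost` is just for this sketch. `Decimal` is used so that fractions of a cent are not distorted by floating-point rounding.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")   # charged once per run
RESULT_USD = Decimal("0.0015")       # charged per dataset row written


def run_cost(rows: int, runs: int = 1) -> Decimal:
    """Total charge in USD for `rows` results spread over `runs` runs."""
    return ACTOR_START_USD * runs + RESULT_USD * rows
```

A daily watch list of 200 packages over 30 days would be `run_cost(200 * 30, runs=30)`.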

Also see our npm-package-scraper — identical pricing, same bulk-export pattern for the npm ecosystem.

🚧 Limitations

This Actor queries the PyPI metadata API (/pypi/<pkg>/json) endpoint only. Download counts, vulnerability scores, and reverse-dependency graphs live elsewhere — see pypistats.org, OSV, and Snyk for those. Classifier data is only as complete as what individual maintainers have filed with PyPI; we return whatever the API provides and never fabricate missing values.

❓ FAQ

What is the PyPI metadata API?

PyPI exposes a free JSON endpoint at pypi.org/pypi/<package>/json that serves one package per request, with no documented bulk mode. It returns the full metadata record for a published package: version info, classifiers, dependencies, author details, and release history. This Actor wraps that endpoint with retry logic, proxy rotation, and bulk throughput so you can pull thousands of records in a single run.

Are download counts included?

The PyPI metadata API does not expose download counts. For historical download statistics, use the public PyPI dataset on Google BigQuery (`bigquery-public-data.pypi`) or the pypistats.org API. Those are separate services and out of scope here.

What happens if a package name doesn't exist?

The Actor logs a 404 and skips that package. The dataset still contains every package that resolved successfully; the run log lists skipped names.

Why is the description field so large?

Many maintainers bundle their full README into the PyPI package metadata. If your pipeline doesn't need the long description, drop the field in post-processing or filter it in your Apify dataset view.
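If you take the post-processing route, dropping the field from exported rows is a one-liner. The helper name `drop_fields` is an assumption for this sketch; it works on any list of row dictionaries loaded from a JSON export.

```python
def drop_fields(rows, fields=("description",)):
    """Return copies of the rows without the named fields."""
    unwanted = set(fields)
    return [{k: v for k, v in row.items() if k not in unwanted} for row in rows]
```

The original rows are left untouched, so the full records remain available if a later step needs the README text.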

Can I get vulnerability or CVE data?

That is out of scope for this Actor. Use the OSV API (osv.dev) or Snyk's advisory database for vulnerability enrichment; they publish dedicated APIs built for that purpose.

How do I do a bulk PyPI data export for my whole requirements.txt?

Paste your package list into the Package names field. For large lists (1 000+ packages) raise concurrency to 16 and consider enabling Apify residential proxies to distribute the load. At $1.50 / 1 000 results the cost scales linearly.
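If you would rather generate the input than paste it, the sketch below turns the text of a requirements.txt into an input object. It is a simplified parser written for this example, not part of the Actor: it keeps only the leading project name of each line and skips pip options, editable installs, and direct URL requirements. The input field names match the Input table; the function names are hypothetical.

```python
import re

# Leading project name of a requirement line, e.g. "httpx" in "httpx[http2]>=0.27".
_NAME = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?")


def packages_from_requirements(text: str) -> list:
    """Extract unique package names, in file order, from requirements.txt text."""
    names, seen = [], set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        # Skip blanks, pip options (-r, -e, --hash ...) and direct URLs.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = _NAME.match(line)
        if match and match.group(0).lower() not in seen:
            seen.add(match.group(0).lower())
            names.append(match.group(0))
    return names


def build_run_input(names, concurrency: int = 16) -> dict:
    """Assemble an Actor input using the field names from the Input table."""
    return {"packages": list(names), "includeReleases": True, "concurrency": concurrency}
```

Names are de-duplicated case-insensitively, which matches how PyPI resolves them and avoids paying for the same row twice.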

💬 Your feedback

Spotted a bug, hit an unexpected edge case, or need a field the Actor doesn't currently return? Open an issue on the Actor's Issues tab in the Apify Console — we ship fixes weekly and read every report.
