PyPI Scraper - Python Package Search & Stats

Pricing

from $19.00 / 1,000 results

Search and scrape Python package data from PyPI including versions, authors, licenses, keywords, download stats, and classifiers. Export to CSV, Excel, JSON, XML.

Developer

ParseForge

Maintained by Community

🐍 PyPI Python Package Scraper

🚀 Export Python package data from PyPI in seconds. Search 812,000+ packages by keyword and get version, author, license, keywords, download stats, classifiers, and more. No API key, no registration required.

🕒 Last updated: 2026-05-21 · 📊 15 fields per record · 🐍 812,000+ packages · 🌍 Open Python Package Index

The PyPI Scraper searches the Python Package Index and returns 15 fields per record including name, version, summary, author, license, homepage, repository URL, keywords, weekly and monthly download counts, Python version requirements, classifiers, and publish date. The underlying data comes directly from PyPI's public JSON API and is the same catalog used by pip install.

The index covers every publicly released Python package - from the most downloaded frameworks like Django, Flask, and NumPy down to niche utilities and personal projects. This Actor searches by keyword, resolves full metadata for each match, and enriches each record with download statistics from pypistats.org. For runs of up to about 100 records, the dataset is ready to download as CSV, Excel, JSON, or XML in under a minute; larger runs take proportionally longer.

| 🎯 Target Audience | 💡 Primary Use Cases |
| --- | --- |
| Data engineers, Python developers, package analysts, security researchers, DevOps teams, BI analysts | Dependency auditing, license compliance, tech-stack research, competitive analysis, package discovery |

📋 What the PyPI Scraper does

Five research workflows in a single run:

  • 🔍 Keyword search. Find packages matching any search term across package names - "machine learning", "web scraping", "django", "data pipeline", "cli".
  • 📦 Full metadata extraction. Name, version, summary, author, license, homepage, and repository URL from the PyPI JSON API.
  • 📊 Download statistics. Weekly and monthly download counts from the PyPI Stats API - instantly spot popular vs. niche packages.
  • 🏷️ Classifier taxonomy. Full PyPI classifier list per package - development status, programming language versions, OS compatibility, topics.
  • 📅 Freshness signals. Last publish date and Python version requirement for every record.
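The metadata extraction step above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: it assumes the public endpoint `https://pypi.org/pypi/{name}/json`, the function names are hypothetical, and only a subset of the 15 fields is mapped.

```python
import json
from urllib.request import urlopen


def fetch_package_json(name):
    """Fetch raw metadata from the PyPI JSON API (requires network access)."""
    with urlopen(f"https://pypi.org/pypi/{name}/json", timeout=10) as resp:
        return json.load(resp)


def license_from_classifiers(classifiers):
    """Fallback: last segment of the first 'License ::' classifier, if any."""
    for classifier in classifiers or []:
        if classifier.startswith("License ::"):
            return classifier.split("::")[-1].strip()
    return None


def to_record(payload):
    """Map a PyPI JSON API payload to a flat record (subset of fields)."""
    info = payload.get("info", {})
    classifiers = info.get("classifiers") or []
    # PyPI stores keywords as one string, comma- or whitespace-separated.
    raw_keywords = info.get("keywords") or ""
    separator = "," if "," in raw_keywords else None
    keywords = [k.strip() for k in raw_keywords.split(separator) if k.strip()]
    # Upload times of the files in the latest release; ISO strings sort correctly.
    uploads = [f["upload_time"] for f in payload.get("urls", []) if f.get("upload_time")]
    return {
        "name": info.get("name"),
        "version": info.get("version"),
        "summary": info.get("summary") or None,
        "author": info.get("author") or None,
        "license": info.get("license") or license_from_classifiers(classifiers),
        "homepage": info.get("home_page") or None,
        "keywords": keywords or None,
        "requiresPython": info.get("requires_python") or None,
        "classifiers": classifiers or None,
        "lastPublished": max(uploads) if uploads else None,
        "url": f"https://pypi.org/project/{info.get('name')}/",
    }
```

Calling `to_record(fetch_package_json("scrapy"))` would return one record in this shape; empty strings are normalized to null so missing metadata is easy to filter on.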

💡 Why it matters: The Python ecosystem has over 812,000 packages on PyPI. Manually auditing which libraries match a topic, what their license is, and how actively they are maintained is hours of work. This Actor delivers a structured dataset in under a minute - ready to import into Notion, Airtable, BigQuery, or your CI pipeline.


🎬 Full Demo

🚧 Coming soon: a 2-minute walkthrough showing how to go from sign-up to a downloaded dataset.


⚙️ Input

| Input | Type | Default | Behavior |
| --- | --- | --- | --- |
| search | string | "web scraping" | Keyword to search across PyPI package names. Supports multi-word queries. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |

Example: find web scraping libraries.

{
  "search": "web scraping",
  "maxItems": 50
}

Example: find machine learning tools.

{
  "search": "machine learning",
  "maxItems": 100
}

⚠️ Good to Know: search matches package names. Packages with the exact query phrase in their name rank highest, followed by packages containing any word from the query. Download statistics for very new packages (published in the last few days) may return null while pypistats.org catches up. All other fields come directly from the PyPI JSON API and are always current.
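The ranking rule described above (exact phrase in the name first, then any query word) could look something like this. It is a sketch under assumptions: the Actor's real scoring is not published, and the name normalization and the tie-breaking by shorter name are choices made here for illustration only.

```python
import re


def rank_names(names, query, limit=10):
    """Rank package names: exact phrase match first, then any-word match."""
    phrase = query.strip().lower()
    words = phrase.split()

    def normalize(name):
        # Treat '-', '_' and '.' as spaces so "web-scraping" matches "web scraping".
        return re.sub(r"[-_.]+", " ", name.lower())

    scored = []
    for name in names:
        norm = normalize(name)
        if phrase and phrase in norm:
            score = 2  # full query phrase appears in the name
        elif any(word in norm for word in words):
            score = 1  # at least one query word appears in the name
        else:
            continue  # no match: drop the candidate
        # Assumption: ties are broken by shorter name, then alphabetically.
        scored.append((-score, len(name), name))
    scored.sort()
    return [name for _, _, name in scored[:limit]]
```

For example, with the query "web scraping", a name like `web_scraping` outranks `scraping-utils`, and `requests` is dropped entirely because it contains neither word.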


📊 Output

Each package record contains 15 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
| --- | --- | --- |
| 📦 name | string | "scrapy" |
| 🔢 version | string | "2.12.0" |
| 📝 summary | string or null | "A high-level web crawling and scraping framework..." |
| 👤 author | string or null | "Scrapy developers" |
| ⚖️ license | string or null | "BSD" |
| 🔗 homepage | string or null | "https://scrapy.org/" |
| 🗂️ repository | string or null | "https://github.com/scrapy/scrapy" |
| 🏷️ keywords | array or null | ["scraping", "crawling", "spider"] |
| 📊 weeklyDownloads | integer or null | 892341 |
| 📈 totalDownloads | integer or null | 3847291 |
| 🐍 requiresPython | string or null | ">=3.9" |
| 🗂️ classifiers | array or null | ["Development Status :: 5 - Production/Stable", "..."] |
| 📅 lastPublished | ISO 8601 or null | "2025-10-16T15:01:47" |
| 🔗 url | string | "https://pypi.org/project/scrapy/" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-21T23:29:13.858Z" |


✨ Why choose this Actor

| Feature | Benefit |
| --- | --- |
| 🌐 No API key needed | Works immediately - no registration, no quotas, no auth setup |
| 📦 812,000+ packages | Searches the full PyPI catalog - every public Python library ever released |
| 📊 Real download stats | Weekly and monthly counts from pypistats.org, not cached estimates |
| ⚖️ License field | Extracted from both the license field and classifiers as fallback |
| 🏷️ Full classifier taxonomy | Development status, Python versions, OS, topic - all in one array |
| 🔗 Direct PyPI URLs | Each record links directly to the package page for instant verification |
| 🚀 Fast | 5 records in under 3 seconds, 100 records in under 60 seconds |
| 🔄 Always live | No caches - hits the PyPI JSON API on every run |

📈 How it compares to alternatives

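| Method | Packages | Download Stats | License | Classifiers | Speed |
| --- | --- | --- | --- | --- | --- |
| PyPI Scraper (this) | 812,000+ | Yes | Yes | Yes | Seconds |
| Manual PyPI browsing | Manual only | Visible | Visible | Visible | Hours |
| pip search | Disabled by PyPI | No | No | No | N/A |
| PyPI XML-RPC search | Disabled by PyPI | No | No | No | N/A |
| Libraries.io API | Yes (limited free) | Yes | Yes | Partial | Requires key |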

🚀 How to use

  1. Create a free account - includes $5 free credit.
  2. Open PyPI Python Package Scraper in the Apify Store.
  3. Enter your search query (e.g. "machine learning", "django", "data pipeline").
  4. Set Max Items (free plan preview: 10, paid: up to 1,000,000).
  5. Click Start and wait a few seconds.
  6. Download your dataset as CSV, Excel, JSON, or XML.

💼 Business use cases

1. Dependency auditing and license compliance

Enterprises using Python in production need to track every open-source library's license. Run this Actor with relevant library names to build a license inventory - MIT, Apache 2.0, GPL, BSD - and flag packages with restrictive licenses before they enter your codebase.

2. Competitive intelligence for developer tools

If you build a Python library, tool, or SaaS product targeting Python developers, track competing packages: their version cadence, download trajectories, classifiers, and repository links. Spot emerging rivals before they go viral.

3. Security and supply-chain research

Security teams monitor PyPI for typosquatting and malicious packages. Use this Actor to scan packages matching a keyword pattern, compare author names, and flag anomalous entries that shadow popular libraries with nearly identical names.
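One way to flag look-alike names in a downloaded dataset is a plain string-similarity check. The sketch below uses `difflib.SequenceMatcher` from the standard library; the 0.85 threshold and the function name are assumptions chosen for illustration, and a real typosquatting pipeline would need more signals than name similarity alone.

```python
from difflib import SequenceMatcher


def flag_lookalikes(candidates, popular, threshold=0.85):
    """Return (candidate, popular_name, similarity) for near-identical names."""
    known = {p.lower() for p in popular}
    flagged = []
    for name in candidates:
        lowered = name.lower()
        if lowered in known:
            continue  # the genuine package itself is not a look-alike
        for target in sorted(known):
            ratio = SequenceMatcher(None, lowered, target).ratio()
            if ratio >= threshold:
                flagged.append((name, target, round(ratio, 2)))
    return flagged
```

Feeding it the `name` column of a run plus a short list of well-known packages would surface entries such as `reqeusts` next to `requests`, while leaving legitimate extensions like `flask-login` alone.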

4. Package discovery for data science teams

Data science leads often onboard new team members who need a curated list of relevant packages for a domain - "NLP", "time series", "geospatial". Run the Actor with a domain keyword and share the CSV as a starting library inventory.


🔌 Automating PyPI Scraper

Connect this Actor to your existing stack using Apify's built-in integrations:

  • Make (Integromat) - trigger a run on a schedule, send results to Google Sheets or Notion
  • Zapier - auto-export new package data to Airtable, Slack, or email
  • Slack - post new package results directly to a channel
  • Google Sheets - append rows with every new run via the Apify Google Sheets integration
  • Webhooks - push dataset items to any REST endpoint on run completion
  • Apify API - call from Python, Node.js, or any HTTP client with your API token
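As an illustration of the last option, a run could be started from Python with nothing but the standard library. This sketch assumes Apify's synchronous `run-sync-get-dataset-items` endpoint and Bearer-token authentication; the actor ID shown in the usage note is a placeholder, not this Actor's confirmed identifier, so copy the real one from the Apify Console.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id, token, search, max_items=10):
    """Build a POST request that runs an Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    # Input fields match the Actor's input schema: search and maxItems.
    body = json.dumps({"search": search, "maxItems": max_items}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    return Request(url, data=body, headers=headers, method="POST")


def run_actor(actor_id, token, search, max_items=10):
    """Send the request and parse the JSON array of records (needs network)."""
    request = build_run_request(actor_id, token, search, max_items)
    with urlopen(request, timeout=300) as resp:
        return json.load(resp)
```

Usage would look like `run_actor("<username>~<actor-name>", "<YOUR_API_TOKEN>", "django", 50)`. The synchronous endpoint suits short runs; for very large `maxItems` values the asynchronous run endpoints are likely a better fit.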

🌟 Beyond business use cases

Research and academia

Researchers studying open-source ecosystems, software evolution, or Python adoption can use this Actor to build longitudinal datasets. Track how a topic's package count and download volume change over time by scheduling periodic runs and appending results.

Creative and hobby projects

Python hobbyists building personal package dashboards, README badges, or portfolio sites can pull live PyPI data for any package they maintain or depend on - author info, keywords, version, publish date - all in one structured record.

Non-profit and open-source community

Open-source maintainers running community initiatives can survey the ecosystem for unmaintained packages (old last-publish dates, no classifiers, zero downloads) and offer to take over or deprecate them. Keeps the ecosystem healthy.

Education and learning

Python instructors building curriculum around real data can use this Actor to pull a fresh list of packages for a topic like "data visualization" or "web frameworks" and use the dataset in labs, quizzes, or project prompts.


🤖 Ask an AI assistant about this scraper

"How do I use the PyPI Scraper to find all machine learning packages with MIT license?" "Can I export PyPI package data to Google Sheets automatically?" "How do I search for packages by keyword and sort by weekly downloads?"

This Actor has a structured output that any AI assistant (ChatGPT, Claude, Gemini) can analyze directly once you paste the JSON. Use it to build package comparison tables, spot trends, or generate dependency documentation.


❓ Frequently Asked Questions

❓ Does this require a PyPI account or API key? No. The PyPI JSON API and PyPI Stats API are both fully public and require no authentication.

❓ How does the search work? The Actor fetches the complete PyPI package index (812,000+ entries), filters by packages whose names contain your search terms, ranks them by relevance (exact phrase match scores highest), then fetches full JSON metadata and download stats for each match.

❓ Why are some download stats null? Very new packages (published in the last 24-48 hours) may not yet appear in pypistats.org's rolling statistics. Once the stats backfill, a re-run will return values.

❓ Why are some fields like license or author null? PyPI does not enforce required metadata fields. Many packages - especially older or personal ones - do not declare a license string, author name, or homepage URL. The Actor tries to extract license from classifiers as a fallback.

❓ What is the totalDownloads field? This is the last-30-days download count from pypistats.org. A true all-time total is not exposed via any public PyPI API - that would require BigQuery. Monthly is the best available approximation.
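To make the mapping concrete, here is a rough sketch of how the two download fields could be derived from a pypistats.org "recent" response. It assumes the endpoint `https://pypistats.org/api/packages/{name}/recent` and its `last_week` / `last_month` keys; the function names are hypothetical and the Actor's own implementation may differ.

```python
import json
from urllib.request import urlopen


def fetch_recent_downloads(name):
    """Fetch the recent-downloads payload from pypistats.org (needs network)."""
    url = f"https://pypistats.org/api/packages/{name.lower()}/recent"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def to_download_fields(payload):
    """Map a pypistats 'recent' payload to the two dataset fields.

    totalDownloads is the last-30-days count, not an all-time total.
    Missing stats (for example, a very new package) become None.
    """
    data = (payload or {}).get("data") or {}
    return {
        "weeklyDownloads": data.get("last_week"),
        "totalDownloads": data.get("last_month"),
    }
```

Passing `None` or an empty payload yields null for both fields, which matches how records for brand-new packages appear in the dataset.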

❓ How fast is it? 5 items: 2-3 seconds. 50 items: 20-30 seconds. 100 items: 45-60 seconds. Speed depends on how many candidates need to be checked to find matching packages.

❓ Can I search by author or license? Not directly - the search matches package names. After downloading your dataset, filter by the author or license fields in Excel, Pandas, or any BI tool.
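A small post-processing sketch for a downloaded JSON dataset, using only the standard library (the same logic is a one-liner in Pandas). The helper names are illustrative, and the field names follow the schema in the Output section.

```python
def filter_records(records, license_contains=None, author_contains=None):
    """Keep records whose license/author contain the text (case-insensitive)."""

    def matches(value, needle):
        if needle is None:
            return True  # no filter requested for this field
        return value is not None and needle.lower() in value.lower()

    return [
        r for r in records
        if matches(r.get("license"), license_contains)
        and matches(r.get("author"), author_contains)
    ]


def sort_by_weekly_downloads(records):
    """Most-downloaded first; records with null stats are placed last."""
    return sorted(
        records,
        key=lambda r: (
            r.get("weeklyDownloads") is None,
            -(r.get("weeklyDownloads") or 0),
        ),
    )
```

After loading the export with `json.load`, something like `sort_by_weekly_downloads(filter_records(records, license_contains="MIT"))` gives MIT-licensed packages ordered by popularity. Because license strings are free text, a substring match is only a first pass.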

❓ How often should I run this? For competitive intelligence, weekly. For one-time research, on demand. For security monitoring, daily with a narrow keyword.

❓ Does it work on Windows and Mac? The Actor runs on Apify's cloud infrastructure. Your OS does not matter - run from any browser.

❓ Is the data the same as what I see on pypi.org? Yes. The Actor reads PyPI's public JSON API, which serves the same catalog that pip install resolves against. Metadata is identical to the package page on pypi.org.


🔌 Integrate with any app

| Integration | How |
| --- | --- |
| Google Sheets | Apify Google Sheets integration or Zapier |
| Airtable | Zapier or Make webhook trigger |
| Notion | Make (Integromat) with Notion module |
| Slack | Apify Slack integration - post results to a channel |
| BigQuery | Apify BigQuery integration or export JSON |
| PostgreSQL | Apify dataset export + COPY or SQLAlchemy |
| Python (pandas) | pandas.read_json(dataset_url) |
| REST API | GET https://api.apify.com/v2/datasets/{id}/items |
| CSV download | Click "Export" in the Apify Console - one click |
| Excel download | Click "Export as Excel" in the Apify Console |
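For the REST API and pandas rows, the dataset URL can be built with a small helper. This sketch assumes the `format` query parameter of Apify's dataset items endpoint; the helper name and the dataset ID in the usage note are placeholders, and non-public datasets may also require an API token.

```python
from urllib.parse import urlencode

# Formats named in this README; "xlsx" is the value used for Excel exports.
SUPPORTED_FORMATS = {"json", "csv", "xlsx", "xml"}


def dataset_items_url(dataset_id, fmt="json"):
    """Build the download URL for a dataset's items in the given format."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```

With a real dataset ID, `pandas.read_json(dataset_items_url("<DATASET_ID>"))` or `pandas.read_csv(dataset_items_url("<DATASET_ID>", "csv"))` loads the records straight into a DataFrame.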

| Actor | What it does |
| --- | --- |
| npm Registry Scraper | Scrape JavaScript package metadata and download stats from the npm registry |
| Product Hunt Scraper | Extract product launches, makers, topics, and upvote counts from Product Hunt |
| GitHub Scraper | Collect repository metadata, stars, forks, and contributor data from GitHub |
| Upwork Scraper | Search Upwork job postings and freelancer profiles by keyword |
| Remotive Scraper | Collect remote tech job listings across categories and companies |

💡 Pro Tip: browse the complete ParseForge collection for 50+ specialized data scrapers covering jobs, finance, aviation, government data, and developer tools.


Disclaimer: This Actor accesses publicly available data from PyPI's official JSON API (pypi.org/pypi/{name}/json) and the pypistats.org recent downloads API. Both APIs are open and documented by the Python Software Foundation. No login, scraping of private pages, or circumvention of access controls is involved. Use responsibly and in accordance with PyPI's terms of service.