PyPI Scraper - Python Package Search & Stats
Pricing
from $19.00 / 1,000 results
Search and scrape Python package data from PyPI including versions, authors, licenses, keywords, download stats, and classifiers. Export to CSV, Excel, JSON, XML.
Rating: 0.0 (0)
Developer: ParseForge (maintained by Community)
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 4 days ago

🐍 PyPI Python Package Scraper
🚀 Export Python package data from PyPI in seconds. Search 812,000+ packages by keyword and get version, author, license, keywords, download stats, classifiers, and more. No API key, no registration required.
🕒 Last updated: 2026-05-21 · 📊 15 fields per record · 🐍 812,000+ packages · 🌍 Open Python Package Index
The PyPI Scraper searches the Python Package Index and returns 15 fields per record including name, version, summary, author, license, homepage, repository URL, keywords, weekly and monthly download counts, Python version requirements, classifiers, and publish date. The underlying data comes directly from PyPI's public JSON API and is the same catalog used by pip install.
The index covers every publicly released Python package - from the most downloaded frameworks like Django, Flask, and NumPy down to niche utilities and personal projects. This Actor searches by keyword, resolves full metadata for each match, and enriches each record with real download statistics. Your dataset is ready to download as CSV, Excel, JSON, or XML in under a minute.
| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Data engineers, Python developers, package analysts, security researchers, DevOps teams, BI analysts | Dependency auditing, license compliance, tech-stack research, competitive analysis, package discovery |
📋 What the PyPI Scraper does
Five research workflows in a single run:
- 🔍 Keyword search. Find packages matching any search term across package names - "machine learning", "web scraping", "django", "data pipeline", "cli".
- 📦 Full metadata extraction. Name, version, summary, author, license, homepage, and repository URL from the PyPI JSON API.
- 📊 Download statistics. Weekly and monthly download counts from the PyPI Stats API - instantly spot popular vs. niche packages.
- 🏷️ Classifier taxonomy. Full PyPI classifier list per package - development status, programming language versions, OS compatibility, topics.
- 📅 Freshness signals. Last publish date and Python version requirement for every record.
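As a rough illustration of the download-statistics step, the sketch below maps a pypistats.org "recent downloads" payload onto the two download fields of the output. The endpoint path and the `last_week` / `last_month` keys are assumptions based on the public pypistats.org API; the helper names `parse_recent` and `fetch_recent` are hypothetical and are not the Actor's actual code.

```python
import json
from typing import Optional, TypedDict
from urllib.request import urlopen

# Assumed endpoint of the pypistats.org "recent downloads" API.
PYPISTATS_RECENT = "https://pypistats.org/api/packages/{name}/recent"


class DownloadStats(TypedDict):
    weeklyDownloads: Optional[int]
    totalDownloads: Optional[int]  # last-30-days count, not an all-time total


def parse_recent(payload: dict) -> DownloadStats:
    """Map a 'recent downloads' payload onto the two output fields."""
    data = payload.get("data") or {}
    return {
        "weeklyDownloads": data.get("last_week"),
        "totalDownloads": data.get("last_month"),
    }


def fetch_recent(name: str) -> DownloadStats:
    """Fetch stats for one package; fall back to nulls if the lookup fails."""
    try:
        with urlopen(PYPISTATS_RECENT.format(name=name.lower()), timeout=10) as resp:
            return parse_recent(json.load(resp))
    except (OSError, ValueError):
        # Covers HTTP/network errors and malformed JSON, e.g. for packages
        # that are too new to appear in the rolling statistics.
        return {"weeklyDownloads": None, "totalDownloads": None}
```

Returning nulls instead of raising keeps one missing stats lookup from failing a whole run, which matches the nullable download fields in the schema.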
💡 Why it matters: The Python ecosystem has over 812,000 packages on PyPI. Manually auditing which libraries match a topic, what their license is, and how actively they are maintained is hours of work. This Actor delivers a structured dataset in under a minute - ready to import into Notion, Airtable, BigQuery, or your CI pipeline.
🎬 Full Demo
🚧 Coming soon: a 2-minute walkthrough showing how to go from sign-up to a downloaded dataset.
⚙️ Input
| Input | Type | Default | Behavior |
|---|---|---|---|
| search | string | "web scraping" | Keyword to search across PyPI package names. Supports multi-word queries. |
| maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
Example: find web scraping libraries.
{"search": "web scraping", "maxItems": 50}
Example: find machine learning tools.
{"search": "machine learning", "maxItems": 100}
⚠️ Good to Know: search matches package names. Packages with the exact query phrase in their name rank highest, followed by packages containing any word from the query. Download statistics for very new packages (published in the last few days) may return null while pypistats.org catches up. All other fields come directly from the PyPI JSON API and are always current.
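The ranking rule described above can be sketched as follows. This is a minimal illustration, not the Actor's real scoring: the two-level score, the separator normalization, and the alphabetical tie-break are all assumptions made for the example.

```python
def score(name: str, query: str) -> int:
    """Hypothetical relevance: 2 = exact phrase in name, 1 = any word, 0 = no match."""
    # Treat '-', '_' and '.' as spaces so "web-scraping" matches "web scraping".
    normalized = name.lower().replace("-", " ").replace("_", " ").replace(".", " ")
    phrase = " ".join(query.lower().split())
    if phrase and phrase in normalized:
        return 2
    if any(word in normalized for word in phrase.split()):
        return 1
    return 0


def rank(names: list[str], query: str, max_items: int = 10) -> list[str]:
    """Keep matching names, best score first, ties broken alphabetically."""
    scored = [(score(n, query), n) for n in names]
    matches = [(s, n) for s, n in scored if s > 0]
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [n for _, n in matches[:max_items]]
```

With the query "web scraping", a name such as `web-scraping-toolkit` scores 2, while `webtest` and `scraping-utils` score 1 because each contains only one of the words.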
📊 Output
Each package record contains 15 fields. Download the dataset as CSV, Excel, JSON, or XML.
🧾 Schema
| Field | Type | Example |
|---|---|---|
| 📦 name | string | "scrapy" |
| 🔢 version | string | "2.12.0" |
| 📝 summary | string or null | "A high-level web crawling and scraping framework..." |
| 👤 author | string or null | "Scrapy developers" |
| ⚖️ license | string or null | "BSD" |
| 🔗 homepage | string or null | "https://scrapy.org/" |
| 🗂️ repository | string or null | "https://github.com/scrapy/scrapy" |
| 🏷️ keywords | array or null | ["scraping", "crawling", "spider"] |
| 📊 weeklyDownloads | integer or null | 892341 |
| 📈 totalDownloads | integer or null | 3847291 |
| 🐍 requiresPython | string or null | ">=3.9" |
| 🗂️ classifiers | array or null | ["Development Status :: 5 - Production/Stable", "..."] |
| 📅 lastPublished | ISO 8601 or null | "2025-10-16T15:01:47" |
| 🔗 url | string | "https://pypi.org/project/scrapy/" |
| 🕒 scrapedAt | ISO 8601 | "2026-05-21T23:29:13.858Z" |
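To show how these fields could be derived, here is a hedged sketch that maps a PyPI JSON API response (`pypi.org/pypi/{name}/json`) onto the schema above. The `info` and `urls` keys follow the public JSON API; the choice of `project_urls` labels ("Homepage", "Repository", "Source") and the function names are assumptions, and the two download fields are left out because they come from the stats API.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def first_license_classifier(classifiers: list[str]) -> Optional[str]:
    """Fallback: take the last segment of the first 'License ::' classifier."""
    for classifier in classifiers:
        if classifier.startswith("License ::"):
            return classifier.split(" :: ")[-1]
    return None


def to_record(payload: dict[str, Any]) -> dict[str, Any]:
    """Map one PyPI JSON API response onto the output schema (minus downloads)."""
    info = payload.get("info") or {}
    project_urls = info.get("project_urls") or {}
    files = payload.get("urls") or []
    classifiers = info.get("classifiers") or []
    # PyPI stores keywords as one string, comma- or space-separated.
    keywords = (info.get("keywords") or "").replace(",", " ").split()
    name = info.get("name")
    return {
        "name": name,
        "version": info.get("version"),
        "summary": info.get("summary") or None,
        "author": info.get("author") or None,
        "license": info.get("license") or first_license_classifier(classifiers),
        "homepage": info.get("home_page") or project_urls.get("Homepage"),
        "repository": project_urls.get("Repository") or project_urls.get("Source"),
        "keywords": keywords or None,
        "requiresPython": info.get("requires_python") or None,
        "classifiers": classifiers or None,
        "lastPublished": files[0].get("upload_time") if files else None,
        "url": f"https://pypi.org/project/{name}/",
        "scrapedAt": datetime.now(timezone.utc).isoformat(),
    }
```

Empty strings are normalized to null so that downstream filters only need to handle one "missing" value.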
✨ Why choose this Actor
| Feature | Benefit |
|---|---|
| 🌐 No API key needed | Works immediately - no registration, no quotas, no auth setup |
| 📦 812,000+ packages | Searches the full PyPI catalog - every public Python library ever released |
| 📊 Real download stats | Weekly and monthly counts from pypistats.org, not cached estimates |
| ⚖️ License field | Extracted from both the license field and classifiers as fallback |
| 🏷️ Full classifier taxonomy | Development status, Python versions, OS, topic - all in one array |
| 🔗 Direct PyPI URLs | Each record links directly to the package page for instant verification |
| 🚀 Fast | 5 records in under 3 seconds, 100 records in under 60 seconds |
| 🔄 Always live | No caches - hits the PyPI JSON API on every run |
📈 How it compares to alternatives
| Method | Packages | Download Stats | License | Classifiers | Speed |
|---|---|---|---|---|---|
| PyPI Scraper (this) | 812,000+ | Yes | Yes | Yes | Seconds |
| Manual PyPI browsing | Manual only | Visible | Visible | Visible | Hours |
| pip search | Disabled by PyPI | No | No | No | N/A |
| PyPI XML-RPC search | Disabled by PyPI | No | No | No | N/A |
| Libraries.io API | Yes (limited free) | Yes | Yes | Partial | Requires key |
🚀 How to use
- Create a free account - includes $5 free credit.
- Open PyPI Python Package Scraper in the Apify Store.
- Enter your search query (e.g. "machine learning", "django", "data pipeline").
- Set Max Items (free plan preview: 10, paid: up to 1,000,000).
- Click Start and wait a few seconds.
- Download your dataset as CSV, Excel, JSON, or XML.
💼 Business use cases
1. Dependency auditing and license compliance
Enterprises using Python in production need to track every open-source library's license. Run this Actor with relevant library names to build a license inventory - MIT, Apache 2.0, GPL, BSD - and flag packages with restrictive licenses before they enter your codebase.
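A license inventory of this kind could be built from the exported records roughly as follows. The marker list and the three bucket names are illustrative assumptions, not legal guidance; adjust them to your own policy.

```python
# Assumption: any license string containing "GPL" (which also matches LGPL
# and AGPL) is sent for manual review. Extend this tuple to fit your policy.
REVIEW_MARKERS = ("GPL",)


def classify_license(license_text):
    """Return 'unknown', 'review', or 'ok' for one license string."""
    if not license_text:
        return "unknown"
    upper = license_text.upper()
    if any(marker in upper for marker in REVIEW_MARKERS):
        return "review"
    return "ok"


def license_inventory(records):
    """Group package names by license bucket."""
    inventory = {"ok": [], "review": [], "unknown": []}
    for record in records:
        bucket = classify_license(record.get("license"))
        inventory[bucket].append(record["name"])
    return inventory
```

Packages with a null license land in the "unknown" bucket rather than "ok", since PyPI does not require the field to be filled in.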
2. Competitive intelligence for developer tools
If you build a Python library, tool, or SaaS product targeting Python developers, track competing packages: their version cadence, download trajectories, classifiers, and repository links. Spot emerging rivals before they go viral.
3. Security and supply-chain research
Security teams monitor PyPI for typosquatting and malicious packages. Use this Actor to scan packages matching a keyword pattern, compare author names, and flag anomalous entries that shadow popular libraries with nearly identical names.
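One way to flag near-identical names in a result set is a simple string-similarity pass, sketched below with the standard library's `difflib`. The 0.85 threshold and the `lookalikes` helper are assumptions for illustration; a real typosquatting check would need more signals than name similarity alone.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case and unify separators so 'My_Pkg' and 'my-pkg' compare equal."""
    return name.lower().replace("_", "-").replace(".", "-")


def lookalikes(candidates, popular, threshold=0.85):
    """Return (candidate, popular_name, similarity) for suspiciously close names."""
    flagged = []
    for candidate in candidates:
        cand = normalize(candidate)
        for target in popular:
            tgt = normalize(target)
            if cand == tgt:
                continue  # the popular package itself is not a lookalike
            ratio = SequenceMatcher(None, cand, tgt).ratio()
            if ratio >= threshold:
                flagged.append((candidate, target, round(ratio, 2)))
    return flagged
```

For example, a name like `reqeusts` would be flagged against `requests`, while `flask-login` would not be flagged against `flask` because the names differ too much in length.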
4. Package discovery for data science teams
Data science leads often onboard new team members who need a curated list of relevant packages for a domain - "NLP", "time series", "geospatial". Run the Actor with a domain keyword and share the CSV as a starting library inventory.
🔌 Automating PyPI Scraper
Connect this Actor to your existing stack using Apify's built-in integrations:
- Make (Integromat) - trigger a run on a schedule, send results to Google Sheets or Notion
- Zapier - auto-export new package data to Airtable, Slack, or email
- Slack - post new package results directly to a channel
- Google Sheets - append rows with every new run via the Apify Google Sheets integration
- Webhooks - push dataset items to any REST endpoint on run completion
- Apify API - call from Python, Node.js, or any HTTP client with your API token
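For the Apify API route, a minimal Python sketch using the `apify-client` package might look like this. The `search` and `maxItems` input names come from the Input table; the helper names, the token, and the Actor ID are placeholders you would need to supply yourself.

```python
def build_run_input(search: str, max_items: int = 10) -> dict:
    """Build the Actor input, collapsing stray whitespace in the query."""
    search = " ".join(search.split())
    if not search:
        raise ValueError("search must be a non-empty keyword")
    if max_items < 1:
        raise ValueError("maxItems must be at least 1")
    return {"search": search, "maxItems": max_items}


def run_scraper(token: str, actor_id: str, run_input: dict) -> list:
    """Start a run, wait for it to finish, and return the dataset items."""
    # Imported lazily so build_run_input works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Hypothetical usage; replace the placeholders with your own values:
# items = run_scraper("<APIFY_TOKEN>", "<ACTOR_ID>", build_run_input("machine learning", 100))
```

Validating the input locally before starting a run avoids spending credits on an empty or malformed query.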
🌟 Beyond business use cases
Research and academia
Researchers studying open-source ecosystems, software evolution, or Python adoption can use this Actor to build longitudinal datasets. Track how a topic's package count and download volume change over time by scheduling periodic runs and appending results.
Creative and hobby projects
Python hobbyists building personal package dashboards, README badges, or portfolio sites can pull live PyPI data for any package they maintain or depend on - author info, keywords, version, publish date - all in one structured record.
Non-profit and open-source community
Open-source maintainers running community initiatives can survey the ecosystem for unmaintained packages (old last-publish dates, no classifiers, zero downloads) and offer to take over or deprecate them. Keeps the ecosystem healthy.
Education and learning
Python instructors building curriculum around real data can use this Actor to pull a fresh list of packages for a topic like "data visualization" or "web frameworks" and use the dataset in labs, quizzes, or project prompts.
🤖 Ask an AI assistant about this scraper
- "How do I use the PyPI Scraper to find all machine learning packages with MIT license?"
- "Can I export PyPI package data to Google Sheets automatically?"
- "How do I search for packages by keyword and sort by weekly downloads?"
This Actor has a structured output that any AI assistant (ChatGPT, Claude, Gemini) can analyze directly once you paste the JSON. Use it to build package comparison tables, spot trends, or generate dependency documentation.
❓ Frequently Asked Questions
❓ Does this require a PyPI account or API key?
No. The PyPI JSON API and PyPI Stats API are both fully public and require no authentication.
❓ How does the search work?
The Actor fetches the complete PyPI package index (812,000+ entries), keeps the packages whose names contain your search terms, ranks them by relevance (an exact phrase match scores highest), then fetches full JSON metadata and download stats for each match.
❓ Why are some download stats null?
Very new packages (published in the last 24-48 hours) may not yet appear in pypistats.org's rolling statistics. Once the stats backfill, a re-run will return values.
❓ Why are some fields like license or author null?
PyPI does not enforce required metadata fields. Many packages - especially older or personal ones - do not declare a license string, author name, or homepage URL. The Actor tries to extract license from classifiers as a fallback.
❓ What is the totalDownloads field?
This is the last-30-days download count from pypistats.org. A true all-time total is not exposed by any public PyPI API; computing one would require querying PyPI's public download dataset on BigQuery. The monthly count is the closest readily available figure.
❓ How fast is it?
5 items: 2-3 seconds. 50 items: 20-30 seconds. 100 items: 45-60 seconds. Speed depends on how many candidates need to be checked to find matching packages.
❓ Can I search by author or license?
Not directly - the search matches package names. After downloading your dataset, filter by the author or license fields in Excel, Pandas, or any BI tool.
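As a small sketch of that post-download filtering in plain Python, the helper below works on the JSON export loaded as a list of dicts. The function name and the case-insensitive matching rules are assumptions for the example; the field names come from the schema.

```python
def filter_records(records, author=None, license_contains=None):
    """Filter exported records, then sort by weekly downloads, highest first."""
    def keep(record):
        # Null fields are treated as empty strings so they never match.
        if author and (record.get("author") or "").lower() != author.lower():
            return False
        if license_contains:
            license_text = (record.get("license") or "").lower()
            if license_contains.lower() not in license_text:
                return False
        return True

    kept = [r for r in records if keep(r)]
    # Null download counts sort as 0 so they end up last.
    return sorted(kept, key=lambda r: r.get("weeklyDownloads") or 0, reverse=True)
```

Calling `filter_records(records, license_contains="mit")` would keep packages whose license string mentions MIT and order them by `weeklyDownloads`.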
❓ How often should I run this?
For competitive intelligence, weekly. For one-time research, on demand. For security monitoring, daily with a narrow keyword.
❓ Does it work on Windows and Mac?
The Actor runs on Apify's cloud infrastructure. Your OS does not matter - run from any browser.
❓ Is the data the same as what I see on pypi.org?
Yes. The Actor reads PyPI's public JSON API, which serves the same package catalog that pip install draws from. Metadata matches the package page on pypi.org.
🔌 Integrate with any app
| Integration | How |
|---|---|
| Google Sheets | Apify Google Sheets integration or Zapier |
| Airtable | Zapier or Make webhook trigger |
| Notion | Make (Integromat) with Notion module |
| Slack | Apify Slack integration - post results to a channel |
| BigQuery | Apify BigQuery integration or export JSON |
| PostgreSQL | Apify dataset export + COPY or SQLAlchemy |
| Python (pandas) | pandas.read_json(dataset_url) |
| REST API | GET https://api.apify.com/v2/datasets/{id}/items |
| CSV download | Click "Export" in the Apify Console - one click |
| Excel download | Click "Export as Excel" in the Apify Console |
🔗 Recommended Actors
| Actor | What it does |
|---|---|
| npm Registry Scraper | Scrape JavaScript package metadata and download stats from the npm registry |
| Product Hunt Scraper | Extract product launches, makers, topics, and upvote counts from Product Hunt |
| GitHub Scraper | Collect repository metadata, stars, forks, and contributor data from GitHub |
| Upwork Scraper | Search Upwork job postings and freelancer profiles by keyword |
| Remotive Scraper | Collect remote tech job listings across categories and companies |
💡 Pro Tip: browse the complete ParseForge collection for 50+ specialized data scrapers covering jobs, finance, aviation, government data, and developer tools.
Disclaimer: This Actor accesses publicly available data from PyPI's official JSON API (pypi.org/pypi/{name}/json) and the pypistats.org recent downloads API. Both APIs are open and publicly documented. No login, scraping of private pages, or circumvention of access controls is involved. Use responsibly and in accordance with PyPI's terms of service.