Python Web Scraping Toolkit
Pricing: Pay per usage
Developer: Josh Baker · Maintained by Community
Actor stats: 0 bookmarks · 2 total users · 1 monthly active user · last modified 3 days ago
PyScrappy: robust, all-in-one Python web scraping toolkit
PyScrappy is a Python toolkit for web scraping that works out of the box. Point it at any URL and get structured data back — or use built-in scrapers for Wikipedia, IMDB, Yahoo Finance, news feeds, and more.
Key features
- Generic scraper — give it any URL, get back structured text, links, images, tables, and metadata
- Auto-pagination — automatically follows "next page" links
- JS rendering — optional Playwright backend for JavaScript-heavy sites
- Custom selectors — pass CSS selectors to extract exactly what you need
- Built-in scrapers — Wikipedia, IMDB, Yahoo Finance, news (RSS), image search, Amazon, LinkedIn
- Clean API — every scraper returns a `ScrapeResult` with `.to_dataframe()` and `.to_json()`
- Retry & rate-limiting — built-in exponential backoff and per-domain rate limiting
- Type-safe — full type hints, `py.typed` marker
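The retry behavior described in the feature list can be illustrated with a minimal, library-agnostic sketch of exponential backoff. This is not PyScrappy's internal implementation, just the general technique; `with_backoff` and its parameters are illustrative names:

```python
import time

def with_backoff(fn, max_retries=3, base_delay=0.1):
    """Call fn(), retrying on exception with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...

# A flaky operation that fails twice, then succeeds on the third call
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(with_backoff(flaky))  # prints "ok" after two retried failures
```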
Installation
```
pip install pyscrappy
```
Optional extras:
```
# Browser support (for JS-rendered pages)
pip install 'pyscrappy[browser]'
playwright install chromium

# DataFrame support
pip install 'pyscrappy[dataframe]'

# Everything
pip install 'pyscrappy[all]'
```
Quick start
Scrape any URL (one-liner)
```python
from pyscrappy import scrape

result = scrape("https://en.wikipedia.org/wiki/Web_scraping")
print(result.data[0]["metadata"]["title"])
print(result.data[0]["text"]["word_count"])
```
Custom CSS selectors
```python
from pyscrappy import GenericScraper

with GenericScraper() as gs:
    result = gs.scrape(
        url="https://news.ycombinator.com",
        selectors={"title": ".titleline a", "score": ".score"},
    )

for item in result.data:
    print(item["title"], item.get("score", ""))
```
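Selector-based extraction like this boils down to CSS queries over parsed HTML. A minimal sketch using BeautifulSoup (one of the toolkit's required dependencies, listed under Dependencies below) on a made-up HTML snippet:

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking the Hacker News structure targeted above
html = """
<div class="titleline"><a href="/item1">First story</a></div>
<span class="score">41 points</span>
"""

soup = BeautifulSoup(html, "html.parser")
titles = [a.get_text() for a in soup.select(".titleline a")]
scores = [s.get_text() for s in soup.select(".score")]
print(titles)  # ['First story']
print(scores)  # ['41 points']
```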
Wikipedia
```python
from pyscrappy import WikipediaScraper

with WikipediaScraper() as ws:
    result = ws.scrape(query="Python (programming language)", mode="summary")

print(result.data[0]["text"])
```
Stock data
```python
from pyscrappy import StockScraper

with StockScraper() as ss:
    result = ss.scrape(symbol="AAPL", mode="history", period="1mo")

df = result.to_dataframe()
print(df.head())
```
IMDB
```python
from pyscrappy import IMDBScraper

with IMDBScraper() as scraper:
    result = scraper.scrape(genre="sci-fi", max_pages=2)

df = result.to_dataframe()
print(df[["title", "year", "rating"]])
```
News (RSS feeds)
```python
from pyscrappy import NewsScraper

with NewsScraper() as ns:
    result = ns.scrape(feed_url="https://rss.nytimes.com/services/xml/rss/nyt/World.xml")

for article in result.data[:5]:
    print(article["title"])
```
Image search
```python
from pyscrappy import ImageSearchScraper

with ImageSearchScraper() as iss:
    result = iss.scrape(query="golden retriever", max_images=10, download_to="./dogs")
```
Configuration
```python
from pyscrappy import ScraperConfig, GenericScraper

config = ScraperConfig(
    timeout=20.0,        # request timeout in seconds
    max_retries=3,       # retry failed requests
    rate_limit=2.0,      # seconds between requests per domain
    proxy="http://...",  # HTTP/SOCKS proxy
    headless=True,       # browser runs headless
    render_js="auto",    # auto-detect if JS rendering is needed
)

with GenericScraper(config) as gs:
    result = gs.scrape(url="https://example.com")
```
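The `rate_limit` setting enforces a minimum gap between requests to the same domain. As a library-agnostic sketch of that technique (not PyScrappy's actual internals; the class and method names here are illustrative):

```python
import time
from urllib.parse import urlparse

class DomainRateLimiter:
    """Enforce a minimum interval between requests to the same domain."""

    def __init__(self, interval: float):
        self.interval = interval
        self._last: dict[str, float] = {}

    def wait(self, url: str) -> None:
        """Sleep just long enough to respect the per-domain interval."""
        domain = urlparse(url).netloc
        now = time.monotonic()
        last = self._last.get(domain)
        if last is not None and now - last < self.interval:
            time.sleep(self.interval - (now - last))
        self._last[domain] = time.monotonic()

limiter = DomainRateLimiter(interval=0.1)
start = time.monotonic()
limiter.wait("https://example.com/page1")
limiter.wait("https://example.com/page2")  # same domain: pauses ~0.1s
elapsed = time.monotonic() - start
```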
YouTube
```python
from pyscrappy import YouTubeScraper

with YouTubeScraper() as scraper:
    result = scraper.scrape(query="python tutorial", max_results=10)

for video in result.data:
    print(video["title"], video.get("views", ""))
```
SoundCloud
```python
from pyscrappy import SoundCloudScraper

with SoundCloudScraper() as scraper:
    result = scraper.scrape(query="lo-fi beats", max_results=10)
```
E-Commerce (Alibaba, Flipkart, Snapdeal)
```python
from pyscrappy import AlibabaScraper, FlipkartScraper, SnapdealScraper

with FlipkartScraper() as scraper:
    result = scraper.scrape(query="laptop", max_pages=2)

df = result.to_dataframe()
```
Food Delivery (Swiggy, Zomato)
```python
from pyscrappy import SwiggyScraper, ZomatoScraper

# These are JS-heavy — use render_js=True for best results
with SwiggyScraper() as scraper:
    result = scraper.scrape(city="bangalore", render_js=True)
```
Built-in scrapers
| Scraper | What it does | Needs browser? |
|---|---|---|
| `GenericScraper` | Scrape any URL with auto-extraction | Optional |
| **Data / Research** | | |
| `WikipediaScraper` | Articles, sections, infoboxes | No |
| `IMDBScraper` | Movies by genre, search, charts | No |
| `StockScraper` | Quotes, history, profiles (Yahoo Finance) | No |
| `NewsScraper` | RSS/Atom feeds, article extraction | No |
| `ImageSearchScraper` | Image search + download | No |
| `LinkedInJobsScraper` | Public job listings | No |
| **E-Commerce** | | |
| `AmazonScraper` | Product search | No |
| `AlibabaScraper` | Product search | No |
| `FlipkartScraper` | Product search | No |
| `SnapdealScraper` | Product search | No |
| **Social Media** | | |
| `YouTubeScraper` | Video search, channel scraping | Optional |
| `InstagramScraper` | Profiles, hashtag posts | Recommended |
| `TwitterScraper` | Tweet search | Recommended |
| **Music** | | |
| `SpotifyScraper` | Track/playlist search | Recommended |
| `SoundCloudScraper` | Track search | Optional |
| **Food Delivery** | | |
| `SwiggyScraper` | Restaurant listings | Recommended |
| `ZomatoScraper` | Restaurant listings | Recommended |
Dependencies
Required: `httpx`, `beautifulsoup4`, `lxml`
Optional: `playwright` (JS rendering), `pandas` (DataFrames)
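If you want to branch on whether an optional extra is installed before using the DataFrame or browser features, a standard-library check works. This is a generic Python pattern, not a PyScrappy API:

```python
import importlib.util

def has_extra(module_name: str) -> bool:
    """Return True if an optional dependency is importable."""
    return importlib.util.find_spec(module_name) is not None

# json ships with the standard library, so it is always present
print(has_extra("json"))                      # True
print(has_extra("definitely_not_installed"))  # False
```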
License
Contributing
All contributions welcome. See Issues.
This package is for educational and research purposes.