Universal Web Extractor V8
Pricing
Pay per event
Flexible web extractor using Python + Playwright or HTTP. Supports CSS-based field extraction, HTML snapshots, screenshots, metadata, monitoring mode, and link-following. Ideal for scraping product pages, listings, news articles, tech profiles, or universal structured data from any website.
📘 Universal Web Extractor V8
Hybrid Playwright + BeautifulSoup Web Scraper
Flexible. Powerful. Universal. Extract structured data from any website — static or dynamic — in seconds.
✨ Overview
Universal Web Extractor V8 is a hybrid web scraping engine designed for maximum reliability and flexibility:
Dynamic websites → Uses Playwright (headless browser)
Static websites → Uses BeautifulSoup (super-fast HTML parsing)
This Actor automatically:
Extracts custom fields using CSS selectors
Follows pagination
Supports lists, product pages, article content, job listings, profiles, price tracking, and more
Stores structured data in the dataset
Captures HTML snapshots and screenshots (optional)
🚀 Use Cases
🛒 E-commerce
Titles, prices, images, descriptions
Pagination across multi-page listings
📰 News & Articles
Headlines, authors, publish dates
Article body extraction
🏢 Business Data
Company names, reviews, contact details
Tech stack profiling (via selectors)
📊 Analytics & Automation
Monitoring pages periodically
Creating datasets for machine learning models
Feeding data into CRMs, APIs, or workflows
🧠 Features

| Feature | Playwright Mode | Soup Mode |
| --- | --- | --- |
| Handles JavaScript | ✅ | ❌ |
| Fast & lightweight | ❌ | ✅ |
| CSS field extraction | ✅ | ✅ |
| HTML snapshots | ✔ Optional | ✔ Optional |
| Screenshots | ✔ Optional | ❌ |
| Pagination support | ✅ | ✅ |
🛠 How It Works
You simply provide:
✔ start_urls
✔ fields (e.g., title=h1, price=.product-price)
✔ link_selector (optional pagination)
✔ mode (use_playwright: true|false)
The extractor will:
Fetch each start URL
Extract desired fields
Follow pagination (if enabled)
Save results to the dataset
Save HTML snapshots / screenshots (optional)
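The steps above amount to a breadth-first crawl with a request budget and a depth limit. A simplified sketch of that loop (illustrative only, not the Actor's actual source; `fetch`, `extract`, and `find_links` stand in for the Playwright/BeautifulSoup engines):

```python
from collections import deque

def crawl(start_urls, fetch, extract, find_links, max_requests=30, max_depth=3):
    """Breadth-first crawl: fetch each URL, extract fields, follow pagination links."""
    queue = deque((url, 0) for url in start_urls)
    seen, results = set(start_urls), []
    requests_made = 0
    while queue and requests_made < max_requests:
        url, depth = queue.popleft()
        html = fetch(url)                     # Playwright or BeautifulSoup engine
        requests_made += 1
        results.append({"url": url, **extract(html)})
        if depth < max_depth:                 # follow pagination if enabled
            for link in find_links(html):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return results
```

Note how max_requests caps total fetches while max_depth caps how many pagination hops are followed from a start URL.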
🔧 Input Schema

```json
{
  "start_urls": ["https://example.com"],
  "fields": ["title=h1", "price=.price-tag"],
  "link_selector": ".next a",
  "use_playwright": false,
  "block_resources": true,
  "max_requests": 30,
  "max_depth": 3,
  "save_html_snapshot": true,
  "save_screenshot": false
}
```
📤 Output Format

Each dataset item contains:

```json
{
  "url": "https://example.com/product-1",
  "title": ["Product Name"],
  "price": ["$29.99"],
  "timestamp": "2025-01-01T12:00:00Z"
}
```
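As a sketch, each dataset item could be assembled like this (an illustration, not the Actor's exact code; the field keys come from your fields input):

```python
from datetime import datetime, timezone

def build_item(url, extracted):
    """Attach the source URL and a UTC timestamp to the extracted fields."""
    return {
        "url": url,
        **extracted,  # e.g. {"title": ["Product Name"], "price": ["$29.99"]}
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Field values are lists because a selector can match more than one element on a page.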
🧩 Field Extraction Guide

Provide CSS selectors in this format:

```
title=h1
price=.price
description=.product-description p
author=.post-author
quote=.text
```
You can extract any HTML element.
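As an illustration of how those name=selector pairs map to CSS queries, here is a minimal BeautifulSoup sketch (not the Actor's actual code; assumes the bs4 package is installed):

```python
from bs4 import BeautifulSoup

def parse_field_specs(specs):
    """Split 'name=selector' strings into a {name: selector} mapping."""
    return dict(spec.split("=", 1) for spec in specs)

def extract_fields(html, specs):
    """Return every text match per field, as a list (a selector may match many elements)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        name: [el.get_text(strip=True) for el in soup.select(selector)]
        for name, selector in parse_field_specs(specs).items()
    }
```

Splitting on the first "=" only is what lets selectors themselves contain attribute filters like a[href="/x"].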
🧭 Pagination
Enable automatic pagination by using: "link_selector": ".next a"
Increase depth if you want more pages: "max_depth": 5
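Conceptually, the link_selector is applied to each fetched page and the matched anchors become the next round of URLs. A hedged sketch of that step (assumes bs4; relative hrefs are resolved against the page URL):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

def next_page_urls(base_url, html, link_selector):
    """Collect absolute URLs from every <a href> matched by link_selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        urljoin(base_url, a["href"])
        for a in soup.select(link_selector)
        if a.has_attr("href")
    ]
```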
🖼 Snapshots & Screenshots
Enable full page snapshots: "save_html_snapshot": true
Enable screenshots (Playwright only): "save_screenshot": true
Snapshots are stored in the Key-value Store.
⚡ Modes Explained
Use Playwright when:
JS-heavy website
Infinite scroll
Protected elements
Dynamic rendering
Use BeautifulSoup when:
Fast crawling needed
Static HTML
API-like speed desired
🔐 Advanced Tips
Block images and fonts for faster crawling: "block_resources": true
Limit the crawl for a quick test run: "max_requests": 1, "max_depth": 0
Perfect for testing.
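In Playwright mode, block_resources presumably works by intercepting requests and aborting heavy resource types. A sketch of that filter (the BLOCKED_TYPES set is an assumption, not the Actor's exact list):

```python
# Resource types that are typically safe to skip when only HTML/text is needed.
BLOCKED_TYPES = {"image", "font", "media", "stylesheet"}

def should_block(resource_type: str) -> bool:
    """Decide whether a request should be aborted to speed up crawling."""
    return resource_type in BLOCKED_TYPES

# Wiring this into Playwright's sync API would look roughly like:
#   page.route("**/*", lambda route: route.abort()
#              if should_block(route.request.resource_type)
#              else route.continue_())
```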
🏁 Example: Quotes to Scrape

```json
{
  "start_urls": ["http://quotes.toscrape.com"],
  "fields": ["title=h1", "quote=.text"],
  "link_selector": ".next a",
  "use_playwright": false,
  "max_requests": 10,
  "max_depth": 3
}
```

🧨 Notes / Limitations
Some sites may block Playwright (rare)
Large HTML snapshots may slow down KV storage
Sites protected by CAPTCHAs are not supported
❤️ Created by Leoncio Jr Coronado
Apify Developer • Web Scraping Engineer • Automation Specialist
If you need custom scraping solutions: LinkedIn / Upwork / Fiverr — Available for projects