🟦 Universal Web Extractor V8
A powerful, flexible web extractor that crawls any website, follows links, and extracts data using CSS selectors. It supports both a fast HTTP/cheerio mode (no browser) and a full Playwright mode for JavaScript-rendered websites.
⭐ Key Features
Crawl any website – single page or multi-page
Follow internal links or pagination using CSS selectors
Extract unlimited fields using simple name=selector format
Fast HTTP mode (no browser)
Full Playwright mode for JS-heavy websites
HTML snapshots & screenshots (optional)
Automatic proxy support for reliable IP rotation
Flat, clean JSON output compatible with any pipeline
🚀 When to Use This Actor
Use Universal Web Extractor V8 when you want to quickly extract structured data without writing any code.
Common use cases:
Product pages
News articles
Blog posts
Directories
Quotes / texts / titles
Contact information
Documentation and knowledge bases
Pagination-based listings
This Actor works for almost any HTML structure.
🧠 How It Works
1. You provide one or more start URLs.
2. The Actor loads each page.
3. It extracts every field defined in Fields to Extract.
4. If a link selector is provided, it follows matching links.
5. It continues until max_requests is reached or max_depth stops it.
6. It saves clean, flat JSON output for each page crawled (see the sketch below).
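Conceptually, this is a breadth-first crawl loop bounded by max_requests and max_depth. The following is a minimal sketch of that flow, using requests and BeautifulSoup as stand-ins for the Actor's internal fetching and parsing; it illustrates the steps above and is not the Actor's actual source.

```python
# Minimal sketch of the crawl loop described above (illustration only).
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(start_urls, fields, link_selector=None, max_requests=5, max_depth=1):
    """Breadth-first crawl bounded by max_requests and max_depth."""
    queue = deque((url, 0) for url in start_urls)
    seen, results = set(), []
    while queue and len(seen) < max_requests:
        url, depth = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        # One flat record per page: every name=selector field, plus the URL.
        item = {"url": url}
        for name, selector in fields.items():
            node = soup.select_one(selector)
            item[name] = node.get_text(strip=True) if node else None
        results.append(item)
        # Queue links matched by the link selector until max_depth is hit.
        if link_selector and depth < max_depth:
            for a in soup.select(link_selector):
                if a.get("href"):
                    queue.append((urljoin(url, a["href"]), depth + 1))
    return results

print(crawl(["https://quotes.toscrape.com/"],
            {"title": "h1", "quote": ".text"},
            link_selector="a[href*='page']"))
```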
💻 Example Input (Prefilled)
{ "start_urls": [ "https://quotes.toscrape.com/" ], "link_selector": "a[href*='page']", "use_playwright": true, "max_requests": 5, "max_depth": 1, "fields": [ "title=h1", "quote=.text" ], "save_html_snapshot": false, "save_screenshot": false, "proxy": { "useApifyProxy": true, "apifyProxyCountry": "" } }
🏷️ Field Extraction Format
Use simple name=selector mappings:
| Example | Meaning |
| --- | --- |
| `title=h1` | Extract `<h1>` text into `title` |
| `quote=.text` | Extract the content of the `.text` class |
| `price=.price-tag` | Extract the product price |
| `desc=.article-body p` | Extract the first `<p>` inside `.article-body` |
You can add any number of fields.
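Parsing these mappings is simple, assuming the Actor splits each entry on the first `=` only (an assumption, but the natural reading, since selectors like `a[href*='page']` themselves contain `=`). A sketch of that parsing step:

```python
# Sketch: parse "name=selector" pairs into a field map.
def parse_fields(pairs):
    """Turn ["name=selector", ...] into {"name": "selector", ...}."""
    fields = {}
    for pair in pairs:
        name, _, selector = pair.partition("=")  # split on the FIRST "=" only
        if name and selector:
            fields[name.strip()] = selector.strip()
    return fields

print(parse_fields(["title=h1", "quote=.text", "next=a[href*='page']"]))
# -> {'title': 'h1', 'quote': '.text', 'next': "a[href*='page']"}
```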
🔗 Link Following
For example, set link_selector to `a[href*='page']` and the Actor auto-navigates:
Pagination (page 1 → 2 → 3…)
Category lists
Internal link structures
Crawling stays within the limits set by max_depth and max_requests.
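Matched links typically need to be resolved against the current page URL before they are queued, since pagination hrefs are often relative. A short sketch of that resolution step, assuming BeautifulSoup:

```python
# Sketch: resolve links matched by the link selector to absolute URLs.
from urllib.parse import urljoin
from bs4 import BeautifulSoup

html = '<a href="/page/2/">Next</a>'
base = "https://quotes.toscrape.com/"

soup = BeautifulSoup(html, "html.parser")
links = [urljoin(base, a["href"])
         for a in soup.select("a[href*='page']") if a.get("href")]
print(links)  # ['https://quotes.toscrape.com/page/2/']
```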
📸 Optional Snapshots
Enable:
save_html_snapshot
save_screenshot
Useful for debugging or verifying extracted pages
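When Playwright mode is enabled, both snapshot types correspond to standard Playwright operations. A minimal sketch of what the two options capture, using Playwright's Python API directly (an illustration, not the Actor's internals):

```python
# Sketch: what save_html_snapshot and save_screenshot capture.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://quotes.toscrape.com/")
    html = page.content()                 # HTML snapshot (rendered DOM)
    page.screenshot(path="snapshot.png")  # screenshot of the page
    browser.close()

print(len(html))
```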
⚙ Settings Overview

| Setting | Description |
| --- | --- |
| `start_urls` | Where crawling begins |
| `link_selector` | CSS selector for links to follow |
| `use_playwright` | Enable JS rendering |
| `max_requests` | Hard limit on the number of visited pages |
| `max_depth` | How many link levels to follow |
| `fields` | Extracted fields as `name=selector` pairs |
| `save_html_snapshot` | Save raw HTML |
| `save_screenshot` | Take screenshots |
| `proxy` | Proxy configuration |
🧪 Best Practices
Use specific selectors for stable extraction
Keep max_requests reasonable
Use Playwright only when needed
Always test with a single URL first (see the minimal input after this list)
Add only the fields you want to extract
Enable snapshots if the site layout is unpredictable
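As a starting point for that single-URL test, something like the following works: the prefilled input with link_selector omitted (so no links are followed) and max_requests capped at 1. Whether optional keys may be omitted depends on the input schema, so treat this as a sketch rather than a canonical shape.

```json
{
  "start_urls": ["https://quotes.toscrape.com/"],
  "use_playwright": false,
  "max_requests": 1,
  "max_depth": 1,
  "fields": ["title=h1", "quote=.text"],
  "save_html_snapshot": false,
  "save_screenshot": false,
  "proxy": { "useApifyProxy": true, "apifyProxyCountry": "" }
}
```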
❗ Limitations
Very complex interactions (logins, infinite scroll) may require custom scripts
Rate-limited websites may need slower crawling
Multi-step forms are not supported
🚩 Changelog v0.0.8
Added new strict Apify 2025-compatible input schema
Added flat, clean output schema
Fixed proxy editor rules
Added better prefilled input
Improved descriptions for all fields
Ready for spotlight + QA review
