Universal Web Extractor V8
Pricing
from $0.01 / 1,000 results
Flexible web extractor using Python + Playwright or HTTP. Supports CSS-based field extraction, HTML snapshots, screenshots, metadata, monitoring mode, and link-following. Ideal for scraping product pages, listings, news articles, tech profiles, or universal structured data from any website.
Rating: 5.0 (1)
Developer: Leoncio Jr Coronado
Actor stats
Bookmarked: 0
Total users: 10
Monthly active users: 3
Last modified: 5 days ago
🟦 Universal Web Extractor V8
Python Edition — HTTPX + BeautifulSoup
A fast, lightweight universal web scraper that fetches webpages over HTTP, parses HTML using BeautifulSoup, and returns clean, structured data — including title, description, and full text — without launching a browser.
This Actor is designed for speed, low cost, and simplicity, making it ideal for APIs, SEO pipelines, metadata extraction, and content analysis.
🚀 When to Use This Actor
Use Universal Web Extractor V8 (HTTP version) when:
Pages are static HTML (no JavaScript rendering required)
You need fast, low-cost scraping
You want clean text content from webpages
You are building SEO, research, or content pipelines
For JavaScript-heavy websites, use the Playwright edition of this Actor instead.
🧠 How It Works
The Actor loads start_urls from the input
For each URL:
  Sends an HTTP request using httpx
  Parses HTML with BeautifulSoup
  Extracts:
    Title
    Description
    Cleaned full text
Pushes results to a flat JSON dataset
No browser. No JavaScript rendering. Maximum speed.
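The parse-and-extract step can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual source: the function name extract_record is hypothetical, and the choice of the meta name="description" tag and of the body element as the text source are assumptions. Script and style contents are not filtered out here.

```python
from datetime import datetime, timezone

from bs4 import BeautifulSoup


def extract_record(url: str, html: str) -> dict:
    """Build one flat record from raw HTML (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")

    # <title> may be missing, so fall back to an empty string.
    title = soup.title.get_text(strip=True) if soup.title else ""

    # Assumption: the description comes from <meta name="description">.
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta.get("content", "").strip() if meta else ""

    # Assumption: full text is taken from <body> when present,
    # otherwise from the whole document.
    root = soup.body if soup.body is not None else soup
    # Join text nodes with spaces, then collapse whitespace runs.
    text_content = " ".join(root.get_text(separator=" ").split())

    return {
        "url": url,
        "title": title,
        "description": description,
        "text_content": text_content,
        # UTC timestamp in the same shape as the documented output.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

The returned dictionary has the same keys as the documented output, so each record can be pushed to the dataset as-is.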
📥 Input Example
{
  "start_urls": [
    "https://example.com",
    "https://quotes.toscrape.com/"
  ]
}
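How the Actor validates its input is not documented here. If you build the start_urls list programmatically, a small pre-check like the one below can filter out obviously unusable entries before a run. The helper name normalize_start_urls and its rules (http/https only, duplicates dropped, order preserved) are assumptions made for illustration.

```python
from urllib.parse import urlsplit


def normalize_start_urls(actor_input: dict) -> list[str]:
    """Return unique http(s) URLs from actor_input["start_urls"], in order."""
    seen: set[str] = set()
    urls: list[str] = []
    for raw in actor_input.get("start_urls") or []:
        # Skip anything that is not a plain string.
        if not isinstance(raw, str):
            continue
        url = raw.strip()
        parts = urlsplit(url)
        # Keep only absolute http/https URLs that name a host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls
```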
📤 Output Example
{
  "url": "https://example.com",
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "text_content": "Example Domain This domain is for use in illustrative examples...",
  "timestamp": "2025-01-01T12:00:00Z"
}
🧪 Best Practices
Use for static HTML pages
Ideal for:
  Articles
  Blogs
  Documentation
  Product descriptions
  SEO metadata scraping
Batch URLs for maximum efficiency
❗ Limitations
❌ Cannot render JavaScript
❌ Not suitable for SPAs (React, Vue, Angular)
❌ No auto-pagination (HTTP-only version)
❌ No selector-based structured extraction (yet)
💡 Tips
If a site requires JavaScript → use the Playwright version
Combine with downstream Actors for:
  Data cleaning
  NLP
  Embeddings
  Indexing
🔧 Changelog
v0.0.9 — Python HTTP / BeautifulSoup Edition
Added httpx + BeautifulSoup extraction core
Automatic title, description, and text extraction
clean_html() helper for readable output
Simplified input schema (start_urls only)
Flat output schema (URL + timestamp + fields)
Ready for QA, Spotlight, and Challenge evaluation
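The changelog names a clean_html() helper but does not show it. The following is a guess at what such a helper might do, not the Actor's real implementation: it drops script and style elements and collapses whitespace. Which tags the real helper removes, and how it treats whitespace, are assumptions.

```python
import re

from bs4 import BeautifulSoup


def clean_html(html: str) -> str:
    """Return readable text from HTML (illustrative stand-in for the real helper)."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements whose contents are code rather than readable text.
    for tag in soup.find_all(["script", "style"]):
        tag.decompose()

    # Join remaining text nodes with spaces, then collapse whitespace runs.
    text = soup.get_text(separator=" ")
    return re.sub(r"\s+", " ", text).strip()
```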
🏆 Why This Actor Exists
This Actor focuses on speed, reliability, and simplicity — doing one thing extremely well: extract clean content from webpages with minimal cost and maximum performance.

