Universal Web Extractor V8

Pricing: from $0.01 / 1,000 results


Beginner-friendly universal web extractor that converts web pages into clean, structured data from the first run. Export results to CSV, Excel, or JSON with no coding required.

Rating: 5.0 (1 review)

Developer: Leoncio Jr Coronado

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 15
Monthly active users: 4
Last modified: 4 days ago


Universal Web Extractor V8

Python Edition — HTTPX + BeautifulSoup

Overview

Universal Web Extractor V8 is a fast, lightweight web scraping Actor that fetches webpages over HTTP, parses the HTML with BeautifulSoup, and returns clean, structured content (page title, meta description, and readable full text) without launching a browser.

This Actor is optimized for speed, low cost, and simplicity, making it ideal for APIs, SEO pipelines, research tools, and content analysis workflows.

⚡ No browser 💸 Low resource usage 📄 Clean, machine-ready output

🚀 When to Use This Actor

Use Universal Web Extractor V8 (HTTP version) when:

Pages are static HTML (no JavaScript rendering required)

You need fast and low-cost scraping

You want clean text content from webpages

You are building:

SEO pipelines

Research or content analysis tools

Metadata extraction APIs

Lightweight data pipelines

👉 For JavaScript-heavy websites, use the Playwright edition of this Actor instead.

🧠 How It Works

The Actor loads start_urls from the input

For each URL, it:

Sends an HTTP request using httpx

Parses HTML with BeautifulSoup

Extracts:

Page title

Meta description

Cleaned full text content

Results are stored in a flat JSON dataset

No browser. No JavaScript rendering. Maximum speed.
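The extraction step can be sketched roughly as follows. This is not the Actor's source: to keep the example dependency-free it swaps BeautifulSoup for Python's standard-library `html.parser`, and the names `PageExtractor` and `extract_page` are hypothetical. With BeautifulSoup, the equivalent lookups would be roughly `soup.title`, `soup.find("meta", attrs={"name": "description"})`, and `soup.get_text(" ", strip=True)`. In the Actor itself, the HTML string would come from the httpx response (for example `httpx.get(url).text`).

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical stand-in for the BeautifulSoup step: collects the
    <title>, the meta description, and visible text."""

    # Elements whose contents are never readable page text
    SKIP_TAGS = {"script", "style", "noscript", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.title_parts = []
        self.description = ""
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "meta":
            # Attribute values can be None for bare attributes
            attr_map = {k.lower(): (v or "") for k, v in attrs}
            if attr_map.get("name", "").lower() == "description" and not self.description:
                self.description = attr_map.get("content", "").strip()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return  # inside <script>, <style>, etc.
        if self._in_title:
            self.title_parts.append(data)
        else:
            self.text_parts.append(data)


def extract_page(html):
    """Return title, description, and whitespace-collapsed text for one page."""
    parser = PageExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    # Join text nodes with spaces, then collapse all runs of whitespace
    text = " ".join(" ".join(parser.text_parts).split())
    return {"title": title, "description": parser.description, "text_content": text}
```

Joining text nodes with a space mirrors `get_text(" ", strip=True)`; the trade-off is that a word split across inline tags (for example `<b>ex</b>ample`) comes out as two words.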

📥 Input Example

```json
{
  "start_urls": [
    "https://example.com",
    "https://quotes.toscrape.com/"
  ]
}
```
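A run needs at least one absolute URL in `start_urls`. The helper below is a hypothetical sketch of how an input shaped like this example could be checked and deduplicated before fetching; the Actor's actual validation rules are not documented, so treat `load_start_urls` and its error messages as illustrative.

```python
from urllib.parse import urlparse


def load_start_urls(actor_input):
    """Hypothetical validation: return unique absolute http(s) URLs, in input order."""
    raw = actor_input.get("start_urls")
    if not isinstance(raw, list) or not raw:
        raise ValueError("start_urls must be a non-empty list of URLs")
    urls = []
    seen = set()
    for item in raw:
        url = str(item).strip()
        parsed = urlparse(url)
        # Reject relative URLs and non-HTTP schemes such as ftp:// or file://
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an absolute http(s) URL: {url!r}")
        if url not in seen:  # drop exact duplicates, keep first occurrence
            seen.add(url)
            urls.append(url)
    return urls
```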

📤 Output Example

```json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "description": "This domain is for use in illustrative examples.",
  "text_content": "Example Domain This domain is for use in illustrative examples...",
  "timestamp": "2025-01-01T12:00:00Z"
}
```

Every result item has this same flat structure, so the output is predictable and suitable for downstream automation.
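Each dataset item can be assembled as a plain dictionary. The sketch below uses a hypothetical `make_record` helper that reproduces the field names from the example output and formats `timestamp` as UTC ISO 8601 with a trailing `Z`; the `page` argument is assumed to hold the already extracted title, description, and text.

```python
from datetime import datetime, timezone


def make_record(url, page, now=None):
    """Hypothetical helper: build one flat output item like the example above."""
    if now is None:
        now = datetime.now(timezone.utc)
    return {
        "url": url,
        "title": page.get("title", ""),
        "description": page.get("description", ""),
        "text_content": page.get("text_content", ""),
        # Convert to UTC (naive datetimes are treated as local time) and
        # format like "2025-01-01T12:00:00Z"
        "timestamp": now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

In an Actor built with the Apify Python SDK, a dictionary like this is the kind of item typically passed to `Actor.push_data()`.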

🧪 Best Practices

Recommended for static HTML pages, such as:

Articles and blog posts

Documentation pages

Product descriptions

Landing pages

SEO metadata scraping

💡 Tip: Batch multiple URLs in one run for maximum efficiency.
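One way to follow this tip with a long URL list is to group it into a few run inputs rather than starting one run per URL, since each run typically carries some fixed startup cost. `build_run_inputs` and its default `batch_size` of 100 are hypothetical choices, not documented limits of this Actor.

```python
def build_run_inputs(urls, batch_size=100):
    """Hypothetical helper: split a URL list into inputs with many start_urls each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Consecutive slices keep the original URL order across batches
    return [
        {"start_urls": urls[i:i + batch_size]}
        for i in range(0, len(urls), batch_size)
    ]
```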

❗ Limitations

❌ Cannot render JavaScript

❌ Not suitable for SPAs (React, Vue, Angular)

❌ No automatic pagination (HTTP-only version)

❌ No selector-based structured extraction (yet)

These limitations are intentional to keep the Actor fast, simple, and low-cost.
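Before sending a site to this Actor, it can help to check whether its HTML is an empty client-side application shell. The function below is a rough, hypothetical heuristic; the thresholds and the `root`/`app` container ids are assumptions based on common React and Vue setups. If very little text was extracted and the page looks like such a shell, the Playwright edition is probably the better fit.

```python
import re

# An empty mount point such as <div id="root"></div> (common in React/Vue apps)
EMPTY_ROOT = re.compile(
    r"""<div[^>]+id=["'](?:root|app)["'][^>]*>\s*</div>""",
    flags=re.IGNORECASE,
)


def looks_js_rendered(html, extracted_text, min_chars=200):
    """Hypothetical heuristic: True if the page likely needs a browser to render."""
    script_count = len(re.findall(r"<script\b", html, flags=re.IGNORECASE))
    has_empty_root = EMPTY_ROOT.search(html) is not None
    too_little_text = len(extracted_text.strip()) < min_chars
    # Require both thin text and a shell-like signal to limit false positives
    return too_little_text and (has_empty_root or script_count >= 3)
```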

💡 Tips & Integrations

If a website requires JavaScript → use the Playwright version

Combine with downstream Actors or tools for:

Data cleaning

NLP processing

Embeddings

Search indexing

Analytics pipelines
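For embeddings and search indexing, long `text_content` values are usually split into smaller pieces first. This hypothetical `chunk_text` helper splits on whitespace into overlapping word windows. Embedding models limit input by tokens rather than words, so `max_words` and `overlap` are illustrative defaults to tune for the model you use.

```python
def chunk_text(text, max_words=200, overlap=40):
    """Hypothetical helper: split text into overlapping windows of words."""
    if max_words < 1 or not 0 <= overlap < max_words:
        raise ValueError("need max_words >= 1 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reached the end of the text
    return chunks
```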

🔧 Changelog v0.0.9 — Python HTTP / BeautifulSoup Edition

Added httpx + BeautifulSoup extraction core

Automatic title, description, and text extraction

clean_html() helper for readable output

Simplified input schema (start_urls only)

Flat output schema (URL, timestamp, and content fields)

Ready for QA, Spotlight, and $1M Challenge evaluation

🏆 Why This Actor Exists

This Actor follows a simple philosophy:

Do one thing extremely well.

Universal Web Extractor V8 focuses on:

Speed

Reliability

Low cost

Clean output

It is a good fit for teams that need readable webpage content without browser overhead or JavaScript complexity.