Universal Web Extractor V8

Pricing

from $0.01 / 1,000 results

Flexible web extractor using Python + Playwright or HTTP. Supports CSS-based field extraction, HTML snapshots, screenshots, metadata, monitoring mode, and link-following. Ideal for scraping product pages, listings, news articles, tech profiles, or universal structured data from any website.

Rating: 5.0 (1)

Developer: Leoncio Jr Coronado (Maintained by Community)

Actor stats

Bookmarked: 0

Total users: 10

Monthly active users: 3

Last modified: 4 days ago

🟦 Universal Web Extractor V8

A flexible web extractor that crawls a website, follows links, and extracts data using CSS selectors. It supports two modes: HTTP/cheerio (fast, no browser) and Playwright (for JavaScript-rendered websites).

⭐ Key Features

Crawl any website – single page or multi-page

Follow internal links or pagination using CSS selectors

Extract unlimited fields using simple name=selector format

Fast HTTP mode (no browser)

Full Playwright mode for JS-heavy websites

HTML snapshots & screenshots (optional)

Automatic proxy support for reliable IP rotation

Flat, clean JSON output compatible with any pipeline

🚀 When to Use This Actor

Use Universal Web Extractor V8 when you want to quickly extract structured data without writing any code.

Common use cases:

Product pages

News articles

Blog posts

Directories

Quotes / texts / titles

Contact information

Documentation and knowledge bases

Pagination-based listings

This Actor works for almost any HTML structure.

🧠 How It Works

You provide one or more start URLs

The Actor loads each page

It extracts all fields defined in Fields to Extract

If a link selector is provided, it follows matching links

It continues until either limit is hit:

max_requests pages have been visited

or max_depth link levels have been followed

It saves clean, flat JSON output for each page crawled
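The steps above can be sketched as a small crawl loop. This is only an illustration of how max_requests and max_depth might interact, not the Actor's actual code: the breadth-first order, the choice to count start URLs as depth 0, the de-duplication of URLs, and the names crawl and get_links are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List


def crawl(
    start_urls: Iterable[str],
    get_links: Callable[[str], List[str]],
    max_requests: int,
    max_depth: int,
) -> List[Dict[str, object]]:
    """Visit pages breadth-first until max_requests or max_depth stops the crawl."""
    queue = deque((url, 0) for url in start_urls)  # start URLs sit at depth 0 (assumption)
    seen = {url for url, _ in queue}
    results: List[Dict[str, object]] = []

    while queue and len(results) < max_requests:
        url, depth = queue.popleft()
        results.append({"url": url, "depth": depth})  # one flat record per visited page
        if depth >= max_depth:
            continue  # depth limit reached: do not follow links from this page
        for link in get_links(url):
            if link not in seen:  # skip URLs that are already queued or visited
                seen.add(link)
                queue.append((link, depth + 1))
    return results


# Hypothetical in-memory site standing in for real page loads.
site = {
    "/": ["/page/2", "/about"],
    "/page/2": ["/page/3"],
    "/page/3": [],
    "/about": [],
}
pages = crawl(["/"], lambda url: site.get(url, []), max_requests=5, max_depth=1)
print([page["url"] for page in pages])
# ['/', '/page/2', '/about']
```

With max_depth set to 1, links found on the start page are visited but links found on those pages are not, which is why /page/3 never appears.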

💻 Example Input (Prefilled)

{
  "start_urls": ["https://quotes.toscrape.com/"],
  "link_selector": "a[href*='page']",
  "use_playwright": true,
  "max_requests": 5,
  "max_depth": 1,
  "fields": ["title=h1", "quote=.text"],
  "save_html_snapshot": false,
  "save_screenshot": false,
  "proxy": {
    "useApifyProxy": true,
    "apifyProxyCountry": ""
  }
}
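For readers who would rather start a run from a script, the sketch below builds the same input in Python and passes it to the Apify API client. The helper names build_run_input and run_actor are hypothetical, the default values are illustrative rather than the Actor's documented defaults, and the Actor ID is left as a parameter because this page does not state it. run_actor needs the third-party apify-client package and an API token.

```python
from typing import Any, Dict, List


def build_run_input(
    start_urls: List[str],
    fields: List[str],
    link_selector: str = "",
    use_playwright: bool = False,
    max_requests: int = 5,
    max_depth: int = 1,
) -> Dict[str, Any]:
    """Assemble an input object using the setting names from the example input."""
    if not start_urls:
        raise ValueError("start_urls must contain at least one URL")
    return {
        "start_urls": list(start_urls),
        "link_selector": link_selector,
        "use_playwright": use_playwright,
        "max_requests": max_requests,
        "max_depth": max_depth,
        "fields": list(fields),
        "save_html_snapshot": False,
        "save_screenshot": False,
        "proxy": {"useApifyProxy": True},
    }


def run_actor(actor_id: str, token: str, run_input: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Start a run and return the dataset items (requires: pip install apify-client)."""
    from apify_client import ApifyClient  # imported lazily so the builder works without it

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


run_input = build_run_input(
    ["https://quotes.toscrape.com/"],
    ["title=h1", "quote=.text"],
    link_selector="a[href*='page']",
    use_playwright=True,
)
print(run_input["fields"])
# ['title=h1', 'quote=.text']
```

Calling run_actor with the Actor's ID from its Apify Store page, an API token, and run_input would then return one dictionary per crawled page.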

🏷️ Field Extraction Format

Use simple name=selector mappings:

| Example | Meaning |
| --- | --- |
| title=h1 | Extract the h1 element's text into title |
| quote=.text | Extract the content of the .text class |
| price=.price-tag | Extract the product price |
| desc=.article-body p | Extract the first p element inside .article-body |

You can add any number of fields.
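As a rough illustration of the name=selector format, the sketch below splits each entry on its first equals sign, so selectors that themselves contain "=" (such as attribute selectors) stay intact. The function name parse_fields and the error handling are assumptions for the example; the Actor's own parsing rules are not documented here.

```python
from typing import Dict, Iterable


def parse_fields(fields: Iterable[str]) -> Dict[str, str]:
    """Turn ["name=selector", ...] into {"name": "selector", ...}."""
    mapping: Dict[str, str] = {}
    for entry in fields:
        # partition splits on the FIRST "=", leaving any later "=" in the selector
        name, separator, selector = entry.partition("=")
        name, selector = name.strip(), selector.strip()
        if not separator or not name or not selector:
            raise ValueError(f"expected name=selector, got {entry!r}")
        mapping[name] = selector
    return mapping


parsed = parse_fields(["title=h1", "desc=.article-body p", "next=a[href*='page']"])
print(parsed["desc"])
# .article-body p
print(parsed["next"])
# a[href*='page']
```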

🔗 Link Following

Set link_selector to a CSS selector that matches the links to follow, for example: a[href*='page']

And the Actor auto-navigates:

Pagination (page 1 → 2 → 3…)

Category lists

Internal link structures

With limits:

max_depth

max_requests
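To show what a selector like a[href*='page'] picks out, here is a minimal standard-library sketch that collects anchor tags whose href contains a given substring and resolves them against the page URL. It imitates only this one attribute-selector pattern, not general CSS matching, and the names LinkCollector and find_links are hypothetical; the Actor itself may resolve and filter links differently.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect <a> links whose href contains a substring, like a[href*='...']."""

    def __init__(self, base_url: str, needle: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.needle = needle
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and self.needle in href:
            url = urljoin(self.base_url, href)  # make relative links absolute
            if url not in self.links:  # keep each target once, in page order
                self.links.append(url)


def find_links(html: str, base_url: str, needle: str = "page") -> List[str]:
    collector = LinkCollector(base_url, needle)
    collector.feed(html)
    return collector.links


sample = '<a href="/page/2/">Next</a> <a href="/login">Login</a> <a href="/page/2/">2</a>'
print(find_links(sample, "https://quotes.toscrape.com/"))
# ['https://quotes.toscrape.com/page/2/']
```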

📸 Optional Snapshots

Enable:

save_html_snapshot

save_screenshot

Useful for debugging or verifying extracted pages

⚙ Settings Overview

| Setting | Description |
| --- | --- |
| start_urls | Where crawling begins |
| link_selector | CSS selector for links to follow |
| use_playwright | Enable JS rendering |
| max_requests | Hard limit on number of visited pages |
| max_depth | How many link levels to follow |
| fields | Extracted fields using name=selector pairs |
| save_html_snapshot | Save raw HTML |
| save_screenshot | Take screenshots |
| proxy | Proxy configuration |

🧪 Best Practices

Use specific selectors for stable extraction

Keep max_requests reasonable

Use Playwright only when needed

Always test with a single URL first

Add only the fields you want to extract

Enable snapshots if the site layout is unpredictable

❗ Limitations

Very complex interactions (logins, infinite scroll) may require custom scripts

Rate-limited websites may need slower crawling

Multi-step forms are not supported

🚩 Changelog

v0.0.8

Added new strict Apify 2025-compatible input schema

Added flat, clean output schema

Fixed proxy editor rules

Added better prefilled input

Improved descriptions for all fields

Ready for spotlight + QA review