Headless Browser HTML Scraper
Pricing
from $1.00 / 1,000 results
Render any URL in a real headless browser and return the fully-rendered HTML, the page text, or a selected area by CSS selector. Scroll for lazy content, wait for elements, and capture screenshots. A browserless-style HTML API on Apify.
Rating: 0.0 (0)
Developer: Dev Patel
Maintained by Community
Actor stats
Bookmarked: 1
Total users: 2
Monthly active users: 1
Last modified: a day ago
A generic, browserless-style HTML API. Give it any URL and it opens a real headless Chromium browser, fully renders the JavaScript, optionally scrolls and waits, then returns the full rendered HTML — or just a selected area by CSS selector.
Think of it as a self-hosted browserless.io /content + /scrape on Apify.
What it does
- 🌐 Renders any URL with a real browser (JavaScript executed)
- 🧩 Selected area — pass a CSS selector and get every matching element's HTML, text, attributes, and position
- 📜 Scroll to bottom — trigger infinite-scroll / lazy-loaded content with real wheel events
- ⏳ Wait for a selector, a load event, or a fixed delay
- 🖼️ Optional full-page screenshot
- 🚫 Block images/media/fonts/CSS to speed up and cut bandwidth
- 🔌 Use it synchronously as an API (`run-sync-get-dataset-items`)
Input
| Field | Type | Description |
|---|---|---|
| `urls` | array | Required. URLs to render and scrape. |
| `selector` | string | Optional CSS selector for the "selected area". Returns each match's HTML/text/attributes/position. Empty = full page only. |
| `scrollToBottom` | boolean | Scroll down to load lazy content. Default `false`. |
| `maxScrolls` | integer | Max scroll rounds when scrolling. Default `15`. |
| `waitForSelector` | string | Wait until this selector appears (≤30s). |
| `waitUntil` | enum | `domcontentloaded` (default) · `load` · `networkidle`. |
| `waitMs` | integer | Extra fixed wait after load (ms). |
| `blockResources` | array | Resource types to block. Default `["media","font"]`. |
| `returnFullHtml` | boolean | Include full rendered HTML. Default `true`. |
| `returnText` | boolean | Include page visible text. Default `true`. |
| `includeScreenshot` | boolean | Capture a full-page screenshot and return its URL. Default `false`. |
| `proxyConfiguration` | object | Apify Proxy (datacenter) by default; use Residential for bot-protected sites. |
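As a rough illustration of how the fields above fit together, the following Python sketch merges caller overrides onto the defaults listed in the table. The helper name `build_input` and the validation it performs are assumptions made for this example; the Actor itself may validate input differently.

```python
from typing import Any, Dict, List

# Defaults copied from the input table; fields without a documented
# default (selector, waitForSelector, waitMs, proxyConfiguration) are omitted.
DEFAULTS: Dict[str, Any] = {
    "scrollToBottom": False,
    "maxScrolls": 15,
    "waitUntil": "domcontentloaded",
    "blockResources": ["media", "font"],
    "returnFullHtml": True,
    "returnText": True,
    "includeScreenshot": False,
}

WAIT_UNTIL_VALUES = {"domcontentloaded", "load", "networkidle"}


def build_input(urls: List[str], **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: build an input object from plain URL strings."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    payload = {**DEFAULTS, **overrides}
    # Copy the list so callers cannot mutate the shared default.
    payload["blockResources"] = list(payload["blockResources"])
    if payload["waitUntil"] not in WAIT_UNTIL_VALUES:
        raise ValueError(f"unsupported waitUntil: {payload['waitUntil']!r}")
    # The examples in this README pass each URL as an object with a "url" key.
    payload["urls"] = [{"url": u} for u in urls]
    return payload


example_input = build_input(["https://www.example.com"], waitUntil="networkidle")
```

Sending only the fields you want to change should work equally well, since the Actor applies its own defaults; spelling them out mainly makes the request self-documenting.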
Example: full HTML of a JS-rendered page
{ "urls": [{ "url": "https://www.example.com" }], "waitUntil": "networkidle" }
Example: extract a selected area, after scrolling
{ "urls": [{ "url": "https://news.ycombinator.com" }], "selector": "span.titleline a", "scrollToBottom": true }
Output
One record per URL:
{
  "url": "https://www.example.com",
  "loadedUrl": "https://www.example.com/",
  "statusCode": 200,
  "title": "Example Domain",
  "html": "<!DOCTYPE html><html>...</html>",
  "text": "Example Domain\nThis domain is for use in...",
  "selectedCount": 30,
  "selectedElements": [
    {
      "text": "Some headline",
      "html": "<a href=\"...\">Some headline</a>",
      "attributes": [{ "name": "href", "value": "https://..." }],
      "width": 320,
      "height": 18,
      "top": 140,
      "left": 24
    }
  ],
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/.../records/screenshot-1",
  "scrapedAt": "2026-06-13T08:00:00.000Z"
}
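Because `attributes` arrives as a list of name/value pairs rather than a plain object, reading a single attribute takes a small lookup. The sketch below shows one way to pull link text and `href` out of a record shaped like the one above. The helper names and the sample `href` value are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List, Optional, Tuple


def get_attribute(element: Dict[str, Any], name: str) -> Optional[str]:
    """Return the value of the named attribute, or None if it is absent."""
    for pair in element.get("attributes", []):
        if pair.get("name") == name:
            return pair.get("value")
    return None


def extract_links(record: Dict[str, Any]) -> List[Tuple[str, Optional[str]]]:
    """Collect (text, href) for every selected element in one output record."""
    return [
        (el.get("text", ""), get_attribute(el, "href"))
        for el in record.get("selectedElements", [])
    ]


# Trimmed sample record; the href value is a made-up placeholder.
sample_record = {
    "url": "https://www.example.com",
    "selectedCount": 1,
    "selectedElements": [
        {
            "text": "Some headline",
            "attributes": [{"name": "href", "value": "https://example.com/story"}],
        }
    ],
}

links = extract_links(sample_record)
print(links)  # [('Some headline', 'https://example.com/story')]
```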
Use as an API
curl -X POST "https://api.apify.com/v2/acts/USERNAME~browserless-html-scraper/run-sync-get-dataset-items?token=TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls":[{"url":"https://www.example.com"}],"selector":"h1"}'
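The same call can be made from Python using only the standard library. This is a minimal sketch, not an official client: `build_request` and `run_sync` are names invented here, `USERNAME` and `TOKEN` are the same placeholders as in the curl command, and the 300-second timeout is an arbitrary assumption.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.apify.com/v2/acts"


def build_request(actor_id: str, token: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build the POST request for the run-sync-get-dataset-items endpoint."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/{actor_id}/run-sync-get-dataset-items?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(actor_id: str, token: str, payload: Dict[str, Any], timeout: float = 300.0) -> List[Dict[str, Any]]:
    """Send the request and decode the JSON response (expected: one record per URL)."""
    request = build_request(actor_id, token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Replace the placeholders before running; this performs a real network call.
    records = run_sync(
        "USERNAME~browserless-html-scraper",
        "TOKEN",
        {"urls": [{"url": "https://www.example.com"}], "selector": "h1"},
    )
    for record in records:
        print(record.get("url"), record.get("selectedCount"))
```

Keeping request construction separate from sending makes it possible to inspect the URL and body without touching the network.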
Notes
- For bot-protected sites, switch `proxyConfiguration` to Residential.
- Blocking `image`/`stylesheet` speeds things up but can break layout-dependent lazy scrolling on some sites; keep them enabled (don't block) when using `scrollToBottom` on such pages.
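Both notes can be applied mechanically before a run. The sketch below is one possible way to do that. The helper `tune_input` is hypothetical, and the `proxyConfiguration` shape (`useApifyProxy` plus `apifyProxyGroups: ["RESIDENTIAL"]`) is assumed from the usual Apify proxy input format rather than confirmed by this README.

```python
from typing import Any, Dict

# Resource types that the notes above recommend not blocking while scrolling.
LAYOUT_RESOURCES = {"image", "stylesheet"}


def tune_input(payload: Dict[str, Any], bot_protected: bool = False) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the input adjusted per the notes."""
    tuned = dict(payload)
    if bot_protected:
        # Assumed shape of an Apify proxy configuration selecting residential IPs.
        tuned["proxyConfiguration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    if tuned.get("scrollToBottom"):
        # Fall back to the documented default when blockResources is not set.
        blocked = tuned.get("blockResources", ["media", "font"])
        tuned["blockResources"] = [r for r in blocked if r not in LAYOUT_RESOURCES]
    return tuned


tuned = tune_input(
    {
        "urls": [{"url": "https://news.ycombinator.com"}],
        "scrollToBottom": True,
        "blockResources": ["image", "media", "stylesheet", "font"],
    },
    bot_protected=True,
)
print(tuned["blockResources"])  # ['media', 'font']
```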