Headless Browser HTML Scraper

Pricing

from $1.00 / 1,000 results

Render any URL in a real headless browser and return the fully-rendered HTML, the page text, or a selected area by CSS selector. Scroll for lazy content, wait for elements, and capture screenshots. A browserless-style HTML API on Apify.

Developer

Dev Patel

Maintained by Community

A generic, browserless-style HTML API. Give it any URL and it opens a real headless Chromium browser, executes the page's JavaScript, optionally scrolls and waits, and then returns the full rendered HTML or only a selected area matched by a CSS selector.

Think of it as browserless.io's /content and /scrape endpoints, running on Apify instead.

What it does

  • 🌐 Renders any URL with a real browser (JavaScript executed)
  • 🧩 Selected area — pass a CSS selector and get every matching element's HTML, text, attributes, and position
  • 📜 Scroll to bottom — trigger infinite-scroll / lazy-loaded content with real wheel events
  • ⏱️ Wait for a selector, a load event, or a fixed delay
  • 🖼️ Optional full-page screenshot
  • 🚫 Block images/media/fonts/CSS to speed up and cut bandwidth
  • 🔌 Call it synchronously as an API through the run-sync-get-dataset-items endpoint
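The "scroll to bottom" option is usually built on the "scroll until the page stops growing" technique. Below is a minimal, browser-free sketch of that idea. It does not claim to match this actor's implementation. `get_height` and `scroll_once` are hypothetical callbacks that stand in for real browser calls, such as reading the document height and dispatching a wheel event. Stopping once the height no longer changes is an assumed stop rule. Only the 15-round cap comes from the page: it mirrors the documented `maxScrolls` default.

```python
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_once: Callable[[], None],
    max_scrolls: int = 15,  # mirrors the documented maxScrolls default
) -> int:
    """Scroll until the page height stops growing or max_scrolls rounds are used.

    get_height and scroll_once are hypothetical hooks into a real browser page.
    Returns the number of scroll rounds performed.
    """
    rounds = 0
    last_height = get_height()
    while rounds < max_scrolls:
        scroll_once()  # e.g. one wheel event toward the bottom
        rounds += 1
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded; assume the end was reached
        last_height = new_height
    return rounds


class FakePage:
    """Stand-in for a browser page: a feed that loads 3 more batches, then stops."""

    def __init__(self) -> None:
        self.height = 1000
        self.batches_left = 3

    def get_height(self) -> int:
        return self.height

    def scroll_once(self) -> None:
        if self.batches_left > 0:
            self.height += 500
            self.batches_left -= 1


if __name__ == "__main__":
    page = FakePage()
    print(scroll_until_stable(page.get_height, page.scroll_once), page.height)  # 4 2500
```

In a real browser, you would also pause briefly after each scroll so lazy content has time to load. The sketch leaves that pause out to stay deterministic.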

Input

| Field | Type | Description |
|---|---|---|
| urls | array | Required. URLs to render and scrape, each given as an object such as {"url": "https://www.example.com"}. |
| selector | string | Optional CSS selector for the "selected area". Returns each match's HTML, text, attributes, and position. Leave empty to return the full page only. |
| scrollToBottom | boolean | Scroll down to load lazy content. Default false. |
| maxScrolls | integer | Maximum number of scroll rounds when scrollToBottom is on. Default 15. |
| waitForSelector | string | Wait until this selector appears (up to 30 s). |
| waitUntil | enum | Page load event to wait for: domcontentloaded (default), load, or networkidle. |
| waitMs | integer | Extra fixed wait after load, in milliseconds. |
| blockResources | array | Resource types to block. Default ["media","font"]. |
| returnFullHtml | boolean | Include the full rendered HTML. Default true. |
| returnText | boolean | Include the page's visible text. Default true. |
| includeScreenshot | boolean | Capture a full-page screenshot and return its URL. Default false. |
| proxyConfiguration | object | Apify Proxy (datacenter) by default; switch to Residential for bot-protected sites. |
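The sketch below shows how the input fields combine. `build_input` is a hypothetical Python helper, not part of the actor. It fills in the defaults listed in the Input table and rejects unknown fields or an invalid `waitUntil`. Like the examples, it assumes each `urls` entry is an object with a `url` key. It sends `selector`, `waitForSelector`, `waitMs`, and `proxyConfiguration` only when you pass them, so the actor's own behavior applies otherwise.

```python
import copy
import json
from typing import Any

# Defaults as listed in the Input table. Fields with no documented default
# (selector, waitForSelector, waitMs, proxyConfiguration) are sent only if given.
DEFAULTS: dict[str, Any] = {
    "scrollToBottom": False,
    "maxScrolls": 15,
    "waitUntil": "domcontentloaded",
    "blockResources": ["media", "font"],
    "returnFullHtml": True,
    "returnText": True,
    "includeScreenshot": False,
}
OPTIONAL_FIELDS = {"selector", "waitForSelector", "waitMs", "proxyConfiguration"}
WAIT_UNTIL_VALUES = {"domcontentloaded", "load", "networkidle"}


def build_input(urls: list[str], **overrides: Any) -> dict[str, Any]:
    """Return an actor input dict with the documented defaults filled in."""
    if not urls:
        raise ValueError("urls is required and must not be empty")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown input fields: {sorted(unknown)}")
    # Deep-copy so callers can mutate lists without touching DEFAULTS.
    payload: dict[str, Any] = copy.deepcopy(DEFAULTS)
    payload.update(overrides)
    payload["urls"] = [{"url": u} for u in urls]
    if payload["waitUntil"] not in WAIT_UNTIL_VALUES:
        raise ValueError(f"waitUntil must be one of {sorted(WAIT_UNTIL_VALUES)}")
    return payload


if __name__ == "__main__":
    # Equivalent to the "selected area, after scrolling" example, defaults made explicit.
    body = build_input(
        ["https://news.ycombinator.com"],
        selector="span.titleline a",
        scrollToBottom=True,
    )
    print(json.dumps(body, indent=2))
```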

Example: full HTML of a JS-rendered page

{ "urls": [{ "url": "https://www.example.com" }], "waitUntil": "networkidle" }

Example: extract a selected area, after scrolling

{
  "urls": [{ "url": "https://news.ycombinator.com" }],
  "selector": "span.titleline a",
  "scrollToBottom": true
}

Output

One record per URL:

{
  "url": "https://www.example.com",
  "loadedUrl": "https://www.example.com/",
  "statusCode": 200,
  "title": "Example Domain",
  "html": "<!DOCTYPE html><html>...</html>",
  "text": "Example Domain\nThis domain is for use in...",
  "selectedCount": 30,
  "selectedElements": [
    {
      "text": "Some headline",
      "html": "<a href=\"...\">Some headline</a>",
      "attributes": [{ "name": "href", "value": "https://..." }],
      "width": 320, "height": 18, "top": 140, "left": 24
    }
  ],
  "screenshotUrl": "https://api.apify.com/v2/key-value-stores/.../records/screenshot-1",
  "scrapedAt": "2026-06-13T08:00:00.000Z"
}
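Each selected element's `attributes` field is a list of `{name, value}` pairs, so a small post-processing step is often handy. This sketch turns a record shaped like the example into `(text, href)` pairs. The function names `attributes_to_dict` and `selected_links` are hypothetical. The sketch also assumes that a record without `selectedElements`, for example one produced without a `selector`, should yield no links.

```python
from typing import Any


def attributes_to_dict(attrs: list[dict[str, str]]) -> dict[str, str]:
    """Convert [{"name": ..., "value": ...}, ...] into {name: value}."""
    return {a["name"]: a["value"] for a in attrs}


def selected_links(record: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (text, href) for each selected element that has an href attribute."""
    links: list[tuple[str, str]] = []
    for el in record.get("selectedElements") or []:
        attrs = attributes_to_dict(el.get("attributes", []))
        if "href" in attrs:
            links.append((el.get("text", ""), attrs["href"]))
    return links


if __name__ == "__main__":
    record = {
        "url": "https://news.ycombinator.com",
        "selectedElements": [
            {"text": "Some headline",
             "attributes": [{"name": "href", "value": "https://example.com/1"}]},
            {"text": "No link here", "attributes": []},
        ],
    }
    print(selected_links(record))  # [('Some headline', 'https://example.com/1')]
```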

Use as an API

curl -X POST "https://api.apify.com/v2/acts/USERNAME~browserless-html-scraper/run-sync-get-dataset-items?token=TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"urls":[{"url":"https://www.example.com"}],"selector":"h1"}'
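You can make the same synchronous call from Python using only the standard library. `USERNAME` and `TOKEN` stay placeholders, exactly as in the curl command. The 300-second client timeout is a choice made for this sketch, not a documented limit of the actor. `build_request` only constructs the request, so you can inspect it without network access. `run_sync` sends the request and decodes the returned JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API = "https://api.apify.com/v2/acts/{actor}/run-sync-get-dataset-items"


def build_request(actor: str, token: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build the POST request equivalent to the curl example."""
    url = API.format(actor=actor) + "?" + urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(
    actor: str, token: str, payload: dict[str, Any], timeout: float = 300
) -> list[dict[str, Any]]:
    """Send the request and return the dataset items (one record per URL)."""
    req = build_request(actor, token, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `run_sync("USERNAME~browserless-html-scraper", "TOKEN", {"urls": [{"url": "https://www.example.com"}], "selector": "h1"})` mirrors the curl call once you substitute your own actor path and API token.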

Notes

  • For bot-protected sites, switch proxyConfiguration to Residential.
  • Blocking images and stylesheets speeds up runs, but on some sites it breaks lazy loading that depends on page layout. Leave them unblocked when using scrollToBottom on such pages.
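The helper below combines both notes into one input for a bot-protected page that needs scrolling. It drops `image` and `stylesheet` from any requested `blockResources` so that layout-dependent lazy loading still fires. It can also request residential proxies. Both `scroll_safe_input` and the `proxyConfiguration` shape (`useApifyProxy` plus `apifyProxyGroups`) are assumptions. The shape follows Apify's usual proxy input format, which this page does not spell out, so check it against the actor's input schema.

```python
from typing import Any, Optional

# Resource types the Notes warn about when scrolling layout-dependent pages.
LAYOUT_RESOURCES = {"image", "stylesheet"}


def scroll_safe_input(
    url: str,
    block_resources: Optional[list[str]] = None,
    residential: bool = True,
) -> dict[str, Any]:
    """Build an input that scrolls, keeps images/CSS loaded, and optionally uses Residential."""
    requested = block_resources if block_resources is not None else ["media", "font"]
    payload: dict[str, Any] = {
        "urls": [{"url": url}],
        "scrollToBottom": True,
        # Keep images and stylesheets so lazy loading tied to layout still triggers.
        "blockResources": [r for r in requested if r not in LAYOUT_RESOURCES],
    }
    if residential:
        # Assumed Apify proxy input format; verify against the actor's input schema.
        payload["proxyConfiguration"] = {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        }
    return payload
```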