Pricing

from $1.50 / 1,000 results

Stealth Website Scraper | 💰$1.50 per 1,000 results

Extract text, links, metadata, HTML, markdown, and structured page data with HTTP-first crawling and stealth-aware browser fallback.

Developer: Solutions Smart

Last modified: 2 months ago

Stealth Website Scraper

Stealth Website Scraper extracts text, links, metadata, HTML, markdown, and structured page data from websites using a fast HTTP-first crawl with browser fallback when plain requests are not enough.

It is designed for production scraping and analysis workflows where cost, speed, and reliability all matter. The actor starts with lightweight HTTP crawling through CheerioCrawler, then falls back to a browser flow when the site blocks requests, returns thin content, or depends on JavaScript rendering.

What does Stealth Website Scraper do?

Stealth Website Scraper extracts data from websites and switches crawl methods automatically when needed. Whether a page is server-rendered or JavaScript-heavy, the actor adapts its approach to maximize success while keeping costs low.

Stealth Website Scraper can extract:

Clean text content and markdown from web pages
Page metadata such as title, description, and canonical URL
Headings such as H1 and H2
Internal and external links
HTML source code
HTTP status codes and content type information
Crawl depth and crawl method information

Why scrape websites?

Websites contain publicly available data that can support AI pipelines, market research, competitive analysis, and business intelligence.

AI and RAG pipelines: Feed clean text content into machine learning models and retrieval-augmented generation systems.
Business intelligence: Extract metadata, pricing, and product information from competitor websites.
Content extraction: Build datasets for training, analysis, and enrichment workflows.
Testing and QA: Verify site rendering across different network conditions and browser types.
Market research: Gather structured data from public websites at scale.
Fingerprint testing: Compare stealth browser behavior against standard browser automation.

How to scrape websites with Stealth Website Scraper

Click Try for free to open the actor.
Enter one or more Start URLs.
Configure optional settings:
- Max Pages: Limit the number of pages to crawl.
- Max Depth: Control how deep internal link crawling goes.
- Crawling Mode: Choose between http-first and browser-only.
- Stealth Browser: Select cloak to attempt CloakBrowser, or playwright for standard Playwright.
- Extraction Mode: Choose what data to extract such as all, text, markdown, html, or links.
Click Run.
When the run completes, preview or download your data from the Dataset tab.

How much will it cost to scrape websites?

Apify gives you $5 in free usage credits every month on the Apify Free plan. Because HTTP scraping is much cheaper than browser automation, you can extract many pages at low cost by relying on the HTTP-first strategy.

Stealth Website Scraper uses HTTP requests first whenever possible. This means:

Lower compute costs
Faster execution
Higher throughput
Less browser overhead

Browser fallback only engages when necessary, keeping costs down while maintaining reliability.

For regular large-scale scraping, review current Apify pricing and set maxPages, maxDepth, and maxConcurrency to match your budget.
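
As a rough illustration of the budgeting arithmetic, the sketch below estimates how many results a budget covers at the listed starting price of $1.50 per 1,000 results. It assumes a flat per-result price with no other charges, which may not match actual billing; the function names are illustrative and not part of the actor.

```python
from decimal import Decimal

# Listed "from" price; the effective rate for a given run may be higher.
PRICE_PER_1000_RESULTS = Decimal("1.50")


def results_within_budget(budget_usd: str, price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> int:
    """Return how many whole results a budget covers at a flat per-result price."""
    price_per_result = price_per_1000 / Decimal(1000)
    return int(Decimal(budget_usd) // price_per_result)


def cost_for_results(result_count: int, price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Return the flat-rate cost in USD for a number of results, rounded to cents."""
    return (price_per_1000 * result_count / Decimal(1000)).quantize(Decimal("0.01"))


print(results_within_budget("5"))   # 3333
print(cost_for_results(1000))       # 1.50
```

Under these assumptions, a $5 budget spent only on per-result charges would cover roughly 3,333 results.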

Results

Each processed page produces one clean dataset item.

Example output

{
  "url": "https://example.com",
  "loadedUrl": "https://example.com/",
  "domain": "example.com",
  "title": "Example Domain",
  "metaDescription": "A reserved-use domain in DNS",
  "canonicalUrl": "https://example.com/",
  "h1": ["Example Domain"],
  "h2": ["More information"],
  "text": "Example Domain This domain is for use in illustrative examples in documents...",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples in documents...",
  "html": "<!doctype html>\n<html>\n<head>...</head>...",
  "links": ["https://example.com/more-info", "https://example.com/about"],
  "externalLinks": ["https://www.iana.org/"],
  "statusCode": 200,
  "contentType": "text/html; charset=UTF-8",
  "depth": 0,
  "crawlMethod": "http",
  "fallbackUsed": false,
  "fallbackReason": "",
  "timestamp": "2026-05-17T21:00:00.000Z"
}
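
To illustrate how items of this shape might be consumed downstream, the following sketch reduces dataset items to compact records for an AI or RAG pipeline. The field names come from the example output; the helper name to_documents and the filtering rule (keep only pages with status 200 and non-empty markdown) are assumptions, not behavior of the actor.

```python
from typing import Any


def to_documents(items: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Keep successfully scraped pages and reduce them to id/title/content records."""
    documents = []
    for item in items:
        markdown = (item.get("markdown") or "").strip()
        # Assumption: skip failed pages and pages without usable markdown.
        if item.get("statusCode") != 200 or not markdown:
            continue
        documents.append(
            {
                "id": item.get("loadedUrl") or item["url"],
                "title": item.get("title") or "",
                "content": markdown,
                "source": item.get("crawlMethod") or "unknown",
            }
        )
    return documents
```

Keying each record by loadedUrl rather than url is a design choice that keeps redirected pages from producing duplicate identifiers.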

The actor also stores a final summary in the key-value store under OUTPUT:

{
  "pagesScraped": 25,
  "httpPages": 22,
  "browserPages": 3,
  "cloakPages": 0,
  "failedPages": 0,
  "fallbacks": 3,
  "uniqueUrlsQueued": 27,
  "startedAt": "2026-05-17T21:00:00.000Z",
  "finishedAt": "2026-05-17T21:05:30.000Z",
  "durationSeconds": 330
}
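
The summary fields lend themselves to a quick health check after each run. The sketch below derives the browser fallback rate and the elapsed time from a summary of this shape; the helper names are illustrative, and how the summary is loaded from the key-value store is left out.

```python
from datetime import datetime


def fallback_rate(summary: dict) -> float:
    """Share of scraped pages that needed browser fallback (0.0 if nothing was scraped)."""
    pages = summary.get("pagesScraped", 0)
    return summary.get("fallbacks", 0) / pages if pages else 0.0


def elapsed_seconds(summary: dict) -> float:
    """Seconds between startedAt and finishedAt, parsed from ISO 8601 strings."""

    def parse(timestamp: str) -> datetime:
        # Swap the trailing "Z" for an explicit offset so older Python versions can parse it.
        return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))

    return (parse(summary["finishedAt"]) - parse(summary["startedAt"])).total_seconds()


summary = {
    "pagesScraped": 25,
    "fallbacks": 3,
    "startedAt": "2026-05-17T21:00:00.000Z",
    "finishedAt": "2026-05-17T21:05:30.000Z",
}
print(f"fallback rate: {fallback_rate(summary):.0%}")  # fallback rate: 12%
print(f"elapsed: {elapsed_seconds(summary):.0f}s")      # elapsed: 330s
```

A rising fallback rate across runs suggests that more pages are being served by the browser path, which costs more than plain HTTP.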

HTTP-first vs. browser mode

HTTP-first mode

The actor starts with lightweight HTTP requests using CheerioCrawler. This is the fastest and cheapest approach.

Browser fallback triggers when:

The site returns a status code listed in fallbackOnStatusCodes (403, 429, or 503 by default)
The HTTP response body is empty
The page appears JavaScript-heavy
The extracted text is shorter than minTextLengthForSuccess
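
The trigger conditions listed above can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the actor's implementation: the reason strings are hypothetical, text length is measured in characters, and the JavaScript-heavy check is passed in as a boolean because the actor's heuristic is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackPolicy:
    # Defaults mirror the documented fallbackOnStatusCodes and minTextLengthForSuccess.
    fallback_on_status_codes: frozenset = frozenset({403, 429, 503})
    min_text_length_for_success: int = 300


def fallback_reason(
    status_code: int,
    text: str,
    looks_js_heavy: bool,
    policy: FallbackPolicy = FallbackPolicy(),
) -> str:
    """Return a non-empty reason when a browser retry seems warranted, else ''."""
    stripped = text.strip()
    if status_code in policy.fallback_on_status_codes:
        return f"status_{status_code}"  # hypothetical reason label
    if not stripped:
        return "empty_body"
    if looks_js_heavy:
        return "js_heavy"
    # Assumption: the threshold counts characters of extracted text.
    if len(stripped) < policy.min_text_length_for_success:
        return "thin_content"
    return ""
```

Returning an empty string when no fallback is needed mirrors the empty fallbackReason in the example output.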

Browser-only mode

Skip HTTP entirely and crawl exclusively with a browser. Useful for:

JavaScript-heavy single-page applications
Sites with stronger bot protection
Pages requiring browser rendering

Stealth browser options

cloak: Attempts CloakBrowser, a fingerprint-aware Chromium fork with source-level stealth patches. Falls back to standard Playwright if CloakBrowser is unavailable.
playwright: Uses standard Playwright Chromium.

Input parameters

Essential parameters

startUrls: Array of URLs or objects with a url property. Required.
maxPages: Maximum pages to scrape. Default: 100.
maxDepth: Maximum link depth for crawling. Default: 2.
crawlingMode: http-first or browser-only. Default: http-first.

Crawling options

scrapeInternalLinks: Enable internal link crawling. Default: true.
sameDomainOnly: Limit crawling to the start domain. Default: true.
maxConcurrency: Concurrent request limit. Default: 5.
requestTimeoutSecs: Request timeout in seconds. Default: 30.
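
To make the interaction of scrapeInternalLinks, sameDomainOnly, and maxDepth more concrete, the sketch below classifies links and decides which ones to follow. It is an illustration only: it assumes that "same domain" means an exact hostname match and that start URLs sit at depth 0, neither of which is confirmed for the actor, and both function names are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def split_links(page_url: str, hrefs: list[str]) -> tuple[list[str], list[str]]:
    """Resolve hrefs against page_url and split them into (internal, external)."""
    page_host = (urlsplit(page_url).hostname or "").lower()
    internal, external = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, tel:, javascript:, and similar schemes
        host = (parts.hostname or "").lower()
        # Assumption: an exact hostname match counts as the same domain.
        (internal if host == page_host else external).append(absolute)
    return internal, external


def links_to_enqueue(
    page_url: str,
    hrefs: list[str],
    depth: int,
    max_depth: int = 2,
    same_domain_only: bool = True,
) -> list[str]:
    """Return the links to crawl next, or nothing once max_depth is reached."""
    if depth >= max_depth:
        return []
    internal, external = split_links(page_url, hrefs)
    return internal if same_domain_only else internal + external
```

With these assumptions, a page at depth 2 and the default maxDepth of 2 contributes no further links to the queue.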

Extraction options

extractionMode: all, text, markdown, html, or links. Default: all.
includeHtml: Include full HTML source. Default: false.
includeLinks: Extract internal links. Default: true.
includeExternalLinks: Extract external links. Default: false.

Browser options

stealthBrowser: cloak or playwright. Default: cloak.
waitUntil: domcontentloaded, load, or networkidle. Default: domcontentloaded.
blockResources: Block images, fonts, media, and stylesheets to speed up rendering. Default: true.
fallbackOnStatusCodes: Status codes that trigger browser fallback. Default: [403, 429, 503].
minTextLengthForSuccess: Minimum text length to avoid fallback. Default: 300.

Proxy and headers

proxyConfiguration: Proxy setup, including Apify Proxy.
customUserAgent: Custom User-Agent header.

Example input

{
  "startUrls": [
    { "url": "https://example.com" },
    "https://example.org/docs"
  ],
  "maxPages": 50,
  "maxDepth": 2,
  "sameDomainOnly": true,
  "scrapeInternalLinks": true,
  "extractionMode": "all",
  "crawlingMode": "http-first",
  "stealthBrowser": "cloak",
  "fallbackOnStatusCodes": [403, 429, 503],
  "minTextLengthForSuccess": 300,
  "waitUntil": "domcontentloaded",
  "blockResources": true,
  "includeHtml": false,
  "includeLinks": true,
  "includeExternalLinks": false,
  "maxConcurrency": 5,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
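
An input like the example above can also be assembled programmatically. The sketch below fills in the documented defaults and rejects unknown keys or invalid option values before a run is started. The helper build_input is hypothetical, and the validation is a client-side convenience rather than the actor's own schema check.

```python
import copy

# Defaults as documented in the input parameter lists.
DEFAULTS = {
    "maxPages": 100, "maxDepth": 2, "crawlingMode": "http-first",
    "scrapeInternalLinks": True, "sameDomainOnly": True,
    "maxConcurrency": 5, "requestTimeoutSecs": 30,
    "extractionMode": "all", "includeHtml": False,
    "includeLinks": True, "includeExternalLinks": False,
    "stealthBrowser": "cloak", "waitUntil": "domcontentloaded",
    "blockResources": True, "fallbackOnStatusCodes": [403, 429, 503],
    "minTextLengthForSuccess": 300,
}
# Parameters documented without a default value.
OPTIONAL_KEYS = {"proxyConfiguration", "customUserAgent"}
ALLOWED_VALUES = {
    "crawlingMode": {"http-first", "browser-only"},
    "stealthBrowser": {"cloak", "playwright"},
    "extractionMode": {"all", "text", "markdown", "html", "links"},
    "waitUntil": {"domcontentloaded", "load", "networkidle"},
}


def build_input(start_urls: list[str], **overrides) -> dict:
    """Merge overrides into the documented defaults and validate enum-like options."""
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    actor_input = {"startUrls": [{"url": url} for url in start_urls]}
    actor_input.update(copy.deepcopy(DEFAULTS))
    actor_input.update(overrides)
    for key, allowed in ALLOWED_VALUES.items():
        if actor_input[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return actor_input
```

For example, build_input(["https://example.com"], maxPages=50, proxyConfiguration={"useApifyProxy": True}) yields a dictionary that can be serialized with json.dumps and pasted into the actor's JSON input editor.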

Tips for scraping websites

Start with http-first mode: Most websites serve useful HTML on the initial request, so http-first saves money and runs faster.
Set appropriate crawl limits: Use maxPages and maxDepth to control crawl scope and stay within budget.
Use domain filtering: Enable sameDomainOnly to prevent crawling into unrelated domains.
Adjust timeout settings: Increase requestTimeoutSecs for slow or distant servers.
Enable resource blocking: Keep blockResources set to true to skip heavy browser resources.
Monitor fallback rates: Check the run summary to see how many pages needed browser fallback.
Test stealth options: Use stealthBrowser: "cloak" if standard Playwright gets blocked.
Respect robots.txt: Review website policies before scraping.

Limitations

Browser mode is more expensive than HTTP mode.
Some websites require authentication, session warmup, or custom logic.
Very aggressive protection systems may still throttle or block requests.
CloakBrowser requires a binary download at runtime if it is not preinstalled.
External links can be extracted, but crawl expansion stays focused on start domains by default.

Is it legal to scrape websites?

Scraping is legal in many jurisdictions, but you still need to follow applicable laws and website policies.

Respect robots.txt: Check the website's robots.txt file and follow its rules where appropriate.
Review Terms of Service: Some sites explicitly prohibit scraping in their terms.
Protect personal data: Personal data may be protected by GDPR and similar laws. Only scrape it when you have a lawful basis.
Do not overload servers: Use appropriate concurrency and crawl limits.
Respect copyright: Do not republish copyrighted content without permission.

If you are unsure whether scraping a specific website is legal for your use case, consult a lawyer. For more information, read "Is web scraping legal?"
