Stealth Website Scraper
Pricing
from $1.50 / 1,000 results
Extract text, links, metadata, HTML, markdown, and structured page data with HTTP-first crawling and stealth-aware browser fallback.
Developer: Solutions Smart (maintained by Community)
Stealth Website Scraper extracts text, links, metadata, HTML, markdown, and structured page data from websites using a fast HTTP-first crawl with browser fallback when plain requests are not enough.
It is designed for production scraping and analysis workflows where cost, speed, and reliability all matter. The actor starts with lightweight HTTP crawling through CheerioCrawler, then falls back to a browser flow when the site blocks requests, returns thin content, or depends on JavaScript rendering.
What does Stealth Website Scraper do?
Stealth Website Scraper enables you to extract data from websites with intelligent fallback handling. Whether you are dealing with server-rendered content or JavaScript-heavy pages, this actor adapts its approach to maximize success while minimizing costs.
Stealth Website Scraper can extract:
- Clean text content and markdown from web pages
- Page metadata such as title, description, and canonical URL
- Headings such as H1 and H2
- Internal and external links
- HTML source code
- HTTP status codes and content type information
- Crawl depth and crawl method information
Why scrape websites?
Websites contain publicly available data that can support AI pipelines, market research, competitive analysis, and business intelligence.
- AI and RAG pipelines: Feed clean text content into machine learning models and retrieval-augmented generation systems.
- Business intelligence: Extract metadata, pricing, and product information from competitor websites.
- Content extraction: Build datasets for training, analysis, and enrichment workflows.
- Testing and QA: Verify site rendering across different network conditions and browser types.
- Market research: Gather structured data from public websites at scale.
- Fingerprint testing: Compare stealth browser behavior against standard browser automation.
How to scrape websites with Stealth Website Scraper
- Click Try for free to open the actor.
- Enter one or more Start URLs.
- Configure optional settings:
  - Max Pages: Limit the number of pages to crawl.
  - Max Depth: Control how deep internal link crawling goes.
  - Crawling Mode: Choose between http-first and browser-only.
  - Stealth Browser: Select cloak to attempt CloakBrowser, or playwright for standard Playwright.
  - Extraction Mode: Choose what data to extract, such as all, text, markdown, html, or links.
- Click Run.
- When the run completes, preview or download your data from the Dataset tab.
How much will it cost to scrape websites?
Apify gives you $5 in free usage credits every month on the Apify Free plan. Because HTTP scraping is much cheaper than browser automation, the HTTP-first strategy lets you extract many pages at low cost.
The Stealth Website Scraper uses HTTP requests first whenever possible. This means:
- Lower compute costs
- Faster execution
- Higher throughput
- Less browser overhead
Browser fallback only engages when necessary, keeping costs down while maintaining reliability.
For regular large-scale scraping, review current Apify pricing and set maxPages, maxDepth, and maxConcurrency to match your budget.
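As a rough budgeting aid, the arithmetic behind the listed starting price of $1.50 per 1,000 results can be sketched as follows. This is only an estimate under stated assumptions: the function names are hypothetical, the price is the "from" price shown on the listing, and any other charges that may apply to a run are ignored.

```python
def estimate_result_cost(results: int, price_per_1000_usd: float = 1.50) -> float:
    """Estimate per-result charges in USD for a given number of dataset items."""
    if results < 0:
        raise ValueError("results must be non-negative")
    return results * price_per_1000_usd / 1000


def max_results_for_budget(budget_usd: float, price_per_1000_usd: float = 1.50) -> int:
    """Largest number of results whose estimated charge fits within the budget."""
    return int(budget_usd * 1000 // price_per_1000_usd)


if __name__ == "__main__":
    # Assumed scenario: a 50-page crawl, and a $5 budget at the starting price.
    print(f"50 results cost about ${estimate_result_cost(50):.3f}")
    print(f"$5 covers about {max_results_for_budget(5.0)} results")
```

A value computed this way is a reasonable starting point for maxPages, since each processed page produces one dataset item.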
Results
Each processed page produces one clean dataset item.
Example output
{
  "url": "https://example.com",
  "loadedUrl": "https://example.com/",
  "domain": "example.com",
  "title": "Example Domain",
  "metaDescription": "A reserved-use domain in DNS",
  "canonicalUrl": "https://example.com/",
  "h1": ["Example Domain"],
  "h2": ["More information"],
  "text": "Example Domain This domain is for use in illustrative examples in documents...",
  "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples in documents...",
  "html": "<!doctype html>\n<html>\n<head>...</head>...",
  "links": ["https://example.com/more-info", "https://example.com/about"],
  "externalLinks": ["https://www.iana.org/"],
  "statusCode": 200,
  "contentType": "text/html; charset=UTF-8",
  "depth": 0,
  "crawlMethod": "http",
  "fallbackUsed": false,
  "fallbackReason": "",
  "timestamp": "2026-05-17T21:00:00.000Z"
}
The actor also stores a final summary in the key-value store under OUTPUT:
{
  "pagesScraped": 25,
  "httpPages": 22,
  "browserPages": 3,
  "cloakPages": 0,
  "failedPages": 0,
  "fallbacks": 3,
  "uniqueUrlsQueued": 27,
  "startedAt": "2026-05-17T21:00:00.000Z",
  "finishedAt": "2026-05-17T21:05:30.000Z",
  "durationSeconds": 330
}
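One way to use this summary is to track how often the browser fallback was needed. The sketch below assumes the summary has already been downloaded as JSON text (here it is simply pasted in), and the helper name fallback_rate is hypothetical.

```python
import json

# The OUTPUT summary from the example above, pasted as a string for illustration.
summary_json = """
{"pagesScraped": 25, "httpPages": 22, "browserPages": 3, "cloakPages": 0,
 "failedPages": 0, "fallbacks": 3, "uniqueUrlsQueued": 27,
 "startedAt": "2026-05-17T21:00:00.000Z",
 "finishedAt": "2026-05-17T21:05:30.000Z", "durationSeconds": 330}
"""


def fallback_rate(summary: dict) -> float:
    """Share of scraped pages that needed the browser fallback (0.0 if none scraped)."""
    pages = summary.get("pagesScraped", 0)
    return summary.get("fallbacks", 0) / pages if pages else 0.0


summary = json.loads(summary_json)
rate = fallback_rate(summary)
print(f"Fallback rate: {rate:.0%} of {summary['pagesScraped']} pages")
```

A consistently high rate may suggest that browser-only mode is a better fit for that site; a low rate suggests HTTP-first is doing most of the work.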
HTTP-first vs. browser mode
HTTP-first mode
The actor starts with lightweight HTTP requests using CheerioCrawler. This is the fastest and cheapest approach.
Browser fallback triggers when:
- The site returns 403, 429, or 503
- The HTTP response body is empty or below minTextLengthForSuccess
- The page appears JavaScript-heavy
- Text extraction returns minimal content
Browser-only mode
Skip HTTP entirely and crawl exclusively with a browser. Useful for:
- JavaScript-heavy single-page applications
- Sites with stronger bot protection
- Pages requiring browser rendering
Stealth browser options
- cloak: Attempts CloakBrowser, a fingerprint-aware Chromium fork with source-level stealth patches. Falls back to standard Playwright if unavailable.
- playwright: Uses standard Playwright Chromium.
Input parameters
Essential parameters
- startUrls: Array of URLs or objects with a url property. Required.
- maxPages: Maximum pages to scrape. Default: 100.
- maxDepth: Maximum link depth for crawling. Default: 2.
- crawlingMode: http-first or browser-only. Default: http-first.
Crawling options
- scrapeInternalLinks: Enable internal link crawling. Default: true.
- sameDomainOnly: Limit crawling to the start domain. Default: true.
- maxConcurrency: Concurrent request limit. Default: 5.
- requestTimeoutSecs: Request timeout in seconds. Default: 30.
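To make the interaction between maxPages, maxDepth, and sameDomainOnly concrete, here is a minimal sketch of a breadth-first crawl over an in-memory link graph. It is an illustration under assumptions, not the actor's code: plan_crawl and link_graph are hypothetical names, start URLs are treated as depth 0 (as in the example output), and "same domain" is approximated by an exact hostname match.

```python
from collections import deque
from urllib.parse import urlparse


def plan_crawl(start_urls, link_graph, max_pages=100, max_depth=2, same_domain_only=True):
    """Breadth-first walk over a link graph, honoring page, depth, and domain limits."""
    starts = list(dict.fromkeys(start_urls))  # de-duplicate, keep order
    allowed_hosts = {urlparse(u).hostname for u in starts}
    queue = deque((u, 0) for u in starts)
    seen = set(starts)
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append((url, depth))
        if depth >= max_depth:
            continue  # do not follow links beyond the depth limit
        for link in link_graph.get(url, []):
            if link in seen:
                continue
            if same_domain_only and urlparse(link).hostname not in allowed_hosts:
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return visited


if __name__ == "__main__":
    # Hypothetical site structure used only for illustration.
    graph = {
        "https://example.com/": ["https://example.com/a", "https://www.iana.org/"],
        "https://example.com/a": ["https://example.com/b"],
        "https://example.com/b": ["https://example.com/c"],
    }
    for url, depth in plan_crawl(["https://example.com/"], graph):
        print(depth, url)
```

With the defaults, the page at depth 2 is visited but its links are not followed, and the external link is skipped.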
Extraction options
- extractionMode: all, text, markdown, html, or links. Default: all.
- includeHtml: Include full HTML source. Default: false.
- includeLinks: Extract internal links. Default: true.
- includeExternalLinks: Extract external links. Default: false.
Browser options
- stealthBrowser: cloak or playwright. Default: cloak.
- waitUntil: domcontentloaded, load, or networkidle. Default: domcontentloaded.
- blockResources: Block images, fonts, media, and stylesheets to speed up rendering. Default: true.
- fallbackOnStatusCodes: Status codes that trigger browser fallback. Default: [403, 429, 503].
- minTextLengthForSuccess: Minimum text length to avoid fallback. Default: 300.
Proxy and headers
- proxyConfiguration: Proxy setup, including Apify Proxy.
- customUserAgent: Custom User-Agent header.
Example input
{
  "startUrls": [
    { "url": "https://example.com" },
    "https://example.org/docs"
  ],
  "maxPages": 50,
  "maxDepth": 2,
  "sameDomainOnly": true,
  "scrapeInternalLinks": true,
  "extractionMode": "all",
  "crawlingMode": "http-first",
  "stealthBrowser": "cloak",
  "fallbackOnStatusCodes": [403, 429, 503],
  "minTextLengthForSuccess": 300,
  "waitUntil": "domcontentloaded",
  "blockResources": true,
  "includeHtml": false,
  "includeLinks": true,
  "includeExternalLinks": false,
  "maxConcurrency": 5,
  "requestTimeoutSecs": 30,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
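If you generate input programmatically, a small helper can fill in the documented defaults and catch invalid option values before a run starts. The sketch below is a hypothetical client-side convenience (build_input, DEFAULTS, and ALLOWED are not part of the actor); the defaults and allowed values are copied from the parameter lists above.

```python
import copy
import json

# Defaults as documented in the parameter lists above.
DEFAULTS = {
    "maxPages": 100, "maxDepth": 2, "crawlingMode": "http-first",
    "scrapeInternalLinks": True, "sameDomainOnly": True,
    "maxConcurrency": 5, "requestTimeoutSecs": 30,
    "extractionMode": "all", "includeHtml": False,
    "includeLinks": True, "includeExternalLinks": False,
    "stealthBrowser": "cloak", "waitUntil": "domcontentloaded",
    "blockResources": True, "fallbackOnStatusCodes": [403, 429, 503],
    "minTextLengthForSuccess": 300,
}
ALLOWED = {
    "crawlingMode": {"http-first", "browser-only"},
    "stealthBrowser": {"cloak", "playwright"},
    "extractionMode": {"all", "text", "markdown", "html", "links"},
    "waitUntil": {"domcontentloaded", "load", "networkidle"},
}
OPTIONAL_KEYS = {"proxyConfiguration", "customUserAgent"}


def build_input(start_urls, **overrides):
    """Merge overrides into the documented defaults and validate enum options."""
    if not start_urls:
        raise ValueError("startUrls is required")
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    run_input = {**copy.deepcopy(DEFAULTS), **overrides}
    # Accept plain URL strings or objects with a "url" property.
    run_input["startUrls"] = [u if isinstance(u, dict) else {"url": u} for u in start_urls]
    for key, allowed in ALLOWED.items():
        if run_input[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}")
    return run_input


if __name__ == "__main__":
    print(json.dumps(build_input(["https://example.com"], maxPages=50), indent=2))
```

The resulting dictionary can be serialized to JSON and pasted into the actor's input editor.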
Tips for scraping websites
- Start with HTTP mode: Most websites serve useful HTML on initial request. HTTP-first saves money and runs faster.
- Set appropriate depth limits: Use maxPages and maxDepth to control crawl scope and stay within budget.
- Use domain filtering: Enable sameDomainOnly to prevent crawling into unrelated domains.
- Adjust timeout settings: Increase requestTimeoutSecs for slow or distant servers.
- Enable resource blocking: Keep blockResources set to true to skip heavy browser resources.
- Monitor fallback rates: Check the run summary to see how many pages needed browser fallback.
- Test stealth options: Use stealthBrowser: "cloak" if standard Playwright gets blocked.
- Respect robots.txt: Review website policies before scraping.
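For the robots.txt tip, a pre-flight check can be done with Python's standard library before adding a URL to Start URLs. This is a sketch of a check you could run yourself; the README does not state whether the actor consults robots.txt, and the helper name and sample rules below are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against robots.txt rules that were already fetched as text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical rules used only for illustration.
    sample_rules = "User-agent: *\nDisallow: /private/\n"
    for candidate in ("https://example.com/docs", "https://example.com/private/data"):
        verdict = "allowed" if allowed_by_robots(sample_rules, candidate) else "disallowed"
        print(candidate, verdict)
```

Parsing the rules from text keeps the check offline and easy to test; the same parser can also fetch a live file through its set_url and read methods.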
Limitations
- Browser mode is more expensive than HTTP mode.
- Some websites require authentication, session warmup, or custom logic.
- Very aggressive protection systems may still throttle or block requests.
- CloakBrowser requires a binary download at runtime if it is not preinstalled.
- External links can be extracted, but crawl expansion stays focused on start domains by default.
Is it legal to scrape websites?
Scraping is legal in many jurisdictions, but you still need to follow applicable laws and website policies.
- Respect robots.txt: Check the website's robots.txt file and follow its rules where appropriate.
- Review Terms of Service: Some sites explicitly prohibit scraping in their terms.
- Protect personal data: Personal data may be protected by GDPR and similar laws. Only scrape it when you have a lawful basis.
- Do not overload servers: Use appropriate concurrency and crawl limits.
- Respect copyright: Do not republish copyrighted content without permission.
If you are unsure whether scraping a specific website is legal for your use case, consult a lawyer. For more information, read Is web scraping legal?.