Stealth Website Crawler

Crawl websites protected by Cloudflare, DataDome, and other anti-bot systems. Extract clean text or markdown for AI/LLM pipelines. Like Website Content Crawler, but for sites that block you.

Pricing: Pay per usage
Developer: Nocturne (Maintained by Community)
Last modified: 3 days ago


What does Stealth Website Crawler do?

Stealth Website Crawler extracts content from websites that block standard scrapers. It uses a binary-patched Chromium browser (Patchright) running in headed mode on a virtual display with full fingerprint spoofing, human behavior simulation, and container marker hiding.

If Website Content Crawler fails on a site because of Cloudflare, DataDome, or other anti-bot protection, this actor is the solution.

  • Extract clean markdown, HTML, or plain text from any web page
  • Bypass Cloudflare, DataDome, Akamai, PerimeterX, and other anti-bot systems
  • Take screenshots of any page, element, or full page
  • Run custom JavaScript on protected pages
  • Perform interactive actions (click, fill, scroll, hover) without writing code
  • Capture network responses (intercept JSON/API calls the page makes)
  • Crawl entire sites with link following, URL filters, and depth control
  • Extract metadata (title, description, author, canonical URL, language, Open Graph)

What can it bypass?

| Protection | Status |
| --- | --- |
| Cloudflare Turnstile & Challenge | Bypassed |
| DataDome | Bypassed |
| Akamai | Bypassed |
| PerimeterX / HUMAN | Bypassed |
| Fingerprint.com | Bypassed |
| CreepJS | Bypassed |
| Pixelscan | Bypassed |
| navigator.webdriver detection | Bypassed |
| CDP (Chrome DevTools Protocol) detection | Bypassed |
| Canvas fingerprinting | Session-stable noise |
| AudioContext fingerprinting | Session-stable noise |
| WebGL renderer detection | Spoofed |

How to use Stealth Website Crawler

  1. Create a free Apify account
  2. Open Stealth Website Crawler in Apify Console
  3. Enter the URLs you want to crawl
  4. Click Start and wait for the run to finish
  5. Download your data as JSON, CSV, Excel, or use the API

What data can you extract?

| Field | Description |
| --- | --- |
| url | Original URL requested |
| loadedUrl | Final URL after redirects |
| title | Page title |
| content | Clean extracted content (markdown, HTML, or text) |
| contentLength | Length of extracted content |
| statusCode | HTTP status code |
| metadata.description | Meta description or Open Graph description |
| metadata.author | Author meta tag |
| metadata.keywords | Keywords meta tag |
| metadata.canonicalUrl | Canonical URL |
| metadata.languageCode | Language from HTML lang attribute |
| metadata.ogTitle | Open Graph title |
| metadata.ogImage | Open Graph image URL |
| crawl.depth | How many links deep from the start URL |
| crawl.referrerUrl | The page that linked to this one |
| screenshotKey | Key-value store key for screenshot (if enabled) |
| scrapedAt | ISO timestamp of when the page was scraped |

Two modes of operation

Crawl mode (default)

Give it URLs and it crawls pages automatically, following same-domain links. No code required.

Example input:

```json
{
  "startUrls": [{ "url": "https://cloudflare-protected-site.com" }],
  "maxCrawlPages": 50,
  "outputFormat": "markdown",
  "followLinks": true,
  "maxCrawlDepth": 3
}
```

Example output:

```json
{
  "url": "https://example.com/about",
  "loadedUrl": "https://example.com/about",
  "title": "About Us - Example Company",
  "content": "# About Us\n\nExample Company was founded in 2020...\n\n## Our Mission\n\nWe believe in making data accessible...",
  "contentLength": 4523,
  "statusCode": 200,
  "metadata": {
    "description": "Learn about Example Company, our mission, and our team.",
    "author": "Example Company",
    "keywords": "about, company, mission",
    "canonicalUrl": "https://example.com/about",
    "languageCode": "en",
    "ogTitle": "About Us - Example Company",
    "ogImage": "https://example.com/images/team.jpg"
  },
  "crawl": {
    "depth": 1,
    "referrerUrl": "https://example.com"
  },
  "scrapedAt": "2026-03-20T09:30:00.000Z"
}
```

Interactive mode

Provide an actions array to click buttons, fill forms, take screenshots, run JavaScript, and extract specific data. Actions are executed in order on each URL.

Example: Search and extract results

```json
{
  "startUrls": [{ "url": "https://protected-site.com/search" }],
  "actions": [
    { "type": "fill", "selector": "#search-input", "value": "web scraping" },
    { "type": "click", "selector": "button[type=submit]" },
    { "type": "wait", "selector": ".results" },
    { "type": "screenshot", "fullPage": true },
    { "type": "extractContent", "format": "markdown" }
  ]
}
```

Example: Extract rendered JavaScript content

```json
{
  "startUrls": [{ "url": "https://spa-app.com/dashboard" }],
  "initialCookies": [{ "name": "session", "value": "abc123", "domain": "spa-app.com", "path": "/" }],
  "actions": [
    { "type": "wait", "selector": ".data-loaded" },
    { "type": "javascript", "expression": "JSON.stringify([...document.querySelectorAll('.item')].map(el => ({name: el.querySelector('.name').textContent, price: el.querySelector('.price').textContent})))" },
    { "type": "screenshot" }
  ]
}
```

Example: Capture API responses from the page

```json
{
  "startUrls": [{ "url": "https://site-with-internal-api.com" }],
  "captureNetwork": true,
  "actions": [
    { "type": "scroll", "pages": 3 },
    { "type": "captureNetwork" }
  ]
}
```

Available actions

| Action | Parameters | What it does |
| --- | --- | --- |
| click | selector | Click an element |
| fill | selector, value | Set input value instantly |
| type | selector, value | Type with human-like keystroke delays |
| scroll | pages (default 1) | Scroll with realistic behavior |
| hover | selector | Hover over element |
| select | selector, value | Select dropdown option |
| wait | selector / time (ms) / navigation | Wait for condition |
| screenshot | fullPage, selector, key | Screenshot to key-value store |
| javascript | expression | Run JS and return result |
| extractHtml | selector (optional) | Get rendered DOM HTML |
| extractContent | format (markdown/html/text) | Clean content extraction |
| captureNetwork | — | Return intercepted JSON/API responses |
| humanActivity | — | Simulate scroll + Bezier mouse movement |
| mouseMove | x, y | Bezier curve mouse movement |
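Because each action type takes different parameters, a quick client-side sanity check can catch typos before you pay for a run. Here is a minimal Python sketch; the required-parameter map is derived from the table above and is illustrative, not the actor's own validation logic:

```python
# Required parameters per action type, per the actions table above.
# Action types with only optional parameters map to an empty list.
REQUIRED_PARAMS = {
    "click": ["selector"],
    "fill": ["selector", "value"],
    "type": ["selector", "value"],
    "hover": ["selector"],
    "select": ["selector", "value"],
    "javascript": ["expression"],
    "mouseMove": ["x", "y"],
    "scroll": [], "wait": [], "screenshot": [], "extractHtml": [],
    "extractContent": [], "captureNetwork": [], "humanActivity": [],
}

def validate_actions(actions):
    """Return a list of error strings for malformed action entries."""
    errors = []
    for i, action in enumerate(actions):
        kind = action.get("type")
        if kind not in REQUIRED_PARAMS:
            errors.append(f"action {i}: unknown type {kind!r}")
            continue
        for param in REQUIRED_PARAMS[kind]:
            if param not in action:
                errors.append(f"action {i} ({kind}): missing {param!r}")
    return errors

actions = [
    {"type": "fill", "selector": "#search-input", "value": "web scraping"},
    {"type": "click"},  # missing selector
]
print(validate_actions(actions))  # ["action 1 (click): missing 'selector'"]
```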

How can I use the scraped data?

  • AI and LLM pipelines: Feed content from anti-bot sites into RAG pipelines, vector databases, or LLM fine-tuning. Works with LangChain, LlamaIndex, and other frameworks.
  • Competitor monitoring: Track content changes on Cloudflare-protected competitor websites.
  • Price tracking: Extract product prices from e-commerce sites with aggressive anti-bot (Nike, Amazon, Booking.com).
  • Lead generation: Scrape business directories and review sites behind anti-bot protection.
  • SEO research: Crawl competitor sites to analyze content structure, metadata, and internal linking.
  • Academic research: Extract data from government portals, news sites, and academic databases behind Cloudflare.
  • Brand monitoring: Track mentions and content on social platforms with browser fingerprinting.
  • Market research: Scrape real estate listings, job postings, and travel prices from protected sites.
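As a sketch of the AI/LLM use case, the crawler's dataset items can be split into chunks on paragraph boundaries before embedding. This is a minimal illustrative splitter; a real pipeline would typically use LangChain's or LlamaIndex's own text splitters:

```python
def chunk_items(items, max_chars=1000):
    """Split crawler output items into {url, text} chunks on paragraph
    boundaries, keeping each chunk under max_chars for embedding.
    A single paragraph longer than max_chars becomes its own chunk."""
    chunks = []
    for item in items:
        current = ""
        for para in item["content"].split("\n\n"):
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append({"url": item["url"], "text": current})
                current = ""
            current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append({"url": item["url"], "text": current})
    return chunks

items = [{"url": "https://example.com/about",
          "content": "# About Us\n\nExample Company was founded in 2020.\n\n## Our Mission\n\nWe believe in making data accessible."}]
for c in chunk_items(items, max_chars=60):
    print(c["url"], "->", len(c["text"]), "chars")
```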

Input configuration

| Field | Default | Description |
| --- | --- | --- |
| startUrls | required | URLs to process |
| actions | none | Actions array (enables interactive mode) |
| maxCrawlPages | 10 | Max pages to crawl |
| maxCrawlDepth | 20 | Max link depth from start URLs |
| outputFormat | markdown | Content format: markdown, html, or text |
| followLinks | true | Follow same-domain links |
| includeUrlGlobs | none | Only crawl URLs matching these glob patterns |
| excludeUrlGlobs | none | Skip URLs matching these glob patterns |
| waitForSelector | none | CSS selector to wait for before extraction |
| takeScreenshots | false | Screenshot each page to key-value store |
| blockResources | false | Block images/fonts/CSS to save proxy bandwidth |
| initialCookies | none | Cookies for authenticated scraping [{name, value, domain, path}] |
| captureNetwork | false | Record JSON/API responses made by the page |
| maxConcurrency | 3 | Concurrent pages (lower = safer stealth, higher = faster) |
| requestTimeoutSecs | 30 | Page load timeout in seconds |
| maxRetries | 1 | Retry attempts for failed pages |
| proxyConfig | Apify residential | Proxy settings (residential strongly recommended) |
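The includeUrlGlobs and excludeUrlGlobs filters behave like ordinary glob matching. A minimal Python sketch of the documented behavior using fnmatch (the actor's exact glob dialect may differ):

```python
from fnmatch import fnmatch

def should_crawl(url, include_globs=None, exclude_globs=None):
    """Return True if a URL passes include/exclude glob filters.

    Mirrors the documented behavior: excludeUrlGlobs skips matching
    URLs, and includeUrlGlobs restricts the crawl to matching ones.
    """
    if exclude_globs and any(fnmatch(url, g) for g in exclude_globs):
        return False
    if include_globs:
        return any(fnmatch(url, g) for g in include_globs)
    return True

# Focus a crawl on blog posts while skipping tag pages
include = ["https://example.com/blog/*"]
exclude = ["https://example.com/blog/tag/*"]

print(should_crawl("https://example.com/blog/my-post", include, exclude))    # True
print(should_crawl("https://example.com/blog/tag/python", include, exclude)) # False
print(should_crawl("https://example.com/pricing", include, exclude))         # False
```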

How it compares to Website Content Crawler

| Feature | Website Content Crawler | Stealth Website Crawler |
| --- | --- | --- |
| Anti-bot bypass | JS-level fingerprints | Binary-patched Chromium + full stealth stack |
| Cloudflare sites | Often blocked | Works |
| DataDome / Akamai sites | Usually blocked | Works |
| Browser mode | Headless | Headed on virtual display (more stealthy) |
| Screenshots | Firefox only | Any page, element, or full page |
| Custom JS execution | Page function (code required) | javascript action (no code needed) |
| Interactive actions | Click to expand only | 15 action types with no code |
| Network response capture | No | Intercepts JSON/API responses |
| Human behavior simulation | No | Bezier curve mouse, variable typing, random scrolling |
| Content extraction | 5 HTML transformers | Readability + Markdown/HTML/Text |
| Metadata extraction | Title, description, language | Title, description, author, keywords, canonical, language, OG tags |
| Pricing | Free + platform usage | Free + platform usage |

Integrations

You can connect Stealth Website Crawler with your existing tools and workflows:

  • API: Run the actor programmatically via the Apify API using Node.js or Python clients
  • Webhooks: Get notified when a run finishes
  • Make (Integromat): Automate workflows with scraped data
  • Zapier: Connect to 5,000+ apps
  • Google Sheets: Export results directly to spreadsheets
  • Slack: Send notifications about scraping results
  • GitHub: Trigger runs from CI/CD pipelines
  • Airbyte: Sync data to databases and warehouses

Use via API

Python:

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")
run = client.actor("nocturne/stealth-website-crawler").call(run_input={
    "startUrls": [{"url": "https://cloudflare-protected-site.com"}],
    "maxCrawlPages": 10,
    "outputFormat": "markdown",
})
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"], item["url"])
    print(item["content"][:200])
```

Node.js:

```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });
const run = await client.actor('nocturne/stealth-website-crawler').call({
    startUrls: [{ url: 'https://cloudflare-protected-site.com' }],
    maxCrawlPages: 10,
    outputFormat: 'markdown',
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach(item => console.log(item.title, item.url));
```

Frequently asked questions

How is this different from Website Content Crawler? Website Content Crawler uses standard Playwright with JavaScript-level fingerprint injection. Stealth Website Crawler uses Patchright, which patches Chromium at the binary level to remove automation detection leaks. It also runs in headed mode on a virtual display (Xvfb) with canvas/audio/WebGL fingerprint noise, human behavior simulation, and container marker hiding. It works on sites where WCC gets blocked.

Do I need residential proxies? Strongly recommended. The actor defaults to Apify residential proxies. Datacenter IPs are blocked by most anti-bot systems regardless of how good the browser stealth is. Residential proxies add ~$0.002/page in bandwidth cost.

Can it solve CAPTCHAs? The stealth browser avoids triggering CAPTCHAs in most cases by appearing as a real user. If a CAPTCHA does appear, the page is automatically retried with a new proxy IP. Explicit CAPTCHA solving (reCAPTCHA, hCaptcha) is not currently included.

Does it work on login-required pages? You can pass initialCookies for authenticated sessions. Use your browser's developer tools to copy session cookies, then pass them as input. The actor does not handle login flows (username/password entry) automatically, but you can use interactive mode with fill and click actions to automate login.
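For example, a raw Cookie header string copied from DevTools can be converted into the [{name, value, domain, path}] shape that initialCookies expects with a few lines of Python (a sketch; the cookie names and domain below are placeholders):

```python
def to_initial_cookies(cookie_header, domain, path="/"):
    """Convert a raw Cookie header string (as copied from DevTools,
    e.g. "session=abc123; csrf_token=xyz789") into the
    [{name, value, domain, path}] format that initialCookies expects."""
    cookies = []
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name:
            cookies.append({"name": name, "value": value,
                            "domain": domain, "path": path})
    return cookies

raw = "session=abc123; csrf_token=xyz789"
print(to_initial_cookies(raw, "spa-app.com"))
# [{'name': 'session', 'value': 'abc123', 'domain': 'spa-app.com', 'path': '/'},
#  {'name': 'csrf_token', 'value': 'xyz789', 'domain': 'spa-app.com', 'path': '/'}]
```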

How much does it cost? The actor itself is free. You pay only for Apify platform usage (compute and proxy bandwidth). A typical crawl of 1,000 pages costs approximately $3-8 depending on page complexity and proxy usage. Check the Apify pricing page for details.
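That works out to roughly $0.003-$0.008 per page. A back-of-the-envelope budget sketch (the per-page rates here are assumptions derived from the $3-8 per 1,000 pages figure above, not official pricing):

```python
def estimate_cost(pages, per_page_low=0.003, per_page_high=0.008):
    """Rough crawl budget range based on the $3-8 per 1,000 pages
    figure. Actual cost depends on page complexity and proxy bandwidth."""
    return pages * per_page_low, pages * per_page_high

low, high = estimate_cost(1000)
print(f"1,000 pages: ${low:.2f}-${high:.2f}")  # 1,000 pages: $3.00-$8.00
```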

Can I use it with Make, Zapier, or other integrations? Yes. The actor works with all standard Apify integrations including Make, Zapier, Slack, Google Sheets, webhooks, and the Apify API (Node.js and Python clients).

Can I use it via the API? Yes. You can run the actor programmatically using the Apify API, the Python client, or the Node.js client. See the API usage examples above.

What's the success rate? It varies by target site and protection system. Typical success rates with residential proxies:

  • Cloudflare-protected sites: 90-98%
  • DataDome sites: 85-95%
  • Akamai sites: 85-95%
  • PerimeterX sites: 80-90%

Can I scrape JavaScript-rendered (SPA) pages? Yes. The actor runs a full Chromium browser that renders JavaScript completely before extracting content. Use waitForSelector to wait for dynamic content to load, or use javascript actions to extract data from the rendered DOM.

Is it legal to scrape protected websites? Scraping publicly available data is generally considered legal based on the US Ninth Circuit Court ruling (hiQ Labs v. LinkedIn). However:

  • Always respect the website's Terms of Service
  • Do not scrape personal data without a lawful basis under GDPR/CCPA
  • Do not overload target servers with excessive request rates
  • Consider using the maxConcurrency setting to limit parallel requests

We recommend consulting a legal professional if you have questions about scraping specific websites. Read Apify's blog post on the legality of web scraping for more context.

Tips for best results

  • Use residential proxies: Always use Apify residential proxies for anti-bot sites. Datacenter IPs are detected and blocked regardless of browser stealth.
  • Lower concurrency for harder sites: Set maxConcurrency to 1-2 for sites with aggressive anti-bot. Higher concurrency increases detection risk.
  • Use waitForSelector: For JavaScript-heavy sites, specify a CSS selector that appears only after the content loads.
  • Block resources to save bandwidth: Enable blockResources if you don't need images, fonts, or CSS. This reduces residential proxy costs significantly.
  • Use include/exclude globs: Focus your crawl on relevant pages. For example, includeUrlGlobs: ["https://example.com/blog/*"] avoids crawling unrelated sections.

Feedback and support

If you encounter any issues or have suggestions, please open an issue in the Issues tab. We actively monitor and respond to all reports.

Found a bug? Have a feature request? We want to hear from you.