# Stealth Web Scraper (`lentic_clockss/stealth-web-scraper`) Actor

Scrape websites protected by Cloudflare, Turnstile, and other anti-bot systems. Built-in proxy rotation, browser fingerprint protection, and CAPTCHA solving. No extra configuration needed.

- **URL**: https://apify.com/lentic\_clockss/stealth-web-scraper.md
- **Developed by:** [kane liu](https://apify.com/lentic_clockss) (community)
- **Categories:** Developer tools, Lead generation
- **Stats:** 5 total users, 4 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Stealth Web Scraper

Scrape any website — even those protected by Cloudflare, Turnstile CAPTCHAs, and other anti-bot systems. Built-in proxy rotation, browser fingerprint protection, and human behavior simulation. No extra setup needed.

### Why this scraper?

Most web scrapers fail on protected websites. Cloudflare alone protects **20% of all websites** (41M+ sites). This Actor uses a real browser engine with anti-detection technology to bypass these protections automatically.

| Protection Level | Examples | Success Rate |
|-----------------|----------|-------------|
| No protection | Wikipedia, static sites | 100% |
| Basic (JS challenge) | Many e-commerce sites | 100% |
| Medium (Cloudflare standard) | Fiverr, job boards | 86-100% |
| Strong (CF Turnstile CAPTCHA) | Clutch, Upwork | 100% |

### Features

- **Anti-bot bypass** — Handles Cloudflare challenges, Turnstile CAPTCHAs, and browser fingerprint detection automatically
- **Residential proxy rotation** — Built-in proxy pool with intelligent rotation, cooldown, and blacklist management
- **Browser fingerprint masking** — Uses Patchright (undetectable Playwright fork) with stealth configuration
- **Human behavior simulation** — Random delays, mouse movements, and page interaction patterns
- **CSS selector extraction** — Optionally extract specific data fields using CSS selectors
- **Flexible output** — Get full HTML, plain text, or both for each page
- **Concurrent scraping** — Process up to 5 pages in parallel
- **Automatic retries** — 3 retry attempts per URL with proxy rotation on failure

### How it works

1. Takes your list of URLs
2. Opens each URL in an undetectable browser with a rotating residential proxy
3. Detects and resolves any anti-bot challenges (Cloudflare, Turnstile, etc.)
4. Simulates human browsing behavior
5. Extracts and returns the page content

No API keys, no proxy configuration, no browser setup — it just works.

### Input Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| urls | array | Yes | List of URLs to scrape |
| extractSelectors | object | No | CSS selectors to extract specific fields (e.g. `{"title": "h1", "price": ".price"}`) |
| outputFormat | string | No | `html`, `text`, or `both` (default: `both`) |
| waitForSelector | string | No | CSS selector to wait for before extracting (for JS-rendered pages) |
| maxConcurrency | integer | No | Parallel pages (1-5, default: 1) |
| pageTimeout | integer | No | Page load timeout in seconds (30-300, default: 90) |

### Output Fields

| Field | Type | Description |
|-------|------|-------------|
| url | string | The scraped URL |
| statusCode | integer | HTTP status code (200 = success) |
| blocked | boolean | Whether anti-bot protection blocked the request |
| title | string | Page title |
| html | string | Full page HTML (if outputFormat includes html) |
| text | string | Plain text content (if outputFormat includes text) |
| extracted | object | Extracted fields (if extractSelectors provided) |
| scrapedAt | string | ISO timestamp of when the page was scraped |

### Example: Basic scraping

**Input:**
```json
{
  "urls": [
    "https://www.clutch.co/it-services",
    "https://www.fiverr.com/categories/programming-tech"
  ],
  "outputFormat": "text"
}
````

**Output:**

```json
{
  "url": "https://www.clutch.co/it-services",
  "statusCode": 200,
  "blocked": false,
  "title": "Top IT Services Companies - 2026 Reviews | Clutch.co",
  "text": "Top IT Services Companies...",
  "scrapedAt": "2026-04-02T10:30:00.000Z"
}
```
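The `blocked` and `statusCode` fields in the output can be used to separate usable pages from URLs that may be worth another run. The following is a minimal sketch, assuming items are plain dictionaries shaped like the example above. `split_results` is a hypothetical helper name, and the second sample item is invented for illustration.

```python
from typing import Iterable


def split_results(items: Iterable[dict]) -> tuple[list[dict], list[str]]:
    """Separate usable pages from URLs that may need another attempt."""
    ok: list[dict] = []
    retry_urls: list[str] = []
    for item in items:
        # Treat a page as usable only if it was not blocked and returned HTTP 200.
        if not item.get("blocked", False) and item.get("statusCode") == 200:
            ok.append(item)
        else:
            retry_urls.append(item["url"])
    return ok, retry_urls


# Sample items: the first mirrors the documented output, the second is hypothetical.
sample = [
    {"url": "https://www.clutch.co/it-services", "statusCode": 200, "blocked": False},
    {"url": "https://www.example.com/protected", "statusCode": 403, "blocked": True},
]
ok, retry_urls = split_results(sample)
print(len(ok), retry_urls)
```

The list of retry URLs can be passed back as the `urls` input of a follow-up run.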

### Example: Extract specific fields

**Input:**

```json
{
  "urls": ["https://example-shop.com/product/123"],
  "extractSelectors": {
    "productName": "h1.product-title",
    "price": ".price-current",
    "description": ".product-description",
    "reviews": ".review-text"
  },
  "outputFormat": "text"
}
```

**Output:**

```json
{
  "url": "https://example-shop.com/product/123",
  "statusCode": 200,
  "blocked": false,
  "title": "Product Name - Example Shop",
  "text": "...",
  "extracted": {
    "productName": "Wireless Headphones Pro",
    "price": "$79.99",
    "description": "Premium noise-cancelling...",
    "reviews": ["Great sound quality!", "Best purchase ever"]
  },
  "scrapedAt": "2026-04-02T10:30:00.000Z"
}
```
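In the example above, some `extracted` values are a single string while `reviews` is a list. The sketch below assumes exactly that shape and normalizes every field to a list of strings, which tends to simplify downstream code. `normalize_extracted` is a hypothetical helper, not something the Actor provides.

```python
def normalize_extracted(item: dict) -> dict[str, list[str]]:
    """Return the item's extracted fields with every value as a list of strings."""
    normalized: dict[str, list[str]] = {}
    for field, value in (item.get("extracted") or {}).items():
        if value is None:
            # Assumption: a selector with no match may yield None.
            normalized[field] = []
        elif isinstance(value, list):
            normalized[field] = [str(v) for v in value]
        else:
            normalized[field] = [str(value)]
    return normalized


item = {
    "url": "https://example-shop.com/product/123",
    "extracted": {
        "productName": "Wireless Headphones Pro",
        "price": "$79.99",
        "reviews": ["Great sound quality!", "Best purchase ever"],
    },
}
fields = normalize_extracted(item)
print(fields["price"], len(fields["reviews"]))
```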

### Use Cases

- **Price monitoring** — Track product prices on e-commerce sites with anti-bot protection
- **Lead generation** — Scrape business directories and company listings
- **Market research** — Collect data from competitor websites
- **Content aggregation** — Gather articles, reviews, or listings from multiple sources
- **SEO monitoring** — Check search result pages and competitor content
- **Real estate data** — Scrape property listings from protected platforms
- **Job market analysis** — Collect job postings from protected job boards

### Performance

- \~8-12 seconds per page (with anti-bot handling)
- \~5-7 seconds per page (unprotected sites)
- Concurrent mode: up to 5x faster with the `maxConcurrency` setting
- 50 pages: ~5-10 minutes
- 200 pages: ~20-35 minutes
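
For planning purposes, the figures above can be turned into a rough run-time estimate. The sketch below assumes a per-page range of 5 to 12 seconds (spanning both cases listed) and an ideal linear speedup from `maxConcurrency`, which is a best-case assumption since the listed speedup is "up to" 5x. `estimate_minutes` is a hypothetical helper.

```python
def estimate_minutes(
    pages: int,
    sec_per_page: tuple[float, float] = (5.0, 12.0),
    max_concurrency: int = 1,
) -> tuple[float, float]:
    """Return a (low, high) run-time estimate in minutes."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    if not 1 <= max_concurrency <= 5:
        raise ValueError("maxConcurrency must be between 1 and 5")
    low_sec, high_sec = sec_per_page
    # Assumption: work divides evenly across parallel pages (ideal speedup).
    low = pages * low_sec / max_concurrency / 60
    high = pages * high_sec / max_concurrency / 60
    return low, high


low, high = estimate_minutes(50)
print(f"50 pages, sequential: {low:.1f}-{high:.1f} minutes")
```

Real runs can deviate from this estimate because of retries, challenge solving, and page weight.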

### Compared to other scrapers

| Feature | Free Apify Scrapers | Other Anti-Bot Actors | **Stealth Web Scraper** |
|---------|--------------------|-----------------------|------------------------|
| Unprotected sites | Yes | Yes | **Yes** |
| Cloudflare bypass | No | Claims yes (rated 1-3/5) | **Yes (verified 86-100%)** |
| Turnstile CAPTCHA | No | Unreliable | **Yes (verified 100%)** |
| Built-in proxies | No | Some | **Yes (residential)** |
| Browser fingerprint | Basic | Basic | **Advanced (Patchright)** |
| PPE pricing | Free + compute | Mostly rental (dying) | **Pay per page** |
| Published success rates | No | No | **Yes, per-site verified** |

### Proxy

This Actor uses its own residential proxy pool. No additional proxy configuration is needed.

### FAQ

**Q: Can it scrape any website?**
A: It handles most websites including those with Cloudflare, Turnstile, and JavaScript-heavy rendering. Extremely aggressive anti-bot systems (like some banking sites) may still block requests.

**Q: Do I need to provide my own proxies?**
A: No. Residential proxies are built-in and included in the price.

**Q: How is this different from Apify's Web Scraper?**
A: Apify's official Web Scraper has no anti-bot bypass capability. It fails on Cloudflare-protected sites. This Actor uses an undetectable browser engine specifically designed to bypass anti-bot protections.

**Q: Can I extract specific data from pages?**
A: Yes. Use the `extractSelectors` parameter with CSS selectors to extract specific fields. For complex extraction logic, you can process the returned HTML in your own code.

**Q: What about JavaScript-rendered pages?**
A: The Actor uses a full browser engine, so JavaScript is fully executed. Use `waitForSelector` if you need to wait for specific dynamic content to load.
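
As the FAQ notes, the returned HTML can be processed in your own code when CSS selectors are not enough. The following sketch uses only the Python standard library to collect link targets from an item's `html` field. `LinkCollector` and `extract_links` are hypothetical names, and the sample HTML is invented for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(item: dict) -> list[str]:
    """Return all link targets found in the item's `html` field."""
    parser = LinkCollector()
    # The `html` field is only present if outputFormat includes html.
    parser.feed(item.get("html") or "")
    parser.close()
    return parser.links


sample_item = {
    "url": "https://www.example.com",
    "html": '<html><body><a href="/a">A</a><a>no href</a>'
            '<a href="https://www.example.com/b">B</a></body></html>',
}
links = extract_links(sample_item)
print(links)
```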

# Actor input Schema

## `urls` (type: `array`):

List of URLs to scrape. Each URL will be visited with full anti-bot protection.

## `extractSelectors` (type: `object`):

Optional key-value pairs of CSS selectors. Example: `{"title": "h1", "price": ".product-price", "links": "a[href]"}`. Leave empty to get full page HTML/text.

## `outputFormat` (type: `string`):

What content to return for each page: `html`, `text`, or `both` (default: `both`).

## `waitForSelector` (type: `string`):

Optional CSS selector to wait for before extracting content. Useful for JavaScript-rendered pages.

## `maxConcurrency` (type: `integer`):

Number of pages to scrape in parallel. Higher values are faster but use more resources.

## `pageTimeout` (type: `integer`):

Maximum time to wait for each page to load, in seconds (30-300, default: 90).

## Actor input object example

```json
{
  "urls": [
    "https://www.example.com"
  ],
  "extractSelectors": {},
  "outputFormat": "both",
  "maxConcurrency": 1,
  "pageTimeout": 90
}
```
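
Checking the input client-side before starting a run can catch out-of-range values early. The sketch below mirrors the documented constraints (`outputFormat` is one of `html`, `text`, `both`; `maxConcurrency` is 1-5; `pageTimeout` is 30-300 seconds). `build_input` is a hypothetical helper, and rejecting an empty `urls` list is an assumption rather than documented behavior.

```python
ALLOWED_FORMATS = {"html", "text", "both"}


def build_input(
    urls,
    extract_selectors=None,
    output_format="both",
    wait_for_selector=None,
    max_concurrency=1,
    page_timeout=90,
):
    """Build an input dict, rejecting values outside the documented ranges."""
    if not urls:
        # Assumption: an empty list is treated as an error on the client side.
        raise ValueError("urls is required and must contain at least one URL")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"outputFormat must be one of {sorted(ALLOWED_FORMATS)}")
    if not 1 <= max_concurrency <= 5:
        raise ValueError("maxConcurrency must be between 1 and 5")
    if not 30 <= page_timeout <= 300:
        raise ValueError("pageTimeout must be between 30 and 300 seconds")

    actor_input = {
        "urls": list(urls),
        "extractSelectors": dict(extract_selectors or {}),
        "outputFormat": output_format,
        "maxConcurrency": max_concurrency,
        "pageTimeout": page_timeout,
    }
    # waitForSelector is optional, so include it only when provided.
    if wait_for_selector:
        actor_input["waitForSelector"] = wait_for_selector
    return actor_input


print(build_input(["https://www.example.com"], max_concurrency=3))
```

The resulting dictionary can be passed as the run input in the API examples below.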

# Actor output Schema

## `scrapedPages` (type: `string`):

Dataset containing all scraped page results

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "https://www.example.com"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("lentic_clockss/stealth-web-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "urls": ["https://www.example.com"] }

# Run the Actor and wait for it to finish
run = client.actor("lentic_clockss/stealth-web-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "https://www.example.com"
  ]
}' |
apify call lentic_clockss/stealth-web-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=lentic_clockss/stealth-web-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Stealth Web Scraper",
        "description": "Scrape websites protected by Cloudflare, Turnstile, and other anti-bot systems. Built-in proxy rotation, browser fingerprint protection, and CAPTCHA solving. No extra configuration needed.",
        "version": "0.1",
        "x-build-id": "BVsvdu6b9jFrtrfPX"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/lentic_clockss~stealth-web-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-lentic_clockss-stealth-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/lentic_clockss~stealth-web-scraper/runs": {
            "post": {
                "operationId": "runs-sync-lentic_clockss-stealth-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/lentic_clockss~stealth-web-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-lentic_clockss-stealth-web-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "URLs to scrape",
                        "type": "array",
                        "description": "List of URLs to scrape. Each URL will be visited with full anti-bot protection.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "extractSelectors": {
                        "title": "CSS selectors to extract",
                        "type": "object",
                        "description": "Optional key-value pairs of CSS selectors. Example: {\"title\": \"h1\", \"price\": \".product-price\", \"links\": \"a[href]\"}. Leave empty to get full page HTML/text.",
                        "default": {}
                    },
                    "outputFormat": {
                        "title": "Output format",
                        "enum": [
                            "html",
                            "text",
                            "both"
                        ],
                        "type": "string",
                        "description": "What content to return for each page",
                        "default": "both"
                    },
                    "waitForSelector": {
                        "title": "Wait for selector",
                        "type": "string",
                        "description": "Optional CSS selector to wait for before extracting content. Useful for JavaScript-rendered pages."
                    },
                    "maxConcurrency": {
                        "title": "Max concurrency",
                        "minimum": 1,
                        "maximum": 5,
                        "type": "integer",
                        "description": "Number of pages to scrape in parallel. Higher values are faster but use more resources.",
                        "default": 1
                    },
                    "pageTimeout": {
                        "title": "Page timeout (seconds)",
                        "minimum": 30,
                        "maximum": 300,
                        "type": "integer",
                        "description": "Maximum time to wait for each page to load",
                        "default": 90
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
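
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be used without a client library. The following is a sketch using only the Python standard library. `build_request` and `run_sync` are hypothetical helpers, the 300-second client timeout is an assumption, and the response is assumed to be a JSON array of dataset items. Only the request construction is executed here; `run_sync` would start a real run.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/lentic_clockss~stealth-web-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build the POST request; the token goes in the query string per the spec."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and return the parsed JSON response (starts a real run)."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Build (but do not send) a request to inspect what would be sent.
request = build_request("<YOUR_API_TOKEN>", {"urls": ["https://www.example.com"]})
print(request.get_method(), request.full_url)
```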
