# Xcrawl Search Scrape Actor (`empathetic_chorus/xcrawl-search-scrape-actor`)

- **URL**: https://apify.com/empathetic_chorus/xcrawl-search-scrape-actor.md
- **Developed by:** [Charles](https://apify.com/empathetic_chorus) (community)
- **Categories:** Automation, Developer tools, Lead generation
- **Stats:** 2 total users, 1 monthly users, 0.0% runs succeeded, NaN bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor is free to use, and you only pay for the Apify platform usage, which gets cheaper the higher subscription plan you have.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## XCrawl Web Search & Scrape: Apify Actor

Search the web and scrape any URL using [XCrawl](https://xcrawl.com)'s residential proxy network. Bypass anti-bot systems with automatic JS rendering fallback and global IP rotation.

**Actor:** `yanxvdong123/xcrawl-search-scrape` | **Runtime:** Node.js 22 | **License:** MIT

---

### 🚀 Quick Start

1. Open the [Actor Console](https://console.apify.com/actors/yanxvdong123~xcrawl-search-scrape)
2. Set `XCRAWL_API_KEY` in **Environment Variables** (get a free key at [dash.xcrawl.com](https://dash.xcrawl.com))
3. Choose **Search** or **Scrape** mode, fill in the inputs
4. Hit **Run**

No credit card needed: XCrawl gives free trial credits on signup.

---

### 📋 Input Parameters

#### Search Mode (`action: "search"`)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `query` | string | **required** | Web search query (max 200 chars) |
| `limit` | integer | `10` | Number of results (1–50) |
| `location` | string | `"US"` | Geo-location code (`US`, `UK`, `CN`, `JP`, `DE`, etc.) |
| `language` | string | `"en"` | Search language (`en`, `zh`, `ja`, `fr`, etc.) |
| `withContent` | boolean | `true` | Fetch full page content for each result |
| `render` | boolean | `false` | JS rendering for anti-bot bypass |
| `formats` | string | `"markdown,summary"` | Output formats: comma-separated (`markdown`, `summary`, `html`) |
| `screenshot` | boolean | `false` | Capture page screenshot (requires `render=true`) |

#### Scrape Mode (`action: "scrape"`)

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `url` | string | **required** | Single URL to scrape (max 2000 chars) |
| `render` | boolean | `false` | JS rendering for anti-bot bypass |
| `formats` | string | `"markdown,summary"` | Output formats |
| `screenshot` | boolean | `false` | Capture screenshot (requires `render=true`) |
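
The limits in the two tables above can be checked on the client side before a run is started. The following is a minimal sketch, not part of the Actor: `buildInput` and `DEFAULTS` are hypothetical names, and the default values are simply copied from the tables.

```javascript
// Hypothetical helper (not part of the Actor): validates input against the
// limits documented in the tables above and fills in the documented defaults.
const DEFAULTS = {
    limit: 10,
    location: 'US',
    language: 'en',
    withContent: true,
    render: false,
    formats: 'markdown,summary',
    screenshot: false,
};

function buildInput(options) {
    const { action } = options;
    if (action !== 'search' && action !== 'scrape') {
        throw new Error(`action must be "search" or "scrape", got: ${action}`);
    }

    // Parameters shared by both modes.
    const shared = {
        render: options.render ?? DEFAULTS.render,
        formats: options.formats ?? DEFAULTS.formats,
        screenshot: options.screenshot ?? DEFAULTS.screenshot,
    };
    // The tables state that screenshots require render=true.
    if (shared.screenshot && !shared.render) {
        throw new Error('screenshot requires render to be true');
    }

    if (action === 'scrape') {
        const { url } = options;
        if (typeof url !== 'string' || url.length === 0 || url.length > 2000) {
            throw new Error('url is required and must be at most 2000 characters');
        }
        return { action, url, ...shared };
    }

    const { query } = options;
    if (typeof query !== 'string' || query.length === 0 || query.length > 200) {
        throw new Error('query is required and must be at most 200 characters');
    }
    const limit = options.limit ?? DEFAULTS.limit;
    if (!Number.isInteger(limit) || limit < 1 || limit > 50) {
        throw new Error('limit must be an integer between 1 and 50');
    }
    return {
        action,
        query,
        limit,
        location: options.location ?? DEFAULTS.location,
        language: options.language ?? DEFAULTS.language,
        withContent: options.withContent ?? DEFAULTS.withContent,
        ...shared,
    };
}
```

For example, `buildInput({ action: 'search', query: 'apify actors' })` would return a complete search input with the documented defaults, while a scrape input with `screenshot: true` and no `render` would be rejected before any credits are spent.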

---

### 🧠 Intelligent Anti-Block System

This Actor is built to handle modern anti-bot systems out of the box:

- **Automatic block detection**: Heuristically checks for Cloudflare, DataDome, and other challenge pages (looks for captcha forms, browser verification, access denied messages)
- **Smart retry**: If a page appears blocked, automatically retries with headless browser rendering (Chromium via XCrawl's `jsRender`)
- **Concurrent crawling**: Uses `p-limit` to run up to 5 parallel scrapes (balanced for speed + reliability)
- **Global proxy pool**: Requests route through XCrawl's residential proxy network with configurable geo-location
- **Per-URL resilience**: Each URL gets at least 2 attempts; if both fail, the error is recorded per-entry without stopping the batch
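
The detection-and-retry behavior described in the list above can be pictured roughly as follows. This is a sketch under stated assumptions, not the Actor's source code: the marker phrases, the `looksBlocked` and `scrapeWithFallback` names, the injected `scrapeFn`, and the `"failed"` status value are all illustrative.

```javascript
// Illustrative marker phrases; the Actor's real heuristics are not documented here.
const BLOCK_MARKERS = ['captcha', 'verify you are human', 'checking your browser', 'access denied'];

// Returns true when the page text contains any of the marker phrases.
function looksBlocked(content) {
    const text = String(content ?? '').toLowerCase();
    return BLOCK_MARKERS.some((marker) => text.includes(marker));
}

// scrapeFn(url, { render }) is a hypothetical async function that resolves to
// { content } or rejects. The first attempt honors `render`; the second
// attempt always forces JS rendering.
async function scrapeWithFallback(url, scrapeFn, render = false) {
    let lastError = null;
    for (const useRender of [render, true]) {
        try {
            const page = await scrapeFn(url, { render: useRender });
            if (!looksBlocked(page.content)) {
                return { url, content: page.content, scrapeStatus: 'completed', scrapeError: null };
            }
            lastError = new Error('Page appears to be blocked');
        } catch (err) {
            lastError = err;
        }
    }
    // Both attempts failed: record the error on the entry instead of throwing,
    // so that one bad URL does not stop the rest of the batch.
    return { url, content: null, scrapeStatus: 'failed', scrapeError: String(lastError.message) };
}
```

Returning a failure record rather than throwing is what lets a batch continue past a single bad URL; callers inspect `scrapeError` on each entry afterwards.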

#### When to enable `render`

✅ **Turn ON** for: News sites with paywalls (Reuters, WSJ), sites behind Cloudflare/DataDome, JavaScript-heavy SPAs  
❌ **Keep OFF** for: Simple HTML pages, blogs, documentation (faster and cheaper without rendering)

---

### 📦 Output Format

Each result is pushed to the Apify dataset:

```json
{
  "title": "Page Title",
  "url": "https://example.com",
  "snippet": "Search result description",
  "markdown": "Full page content converted to markdown...",
  "summary": "AI-generated summary from XCrawl...",
  "scrapeStatus": "completed",
  "screenshot": "base64-encoded PNG (if enabled)",
  "credits": "0.5",
  "scrapeError": null
}
````

**Search mode** returns an **array** of enriched results.\
**Scrape mode** returns a single result object.
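
Because per-URL failures are recorded on the item rather than thrown, a consumer usually has to separate usable entries from failed ones. A minimal sketch, assuming items shaped like the example above: `summarizeResults` is a hypothetical helper, and `credits` is parsed into a number because the example shows it as a string.

```javascript
// Hypothetical helper: splits dataset items by outcome and totals the credits.
function summarizeResults(items) {
    const succeeded = items.filter((item) => !item.scrapeError);
    const failed = items.filter((item) => item.scrapeError);
    const totalCredits = items.reduce((sum, item) => {
        // "credits" appears as a string such as "0.5" in the example output.
        const credits = Number.parseFloat(item.credits);
        return sum + (Number.isFinite(credits) ? credits : 0);
    }, 0);
    return { succeeded, failed, totalCredits };
}
```

Items without a parsable `credits` value count as zero in this sketch, so the total is a lower bound when some entries omit the field.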

***

### 💰 Usage & Pricing

| Mode | XCrawl Credits Consumed |
|------|------------------------|
| Search (1 query) | ~1 credit |
| Scrape (no render) | ~1–3 credits |
| Scrape (with render) | ~3–8 credits |
| Free trial | ✅ Included with XCrawl signup |

The **Actor itself is free** to run on Apify: apart from the Apify platform usage described under Pricing, you pay only for the XCrawl API credits consumed.
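
To budget a run, the approximate ranges in the table can be combined into a rough estimate. The sketch below assumes that a search with `withContent` enabled costs one search plus one scrape per result, which the table does not state explicitly. `estimateCredits` is a hypothetical helper, and actual consumption may differ.

```javascript
// Approximate per-operation ranges taken from the table above.
const SEARCH_CREDITS = 1;
const SCRAPE_CREDITS = {
    plain: { min: 1, max: 3 },
    rendered: { min: 3, max: 8 },
};

// Returns a { min, max } range of XCrawl credits for one run.
function estimateCredits({ action, limit = 10, withContent = true, render = false }) {
    const perScrape = render ? SCRAPE_CREDITS.rendered : SCRAPE_CREDITS.plain;
    if (action === 'scrape') {
        return { min: perScrape.min, max: perScrape.max };
    }
    // Assumption: each fetched search result is billed like one scrape.
    const scrapes = withContent ? limit : 0;
    return {
        min: SEARCH_CREDITS + scrapes * perScrape.min,
        max: SEARCH_CREDITS + scrapes * perScrape.max,
    };
}
```

Under these assumptions, a default search (`limit` of 10, content fetching on, rendering off) comes out at roughly 11 to 31 credits. Pages that are retried with rendering after a block would presumably land nearer the rendered range, so the upper bound is best read as optimistic.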

***

### 🔧 Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `XCRAWL_API_KEY` | ✅ Yes | Your API key from [dash.xcrawl.com](https://dash.xcrawl.com). Sign up → Dashboard → API Keys |

***

### 🎯 Use Cases

- **Content research**: Collect articles, blog posts, and documentation on any topic
- **Market intelligence**: Scrape competitor pricing, product listings, and reviews
- **SEO / SERP monitoring**: Track search rankings across different geo-locations
- **RAG / LLM pipelines**: Feed clean markdown content into vector databases or AI agents
- **E-commerce**: Monitor product catalogs with location-specific searches
- **News aggregation**: Gather articles from multiple sources with automatic paywall bypass

***

### 🏗 Architecture

```
Apify Run
  └─ src/main.js (entry point)
      ├─ XCrawl Search API  →  Get top results
      ├─ XCrawl Scrape API  →  Extract page content
      │   └─ p-limit (concurrency = 5)
      │       ├─ Normal scrape (fast)
      │       └─ Retry with JS render (anti-bot fallback)
      └─ Apify Dataset     ←  Push all results
```
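
The `p-limit (concurrency = 5)` step in the diagram caps how many scrapes are in flight at once. A dependency-free sketch of the same idea follows. `mapWithConcurrency` is a hypothetical stand-in for `p-limit`, not the Actor's code.

```javascript
// Runs `worker` over `items` with at most `concurrency` calls in flight.
// Results keep the order of `items`.
async function mapWithConcurrency(items, concurrency, worker) {
    const results = new Array(items.length);
    let next = 0;

    // Each runner claims the next unprocessed index until none are left.
    async function runner() {
        while (next < items.length) {
            const index = next++;
            results[index] = await worker(items[index], index);
        }
    }

    const runnerCount = Math.min(concurrency, items.length);
    await Promise.all(Array.from({ length: runnerCount }, () => runner()));
    return results;
}
```

With the diagram's setting this would be called as `mapWithConcurrency(urls, 5, scrapeOne)`, where `scrapeOne` is whatever function scrapes a single URL. The worker is expected to resolve with a result record even on failure; a rejection would abort the whole batch in this sketch.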

***

### 📄 Links

- **Source code:** [GitHub](https://github.com/yanxvdong123/xcrawl-search-scrape-actor)
- **XCrawl Dashboard:** [dash.xcrawl.com](https://dash.xcrawl.com)
- **XCrawl API Docs:** [docs.xcrawl.com](https://docs.xcrawl.com)
- **Report issues:** [GitHub Issues](https://github.com/yanxvdong123/xcrawl-search-scrape-actor/issues)

# Actor input Schema

## `action` (type: `string`):

Action to perform: 'search' for web search, 'scrape' for single page extraction

## `query` (type: `string`):

Search query (required for action=search)

## `url` (type: `string`):

URL to scrape (required for action=scrape)

## `location` (type: `string`):

Search location code (e.g. 'US', 'UK', 'CN', 'JP')

## `language` (type: `string`):

Search language code (e.g. 'en', 'zh', 'ja')

## `limit` (type: `integer`):

Number of search results to return (1-50). Each result's full content will be fetched automatically.

## `withContent` (type: `boolean`):

When enabled, automatically fetches full page content (markdown + summary) for each search result.

## `formats` (type: `string`):

Comma-separated output formats. Options: markdown, summary, html. (e.g. 'markdown,summary')
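
Since `formats` is a single comma-separated string, it can help to normalize it before sending. The sketch below is an assumption-laden illustration: `parseFormats` is a hypothetical helper, and whether the Actor itself tolerates whitespace or mixed case is not documented, so this version normalizes both on the client side.

```javascript
// The three options listed in the schema description above.
const ALLOWED_FORMATS = ['markdown', 'summary', 'html'];

// Splits a comma-separated formats string, normalizes it, and rejects unknown values.
function parseFormats(formats = 'markdown,summary') {
    const parsed = formats
        .split(',')
        .map((format) => format.trim().toLowerCase())
        .filter((format) => format.length > 0);
    const unknown = parsed.filter((format) => !ALLOWED_FORMATS.includes(format));
    if (unknown.length > 0) {
        throw new Error(`Unknown format(s): ${unknown.join(', ')}`);
    }
    // Remove duplicates while keeping the original order.
    return [...new Set(parsed)];
}
```

Joining the result with `parseFormats(value).join(',')` yields a clean string suitable for the `formats` input field.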

## `render` (type: `boolean`):

Enable browser-based JS rendering to bypass DataDome/Cloudflare. Slower, but more effective against anti-bot systems. Blocked pages are retried with rendering automatically.

## `screenshot` (type: `boolean`):

Capture a screenshot of each scraped page (requires render=true).

## Actor input object example

```json
{
  "action": "search",
  "location": "US",
  "language": "en",
  "limit": 10,
  "withContent": true,
  "formats": "markdown,summary",
  "render": false,
  "screenshot": false
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

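// Prepare Actor input ("action" is required; "query" is required for search)
const input = {
    action: 'search',
    query: '<YOUR_SEARCH_QUERY>',
    limit: 10,
};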

// Run the Actor and wait for it to finish
const run = await client.actor("empathetic_chorus/xcrawl-search-scrape-actor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

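# Prepare the Actor input ("action" is required; "query" is required for search)
run_input = {
    "action": "search",
    "query": "<YOUR_SEARCH_QUERY>",
    "limit": 10,
}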

# Run the Actor and wait for it to finish
run = client.actor("empathetic_chorus/xcrawl-search-scrape-actor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"action": "search", "query": "<YOUR_SEARCH_QUERY>"}' |
apify call empathetic_chorus/xcrawl-search-scrape-actor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=empathetic_chorus/xcrawl-search-scrape-actor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Xcrawl Search Scrape Actor",
        "version": "0.0",
        "x-build-id": "21j3kRlq20fX6gDUj"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/empathetic_chorus~xcrawl-search-scrape-actor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-empathetic_chorus-xcrawl-search-scrape-actor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/empathetic_chorus~xcrawl-search-scrape-actor/runs": {
            "post": {
                "operationId": "runs-sync-empathetic_chorus-xcrawl-search-scrape-actor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/empathetic_chorus~xcrawl-search-scrape-actor/run-sync": {
            "post": {
                "operationId": "run-sync-empathetic_chorus-xcrawl-search-scrape-actor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "action"
                ],
                "properties": {
                    "action": {
                        "title": "Action",
                        "enum": [
                            "search",
                            "scrape"
                        ],
                        "type": "string",
                        "description": "Action to perform: 'search' for web search, 'scrape' for single page extraction",
                        "default": "search"
                    },
                    "query": {
                        "title": "Search Query",
                        "maxLength": 200,
                        "type": "string",
                        "description": "Search query (required for action=search)"
                    },
                    "url": {
                        "title": "URL",
                        "maxLength": 2000,
                        "type": "string",
                        "description": "URL to scrape (required for action=scrape)"
                    },
                    "location": {
                        "title": "Location",
                        "type": "string",
                        "description": "Search location code (e.g. 'US', 'UK', 'CN', 'JP')",
                        "default": "US"
                    },
                    "language": {
                        "title": "Language",
                        "type": "string",
                        "description": "Search language code (e.g. 'en', 'zh', 'ja')",
                        "default": "en"
                    },
                    "limit": {
                        "title": "Result Limit",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Number of search results to return (1-50). Each result's full content will be fetched automatically.",
                        "default": 10
                    },
                    "withContent": {
                        "title": "Fetch Full Content",
                        "type": "boolean",
                        "description": "When enabled, automatically fetches full page content (markdown + summary) for each search result.",
                        "default": true
                    },
                    "formats": {
                        "title": "Output Formats",
                        "type": "string",
                        "description": "Comma-separated output formats. Options: markdown, summary, html. (e.g. 'markdown,summary')",
                        "default": "markdown,summary"
                    },
                    "render": {
                        "title": "Browser Rendering (Anti-Detection)",
                        "type": "boolean",
                        "description": "Enable browser-based JS rendering to bypass DataDome/Cloudflare. Slower but more anti-bot effective. Auto-retries blocked pages with rendering.",
                        "default": false
                    },
                    "screenshot": {
                        "title": "Take Screenshot",
                        "type": "boolean",
                        "description": "Capture a screenshot of each scraped page (requires render=true).",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
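
For projects that do not use a client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. This is a minimal sketch assuming Node.js 18+ (for the global `fetch`) and a JSON response body. `buildRunSyncUrl` and `runSync` are hypothetical helper names, and the error handling is deliberately simple.

```javascript
// Server URL and path are taken from the OpenAPI specification above.
const API_BASE = 'https://api.apify.com/v2';
const ACTOR_PATH = 'empathetic_chorus~xcrawl-search-scrape-actor';

// Builds the endpoint URL; the spec passes the Apify token as a query parameter.
function buildRunSyncUrl(token) {
    const url = new URL(`${API_BASE}/acts/${ACTOR_PATH}/run-sync-get-dataset-items`);
    url.searchParams.set('token', token);
    return url.toString();
}

// Starts a run, waits for it to finish, and resolves to the parsed response
// (assumed here to be the dataset items as JSON).
async function runSync(token, input) {
    const response = await fetch(buildRunSyncUrl(token), {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify(input),
    });
    if (!response.ok) {
        throw new Error(`Apify API responded with HTTP ${response.status}`);
    }
    return response.json();
}
```

A call would look like `await runSync('<YOUR_API_TOKEN>', { action: 'scrape', url: 'https://example.com' })`. Because the request stays open until the run finishes, long searches with rendering may be better served by the asynchronous `/runs` path listed in the same specification.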
