# Baidu Search Scraper (`lofomachines/baidu-scraper`) Actor

Scrapes Baidu search results with all major filters and pagination.

- **URL**: https://apify.com/lofomachines/baidu-scraper.md
- **Developed by:** [Lofomachines](https://apify.com/lofomachines) (community)
- **Categories:** Developer tools, SEO tools
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $1.80 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
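
As a rough budgeting aid, the arithmetic behind the per-result price can be sketched as follows. This is an illustration under two assumptions that this page does not confirm: every collected result counts as one billable event, and the listed starting price of $1.80 per 1,000 results applies to your plan. The function name `estimate_cost` is hypothetical.

```python
from decimal import Decimal

# Starting price listed on this page; the price on your plan may differ.
PRICE_PER_1000_RESULTS = Decimal("1.80")


def estimate_cost(num_queries: int, max_results: int,
                  price_per_1000: Decimal = PRICE_PER_1000_RESULTS) -> Decimal:
    """Upper-bound estimate in USD, assuming one billable event per result."""
    if num_queries < 0 or max_results < 0:
        raise ValueError("counts must be non-negative")
    # Worst case: every query returns the full maxResults.
    total_results = num_queries * max_results
    return price_per_1000 * total_results / Decimal(1000)


# 5 queries with up to 100 results each is at most 500 results.
print(estimate_cost(5, 100))
```

The figure is an upper bound because a query may return fewer results than `maxResults`.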

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, an API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Baidu Search Scraper | Extract Baidu Search Results

Baidu Search Scraper is a powerful web scraping tool designed to extract search engine results pages (SERPs) from **Baidu** (China's leading search engine). This scraper is built to reliably bypass captchas, anti-scraping systems, and bot detection mechanisms.

Whether you need to collect search results for SEO monitoring, brand protection, sentiment analysis, academic research, or market intelligence, this Baidu Scraper handles the complexities of pagination.

---

### 🚀 Key Features

*   **Sequential Multi-Query Search**: Input multiple search terms (one per line) and process them in a single run. The scraper reuses the same browser container across queries, which avoids repeated startup overhead.
*   **Real Destination URL Resolution**: Baidu search results use encrypted redirect links (`baidu.com/link?url=...`). This scraper automatically follows redirects to extract and output the **real destination URLs**.
*   **Advanced Baidu Search Operators**: Fully supports filters such as:
    *   **Domain filtering** (`site:domain.com`) - supports multiple domains combined with OR.
    *   **Recency/Time range** (last 24 hours, week, month, year).
    *   **File type filtering** (PDF, Word, Excel, PPT, RTF).
    *   **Language Script Selection** (Simplified or Traditional Chinese).
    *   **Search in page titles only** (`intitle:`) and **Exact phrase matching** (`"phrase"`).
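
To make the operators above concrete, here is a minimal sketch of how such filters could be folded into a single search string. It is not the Actor's actual implementation, which this page does not show: the function `build_query`, the ordering of the parts, and the literal ` OR ` separator are all assumptions made for illustration.

```python
from typing import Sequence


def build_query(keywords: str,
                sites: Sequence[str] = (),
                exact_phrase: str = "",
                exclude_words: Sequence[str] = (),
                title_only: bool = False) -> str:
    """Compose a search string from the documented filters (illustrative only)."""
    parts = []
    # intitle: restricts matching to page titles. How the Actor applies it
    # to multi-word keywords is not documented; this sketch prefixes them as-is.
    parts.append(f"intitle:{keywords}" if title_only else keywords)
    # The exact phrase is wrapped in quotes.
    if exact_phrase:
        parts.append(f'"{exact_phrase}"')
    # Each excluded word is appended as -word.
    parts.extend(f"-{word}" for word in exclude_words)
    # Multiple domains are combined with OR.
    if sites:
        parts.append(" OR ".join(f"site:{site}" for site in sites))
    return " ".join(parts)


print(build_query("apple", sites=["apple.com.cn", "zhihu.com"],
                  exclude_words=["fruit"], title_only=True))
```

In practice you pass the filters as separate input fields and let the Actor build the query; the sketch only shows what the fields correspond to.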

---

### 🎯 Use Cases

1.  **Chinese Market SEO Monitoring**: Track organic rankings, indexation status, and SERP visibility for keywords on Baidu.
2.  **Brand Protection & Infringement Tracking**: Search for unauthorized sellers, trademark violations, or fake brand representations on Chinese web properties.
3.  **Competitor Intelligence**: Analyze the competitor landing pages, display domains, and search snippets that rank for specific terms.
4.  **Academic & Sentiment Analysis Research**: Extract historical data, news snippets, and online discussions relevant to Chinese culture, business, or politics.

---

### 🛠️ How to Use

1.  **Configure Queries**: Enter one or more keywords/queries in the **Queries / Search Terms** input box (one per line).
2.  **Define Max Results**: Set the maximum number of results you want to retrieve per query (e.g., `100`).
3.  **Apply Filters**: (Optional) Restrict results by site/domain, publication date (time range), language script, or file type.
4.  **Enable URL Resolution**: Keep **Resolve real URLs** checked to follow redirects and get the actual target URLs instead of raw Baidu redirect links.
5.  **Configure Proxy**: For heavy usage, enable the **Apify Proxy** (using residential proxies is recommended to avoid IP bans).
6.  **Run the Actor**: Click the **Run** button. The scraper will collect the data and store it in your default dataset.

---

### 📥 Input Configuration

Here is a list of the available input parameters:

| Field Name | Type | Description | Default |
| :--- | :--- | :--- | :--- |
| `queries` | `array` | List of search queries to run sequentially (one per line). | `["claude anthropic"]` |
| `maxResults` | `integer` | Max results to collect for **each** query (1 to 1000). | `10` |
| `timeRange` | `string` | Filter results by date: `any`, `day`, `week`, `month`, or `year`. | `"any"` |
| `sites` | `array` | Limit search to specific domains (e.g. `wikipedia.org`). | `[]` |
| `filetype` | `string` | Limit results to a single file type: `any`, `pdf`, `doc`, `xls`, `ppt`, or `rtf`. | `"any"` |
| `language` | `string` | Chinese script: `any`, `simplified`, or `traditional`. | `"any"` |
| `exactPhrase` | `string` | Require results to contain this exact phrase. | `""` |
| `excludeWords` | `array` | Exclude results containing these words. | `[]` |
| `titleOnly` | `boolean` | Restrict search matches to page titles only. | `false` |
| `resolveRealUrls` | `boolean` | Follow Baidu redirect links to get the real target URL. | `true` |
| `proxyConfiguration` | `object` | Proxy settings (Apify Proxy or custom proxies). | `None` |
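
Before starting a run, it can help to check an input object against the documented options. The sketch below is one possible way to do that. The helper `build_input` is hypothetical, the allowed values are copied from the table above, and the `maxResults` range of 1 to 1000 with a default of 10 is taken from the OpenAPI specification later in this document.

```python
TIME_RANGES = {"any", "day", "week", "month", "year"}
FILETYPES = {"any", "pdf", "doc", "xls", "ppt", "rtf"}
LANGUAGES = {"any", "simplified", "traditional"}


def build_input(queries_text: str, max_results: int = 10, time_range: str = "any",
                filetype: str = "any", language: str = "any") -> dict:
    """Build an Actor input object, rejecting values outside the documented options."""
    # One query per line; blank lines are skipped.
    queries = [line.strip() for line in queries_text.splitlines() if line.strip()]
    if not queries:
        raise ValueError("at least one query is required")
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")
    checks = (("timeRange", time_range, TIME_RANGES),
              ("filetype", filetype, FILETYPES),
              ("language", language, LANGUAGES))
    for name, value, allowed in checks:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}")
    return {
        "queries": queries,
        "maxResults": max_results,
        "timeRange": time_range,
        "filetype": filetype,
        "language": language,
    }


actor_input = build_input("apple\nbaidu", max_results=20, time_range="week")
print(actor_input)
```

Validating locally catches typos such as `"weekly"` before a paid run is started.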

---

### 📤 Output Format

Each scraped search result item is stored as an object in the Apify dataset. The scraper outputs the following fields:

| Field | Type | Description |
| :--- | :--- | :--- |
| `query` | `string` | The search query term. |
| `position` | `integer` | 1-based ranking position of the result for this query. |
| `page` | `integer` | The page number on Baidu where the result was found. |
| `title` | `string` | The title of the search result page. |
| `url` | `string` | The resolved, final destination URL (e.g., `https://example.com/page`). |
| `baiduUrl` | `string` | The original Baidu redirect URL. |
| `displayUrl` | `string` | The display domain name shown on Baidu. |
| `snippet` | `string` | Description snippet text matching your search terms. |
| `date` | `string` | Publication date of the page (if shown on Baidu). |
| `siteName` | `string` | Displayed name of the website (if shown on Baidu). |

#### Output JSON Example

```json
{
  "query": "apple",
  "position": 1,
  "page": 1,
  "title": "Apple (中国大陆) - 官方网站",
  "url": "https://www.apple.com.cn/",
  "baiduUrl": "http://www.baidu.com/link?url=6lHipUPotM6NN3efDPvd4gZk1ZSQhtVwsIBdG3DGtmFUBe5LzfEdru89qaxDmtNy",
  "displayUrl": "www.apple.com.cn/",
  "snippet": "探索Apple 的创新世界,选购各式 iPhone、iPad、Apple Watch 和 Mac,浏览各类配件、娱乐产品,并获得相关产品的专家服务支持。",
  "date": null,
  "siteName": null
}
````
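
When consuming these items in Python, mapping them onto a typed record makes missing or null fields explicit. The following is a minimal sketch. The class name `SearchResult` and its snake_case attribute names are illustrative choices, not part of the Actor, and the sample item is the one shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SearchResult:
    """One dataset item, with the documented camelCase keys mapped to snake_case."""
    query: str
    position: int
    page: int
    title: str
    url: str
    baidu_url: str
    display_url: str
    snippet: str
    date: Optional[str] = None
    site_name: Optional[str] = None

    @classmethod
    def from_item(cls, item: dict) -> "SearchResult":
        return cls(
            query=item["query"],
            position=int(item["position"]),
            page=int(item["page"]),
            title=item.get("title") or "",
            url=item.get("url") or "",
            baidu_url=item.get("baiduUrl") or "",
            display_url=item.get("displayUrl") or "",
            snippet=item.get("snippet") or "",
            date=item.get("date"),           # null when Baidu shows no date
            site_name=item.get("siteName"),  # null when Baidu shows no site name
        )


item = {
    "query": "apple", "position": 1, "page": 1,
    "title": "Apple (中国大陆) - 官方网站",
    "url": "https://www.apple.com.cn/",
    "baiduUrl": "http://www.baidu.com/link?url=6lHipUPotM6NN3efDPvd4gZk1ZSQhtVwsIBdG3DGtmFUBe5LzfEdru89qaxDmtNy",
    "displayUrl": "www.apple.com.cn/",
    "snippet": "探索Apple 的创新世界",
    "date": None, "siteName": None,
}
result = SearchResult.from_item(item)
print(result.position, result.url)
```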

---

### 💡 Troubleshooting & Performance Tips

- **Speeding up Runs**: Setting `resolveRealUrls` to `false` makes the scraper significantly faster because it doesn't need to make HEAD/GET HTTP requests to every resolved target website. If you only need domain names or the raw Baidu redirect links, turn this off.
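
If you run with `resolveRealUrls` set to `false` and only need domain names, they can be derived from the `displayUrl` field on the client side. The sketch below assumes `displayUrl` looks like the sample value `www.apple.com.cn/`; Baidu may show other text in that position, so treat the result as best-effort. The helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def display_domain(display_url: str) -> str:
    """Best-effort host extraction from a displayUrl such as 'www.apple.com.cn/'."""
    text = display_url.strip()
    # urlsplit only fills in the hostname when a '//' authority marker is present.
    if "://" not in text:
        text = "//" + text
    return (urlsplit(text).hostname or "").lower()


def count_domains(items: list) -> Counter:
    """Count results per displayed domain, skipping items without a displayUrl."""
    domains = (display_domain(item.get("displayUrl") or "") for item in items)
    return Counter(domain for domain in domains if domain)


sample = [{"displayUrl": "www.apple.com.cn/"}, {"displayUrl": "www.apple.com.cn/iphone"}]
print(count_domains(sample))
```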

---

### ❓ FAQ

**Q: Can I scrape thousands of keywords?**\
A: Yes! You can input a large list of keywords in the `queries` field.

**Q: Why are some destination URLs identical to the Baidu redirect URLs?**\
A: If the target website is offline, slow to respond, or blocks redirect resolution requests, the scraper falls back to the original Baidu redirect link to ensure you do not lose data.
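
To find the items affected by this fallback, you can compare `url` with `baiduUrl` after the run. The sketch below is one way to do it. The helper names are hypothetical, and the host and path check for `baidu.com/link` is an assumption based on the redirect format described in Key Features.

```python
from urllib.parse import urlsplit


def is_unresolved(item: dict) -> bool:
    """True when `url` still points at a Baidu redirect link."""
    url = item.get("url") or ""
    if url and url == item.get("baiduUrl"):
        return True
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    is_baidu_host = host == "baidu.com" or host.endswith(".baidu.com")
    return is_baidu_host and parts.path.startswith("/link")


def split_by_resolution(items: list) -> tuple:
    """Return (resolved, unresolved) lists, preserving the original order."""
    resolved = [item for item in items if not is_unresolved(item)]
    unresolved = [item for item in items if is_unresolved(item)]
    return resolved, unresolved


items = [
    {"url": "https://www.apple.com.cn/", "baiduUrl": "http://www.baidu.com/link?url=abc"},
    {"url": "http://www.baidu.com/link?url=def", "baiduUrl": "http://www.baidu.com/link?url=def"},
]
resolved, unresolved = split_by_resolution(items)
print(len(resolved), len(unresolved))
```

The unresolved items can then be retried later or resolved with your own HTTP client.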

# Actor input Schema

## `queries` (type: `array`):

List of search queries to process in sequence. One query per line.

## `maxResults` (type: `integer`):

Maximum number of results to collect for each query.

## `timeRange` (type: `string`):

Restrict results by recency.

## `sites` (type: `array`):

Restrict results to these domains (operator site:). Multiple domains are combined with OR.

## `filetype` (type: `string`):

Restrict to a single file type.

## `language` (type: `string`):

Restrict to simplified or traditional Chinese.

## `exactPhrase` (type: `string`):

Wrapped in quotes; results must contain this exact phrase.

## `excludeWords` (type: `array`):

Each word is appended as -word to the query.

## `titleOnly` (type: `boolean`):

Restrict matching to page titles.

## `resolveRealUrls` (type: `boolean`):

Follow Baidu redirects to retrieve the real destination URL. Slower but produces usable URLs.

## Actor input object example

```json
{
  "queries": [
    "apple"
  ],
  "maxResults": 10,
  "timeRange": "any",
  "sites": [],
  "filetype": "any",
  "language": "any",
  "excludeWords": [],
  "titleOnly": false,
  "resolveRealUrls": true
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "queries": [
        "apple"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("lofomachines/baidu-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "queries": ["apple"] }

# Run the Actor and wait for it to finish
run = client.actor("lofomachines/baidu-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "queries": [
    "apple"
  ]
}' |
apify call lofomachines/baidu-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=lofomachines/baidu-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Baidu Search Scraper",
        "description": "Scrapes Baidu search results with all major filters and pagination.",
        "version": "0.1",
        "x-build-id": "rQbb52cPiyiLSA2hW"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/lofomachines~baidu-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-lofomachines-baidu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/lofomachines~baidu-scraper/runs": {
            "post": {
                "operationId": "runs-sync-lofomachines-baidu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/lofomachines~baidu-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-lofomachines-baidu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "queries": {
                        "title": "Queries / Search Terms",
                        "type": "array",
                        "description": "List of search queries to process in sequence. One query per line.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxResults": {
                        "title": "Max results per query",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of results to collect for each query.",
                        "default": 10
                    },
                    "timeRange": {
                        "title": "Time range",
                        "enum": [
                            "any",
                            "day",
                            "week",
                            "month",
                            "year"
                        ],
                        "type": "string",
                        "description": "Restrict results by recency.",
                        "default": "any"
                    },
                    "sites": {
                        "title": "Site filter (domains)",
                        "type": "array",
                        "description": "Restrict results to these domains (operator site:). Multiple domains are combined with OR.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "filetype": {
                        "title": "File type",
                        "enum": [
                            "any",
                            "pdf",
                            "doc",
                            "xls",
                            "ppt",
                            "rtf"
                        ],
                        "type": "string",
                        "description": "Restrict to a single file type.",
                        "default": "any"
                    },
                    "language": {
                        "title": "Chinese language variant",
                        "enum": [
                            "any",
                            "simplified",
                            "traditional"
                        ],
                        "type": "string",
                        "description": "Restrict to simplified or traditional Chinese.",
                        "default": "any"
                    },
                    "exactPhrase": {
                        "title": "Exact phrase",
                        "type": "string",
                        "description": "Wrapped in quotes; results must contain this exact phrase."
                    },
                    "excludeWords": {
                        "title": "Exclude words",
                        "type": "array",
                        "description": "Each word is appended as -word to the query.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "titleOnly": {
                        "title": "Search in titles only (intitle:)",
                        "type": "boolean",
                        "description": "Restrict matching to page titles.",
                        "default": false
                    },
                    "resolveRealUrls": {
                        "title": "Resolve real URLs",
                        "type": "boolean",
                        "description": "Follow Baidu redirects to retrieve the real destination URL. Slower but produces usable URLs.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
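
For projects that cannot use a client library, the `run-sync-get-dataset-items` path in the specification above can be called directly over HTTP. The sketch below uses only the Python standard library. It assumes the response body is a JSON array of dataset items, which the specification does not spell out, and the function names and the `timeout` default are illustrative.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/lofomachines~baidu-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Prepare the POST request: token as a query parameter, input as a JSON body."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        url=f"{API_BASE}{ACTOR_PATH}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and decode the returned dataset items (makes a network call)."""
    request = build_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (not executed here because it starts a paid run):
# items = run_actor("<YOUR_API_TOKEN>", {"queries": ["apple"], "maxResults": 10})
# for item in items:
#     print(item["position"], item["url"])
```

Because the call blocks until the run finishes, large jobs are usually better served by the `/runs` path, which returns immediately with run information.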
