# Baidu Search Scraper (`jungle_synthesizer/baidu-search-scraper`) Actor

Scrape Baidu (百度) search results including web, news, and Baike entries. Supports multiple queries per run with pagination, SERP feature extraction, and related query harvesting. Ideal for Chinese-market SEO research, brand monitoring, and AI training data collection.

- **URL**: https://apify.com/jungle\_synthesizer/baidu-search-scraper.md
- **Developed by:** [BowTiedRaccoon](https://apify.com/jungle_synthesizer) (community)
- **Categories:** SEO tools, AI
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, NaN bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for the Apify platform usage, but only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are a software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

````bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```bash

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Baidu Search Scraper

Scrape Baidu (百度) search engine results for any search query. Extract organic results, news articles, Baike entries, and SERP metadata including related queries, total result estimates, and detected SERP features.

Baidu is China's dominant search engine with roughly 70% domestic market share. This scraper is the equivalent of a Google Search Scraper for the Chinese-language web — essential for any team doing SEO research, competitive intelligence, or data collection targeting the Chinese market.

### What this scraper collects

For each search result, the scraper extracts:

- **Title** — the result headline as shown on Baidu
- **URL** — the real destination URL (resolved from Baidu's redirect wrapper)
- **Displayed URL** — the shortened URL displayed on the SERP
- **Snippet** — the descriptive text shown below the title
- **Result type** — `organic`, `news`, `baike`, `video`
- **Source site** — hostname of the result
- **Published date** — date shown for news results
- **Thumbnail URL** — image thumbnail when present
- **Is Baidu-owned** — flags results pointing to Baidu properties (Baike, Zhidao, Tieba, etc.)
- **Related queries** — the "related searches" shown on the SERP page
- **Total results estimate** — Baidu's stated result count for the query
- **SERP features** — detected features: `answer_box`, `knowledge_panel`, `image_pack`, `video_pack`, `news_box`

### Supported search modes

| Mode | URL pattern | Use case |
|------|------------|----------|
| `web` | `www.baidu.com/s?wd=...` | Standard web search results |
| `news` | `news.baidu.com/ns?word=...` | News articles from Chinese media |

### Usage

Configure one or more search queries and set the search mode:

```json
{
  "queries": ["人工智能", "machine learning", "python编程"],
  "searchType": "web",
  "maxPages": 3,
  "maxItems": 100
}
````

#### Input parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `queries` | array | required | Search query strings (Chinese or English) |
| `searchType` | string | `web` | Search mode: `web` or `news` |
| `resultsPerPage` | integer | `10` | Results per page (10 or 50) |
| `maxPages` | integer | `3` | Max SERP pages per query |
| `maxItems` | integer | `15` | Maximum total results across all queries |

### Performance and anti-bot notes

Baidu's WAF blocks all data center IP addresses. This scraper uses residential proxies to bypass the block and deliver real search results. Pacing is configured to avoid triggering rate-limit responses.

Politically sensitive queries on Baidu may return reduced result counts or redirected results — this is inherent to Baidu's content policies, not a scraper defect.

### Use cases

- **Chinese SEO research** — track keyword rankings and SERP features on Baidu
- **Brand monitoring** — monitor how a brand appears in Chinese search results
- **Competitive intelligence** — analyze which Chinese sites rank for industry keywords
- **AI training data** — collect Baidu SERPs as retrieval anchors for Chinese-language models
- **Academic research** — study information retrieval and content availability on the Chinese web

# Actor input Schema

## `sp_intended_usage` (type: `string`):

Please describe how you plan to use the data extracted by this crawler.

## `sp_improvement_suggestions` (type: `string`):

Provide any feedback or suggestions for improvements.

## `sp_contact` (type: `string`):

Provide your email address so we can get in touch with you.

## `queries` (type: `array`):

List of search queries to run on Baidu. Each query returns up to resultsPerPage x maxPages results.

## `searchType` (type: `string`):

Type of Baidu search: web (standard SERP) or news (news.baidu.com)

## `resultsPerPage` (type: `integer`):

Number of results per page (10 or 50 recommended)

## `maxPages` (type: `integer`):

Maximum number of SERP pages to fetch per query. Each page contains resultsPerPage results.

## `maxItems` (type: `integer`):

Maximum total number of result records to collect across all queries and pages.

## Actor input object example

```json
{
  "sp_intended_usage": "Describe your intended use...",
  "sp_improvement_suggestions": "Share your suggestions here...",
  "sp_contact": "Share your email here...",
  "queries": [
    "python programming",
    "machine learning"
  ],
  "searchType": "web",
  "resultsPerPage": 10,
  "maxPages": 3,
  "maxItems": 15
}
```

# Actor output Schema

## `results` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "sp_intended_usage": "Describe your intended use...",
    "sp_improvement_suggestions": "Share your suggestions here...",
    "sp_contact": "Share your email here...",
    "queries": [
        "python programming",
        "machine learning"
    ],
    "searchType": "web",
    "resultsPerPage": 10,
    "maxPages": 3,
    "maxItems": 15
};

// Run the Actor and wait for it to finish
const run = await client.actor("jungle_synthesizer/baidu-search-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "sp_intended_usage": "Describe your intended use...",
    "sp_improvement_suggestions": "Share your suggestions here...",
    "sp_contact": "Share your email here...",
    "queries": [
        "python programming",
        "machine learning",
    ],
    "searchType": "web",
    "resultsPerPage": 10,
    "maxPages": 3,
    "maxItems": 15,
}

# Run the Actor and wait for it to finish
run = client.actor("jungle_synthesizer/baidu-search-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "sp_intended_usage": "Describe your intended use...",
  "sp_improvement_suggestions": "Share your suggestions here...",
  "sp_contact": "Share your email here...",
  "queries": [
    "python programming",
    "machine learning"
  ],
  "searchType": "web",
  "resultsPerPage": 10,
  "maxPages": 3,
  "maxItems": 15
}' |
apify call jungle_synthesizer/baidu-search-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=jungle_synthesizer/baidu-search-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Baidu Search Scraper",
        "description": "Scrape Baidu (百度) search results including web, news, and Baike entries. Supports multiple queries per run with pagination, SERP feature extraction, and related query harvesting. Ideal for Chinese-market SEO research, brand monitoring, and AI training data collection.",
        "version": "0.1",
        "x-build-id": "elAPIg5zJQtK6YH4N"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/jungle_synthesizer~baidu-search-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-jungle_synthesizer-baidu-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/jungle_synthesizer~baidu-search-scraper/runs": {
            "post": {
                "operationId": "runs-sync-jungle_synthesizer-baidu-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/jungle_synthesizer~baidu-search-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-jungle_synthesizer-baidu-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "queries"
                ],
                "properties": {
                    "sp_intended_usage": {
                        "title": "What is the intended usage of this data?",
                        "minLength": 1,
                        "type": "string",
                        "description": "Please describe how you plan to use the data extracted by this crawler."
                    },
                    "sp_improvement_suggestions": {
                        "title": "How can we improve this crawler for you?",
                        "minLength": 1,
                        "type": "string",
                        "description": "Provide any feedback or suggestions for improvements."
                    },
                    "sp_contact": {
                        "title": "Contact Email",
                        "minLength": 1,
                        "type": "string",
                        "description": "Provide your email address so we can get in touch with you."
                    },
                    "queries": {
                        "title": "Search Queries",
                        "type": "array",
                        "description": "List of search queries to run on Baidu. Each query returns up to resultsPerPage x maxPages results.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "searchType": {
                        "title": "Search Type",
                        "enum": [
                            "web",
                            "news"
                        ],
                        "type": "string",
                        "description": "Type of Baidu search: web (standard SERP) or news (news.baidu.com)",
                        "default": "web"
                    },
                    "resultsPerPage": {
                        "title": "Results Per Page",
                        "minimum": 10,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Number of results per page (10 or 50 recommended)",
                        "default": 10
                    },
                    "maxPages": {
                        "title": "Max Pages Per Query",
                        "minimum": 1,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Maximum number of SERP pages to fetch per query. Each page contains resultsPerPage results.",
                        "default": 3
                    },
                    "maxItems": {
                        "title": "Max Items",
                        "type": "integer",
                        "description": "Maximum total number of result records to collect across all queries and pages.",
                        "default": 15
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
