# 🔍 Baidu Search Scraper (`simpleapi/baidu-search-scraper`) Actor

Scrape Baidu search results at scale. Extract organic listings, answer boxes, related videos, related searches, and top searches. Supports bulk queries, proxy fallback, date filters, and device/language options for SEO and market research.

- **URL:** https://apify.com/simpleapi/baidu-search-scraper.md
- **Developed by:** [SimpleAPI](https://apify.com/simpleapi) (community)
- **Categories:** SEO tools, Developer tools, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating:** No ratings yet

## Pricing

from $2.99 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
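
As a rough illustration of the per-result pricing above, the sketch below computes a lower bound on result charges. It assumes the listed "from $2.99 / 1,000 results" rate applies linearly; the function name is hypothetical, and platform usage charges are not included.

```python
from decimal import Decimal

# Starting price from the Pricing section; actual event prices may be higher.
PRICE_PER_1000_RESULTS_USD = Decimal("2.99")


def estimate_min_result_cost(result_count: int) -> Decimal:
    """Lower-bound estimate of per-result charges in USD, excluding platform usage."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = Decimal(result_count) * PRICE_PER_1000_RESULTS_USD / Decimal(1000)
    return cost.quantize(Decimal("0.0001"))


# Two queries x 3 pages x 10 results per page = 60 results
print(estimate_min_result_cost(60))  # 0.1794
```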

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Baidu Search Scraper

Scrape Baidu search results at scale with intelligent proxy fallback. Extract organic results, answer boxes, related videos, people also search for, related searches, and top searches.

### Why Choose Us?

- **Intelligent proxy fallback**: Starts with no proxy; automatically falls back to datacenter then residential if Baidu blocks requests
- **Bulk support**: Process multiple search queries in a single run
- **Comprehensive extraction**: Organic results, answer boxes, videos, related searches, and more
- **Robust error handling**: Retries and fallbacks keep your runs successful

### Key Features

- **No proxy by default** – Saves cost when Baidu allows direct access
- **Automatic fallback** – Datacenter → Residential with 3 retries on block
- **Stick with residential** – Once fallback occurs, uses residential for all remaining requests
- **Detailed logging** – Clear proxy events and progress in Apify logs
- **Bulk URLs or terms** – Support for Baidu search URLs or plain search terms

### Input

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| urls | array | Yes | Baidu search URLs (e.g. `https://www.baidu.com/s?wd=python`) or plain search terms |
| proxyConfiguration | object | No | Proxy settings. Default: no proxy. Falls back automatically on block |
| maxPagination | integer | No | Max pages per query (0-10). Default: 3 |
| numResults | integer | No | Results per page (1-50). Default: 10 |
| startPage | integer | No | Starting page. Default: 1 |
| timePeriod | object | No | Optional date filter: `{startDate, endDate}` (YYYY-MM-DD), or `{daysAgo}` for the last N days |

#### Example Input

```json
{
  "urls": [
    "https://www.baidu.com/s?wd=python",
    "https://www.baidu.com/s?wd=Javascript"
  ],
  "proxyConfiguration": { "useApifyProxy": false },
  "maxPagination": 3,
  "numResults": 10
}
````
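
When building inputs programmatically, it can help to check the documented ranges before starting a run. The following is a minimal sketch, assuming `urls` entries are plain strings (as the input schema declares) and that an empty list should be rejected; `build_input` is a hypothetical helper, not part of the Actor or the Apify client.

```python
from typing import Any


def build_input(
    queries: list[str],
    max_pagination: int = 3,
    num_results: int = 10,
    start_page: int = 1,
) -> dict[str, Any]:
    """Build an Actor input dict, enforcing the ranges from the Input table."""
    if not queries:
        # Assumption: a required array with no entries is treated as invalid.
        raise ValueError("urls is required and must contain at least one entry")
    if not 0 <= max_pagination <= 10:
        raise ValueError("maxPagination must be between 0 and 10")
    if not 1 <= num_results <= 50:
        raise ValueError("numResults must be between 1 and 50")
    if start_page < 1:
        raise ValueError("startPage must be 1 or greater")
    return {
        "urls": list(queries),
        "proxyConfiguration": {"useApifyProxy": False},
        "maxPagination": max_pagination,
        "numResults": num_results,
        "startPage": start_page,
    }


actor_input = build_input(["python tutorial", "https://www.baidu.com/s?wd=python"])
```

The resulting dict can be passed as `run_input` to the Python client or serialized to JSON for the CLI.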

### Output

```json
{
  "summary": {
    "total_queries": 2,
    "queries": ["python", "Javascript"],
    "total_organic_results": 200,
    "total_answer_boxes": 168,
    "total_related_videos": 0,
    "total_people_also_search_for": 180,
    "total_related_searches": 1,
    "total_top_searches": 1675
  },
  "results_by_query": {
    "python": {
      "query": "python",
      "organic_results": [...],
      "answer_box": [...],
      "related_videos": [...],
      "people_also_search_for": [...],
      "related_searches": [...],
      "top_searches": [...]
    }
  }
}
```
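
To work with this structure downstream, one option is to count the entries in each section per query. The sketch below assumes an output item shaped like the example above; the `sample` data is made up for illustration and `count_sections` is a hypothetical helper.

```python
from typing import Any

# Section keys as they appear in the Output example.
SECTION_KEYS = (
    "organic_results",
    "answer_box",
    "related_videos",
    "people_also_search_for",
    "related_searches",
    "top_searches",
)


def count_sections(output: dict[str, Any]) -> dict[str, dict[str, int]]:
    """Return the number of entries in each section, keyed by query."""
    counts: dict[str, dict[str, int]] = {}
    for query, result in output.get("results_by_query", {}).items():
        # Missing or null sections are counted as empty.
        counts[query] = {key: len(result.get(key) or []) for key in SECTION_KEYS}
    return counts


# Hypothetical sample item, not real Actor output.
sample = {
    "results_by_query": {
        "python": {
            "query": "python",
            "organic_results": [{"title": "a"}, {"title": "b"}],
            "answer_box": [],
            "related_searches": [{"query": "python tutorial"}],
        }
    }
}
print(count_sections(sample)["python"]["organic_results"])  # 2
```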

### How to Use the Actor (via Apify Console)

1. Log in at <https://console.apify.com> and go to **Actors**
2. Find **baidu-search-scraper** and click it
3. Configure inputs (URLs/terms, proxy toggle, max pages, etc.)
4. Run the Actor
5. Monitor logs in real time
6. Access results in the **OUTPUT** tab
7. Export results to JSON or CSV

### Best Use Cases

- SEO research and competitor analysis
- Market research and trend monitoring
- Content discovery and topic research
- Academic research on search behavior

### Frequently Asked Questions

**Q: Does it work without a proxy?**\
A: Yes. By default the Actor uses no proxy. If Baidu blocks requests, it automatically falls back to datacenter and then residential proxies.

**Q: Can I use my own proxy?**\
A: Yes. Configure the proxy in the input. The fallback still applies if your proxy gets blocked.

**Q: What if the residential proxy fails?**\
A: It retries up to 3 times, then logs the error and continues with remaining queries.
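
The fallback behavior described in these answers can be pictured as a small state machine. The sketch below is an illustration only, not the Actor's actual implementation: it assumes one attempt each for the no-proxy and datacenter tiers and up to 3 attempts on residential, and the `ProxyLadder` class and `attempt` callback are hypothetical.

```python
from typing import Callable, Optional

TIERS = ("none", "datacenter", "residential")
RESIDENTIAL_RETRIES = 3  # "retries up to 3 times" per the FAQ


class ProxyLadder:
    """Tries each proxy tier in order and sticks with residential once reached."""

    def __init__(self) -> None:
        self.sticky_residential = False

    def fetch(self, query: str, attempt: Callable[[str, str], Optional[str]]) -> Optional[str]:
        """Call attempt(query, tier) until it returns a page; None means blocked."""
        tiers = ("residential",) if self.sticky_residential else TIERS
        for tier in tiers:
            if tier == "residential":
                # Once fallback reaches residential, later queries start there.
                self.sticky_residential = True
            tries = RESIDENTIAL_RETRIES if tier == "residential" else 1
            for _ in range(tries):
                page = attempt(query, tier)
                if page is not None:
                    return page
        # All tiers exhausted: the caller logs the error and moves to the next query.
        return None
```

A caller would loop over its queries, call `fetch` for each, and skip any query that returns `None`.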

### Support and Feedback

[Apify Store](https://apify.com/store) | [Apify Documentation](https://docs.apify.com)

### Cautions

- Data is collected only from **publicly available sources**
- No data from private accounts or password-protected content
- End users are responsible for legal compliance (privacy, data protection, etc.)

# Actor input Schema

## `urls` (type: `array`):

📝 **What to enter:** Baidu search URLs (e.g. https://www.baidu.com/s?wd=python) OR plain search terms (e.g. python tutorial, machine learning). Add one per line for bulk scraping.

💡 **Tip:** Use plain terms for simplicity. URLs are parsed automatically to extract the query.
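
To illustrate what "parsed automatically" could look like, here is a minimal sketch that pulls the query out of a Baidu search URL and passes plain terms through unchanged. It assumes the query lives in the `wd` parameter, as in the example URL; the Actor's own parsing may differ, and `extract_query` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def extract_query(entry: str) -> str:
    """Return the search term for a Baidu search URL, or the entry itself for a plain term."""
    text = entry.strip()
    parsed = urlparse(text)
    if parsed.scheme in ("http", "https") and parsed.netloc.endswith("baidu.com"):
        # `wd` carries the query in URLs such as https://www.baidu.com/s?wd=python
        values = parse_qs(parsed.query).get("wd")
        if values:
            return values[0]
    return text


print(extract_query("https://www.baidu.com/s?wd=python"))  # python
print(extract_query("python tutorial"))                    # python tutorial
```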

## `deviceType` (type: `string`):

🖥️ **Desktop** = www.baidu.com (default). 📱 **Mobile/Tablet** = m.baidu.com. Use to scrape mobile vs desktop SERP (different layouts & results).

## `languageLocalization` (type: `integer`):

1 = All languages (default). 2 = Simplified Chinese (简体中文). 3 = Traditional Chinese (繁體中文). Affects search result language/region.

## `startPage` (type: `integer`):

Page number to start scraping from. 1 = first page (default). Use 2+ to skip earlier pages (e.g. for pagination continuation).

## `numResults` (type: `integer`):

Number of results per page (1–50). Baidu typically shows 10. Higher = more results per request. Use with maxPagination for total result cap.

## `timePeriod` (type: `object`):

Optional date range filter. Leave empty for no filter (all results). Use Start Date + End Date for custom range, OR Days Ago for "last N days" (e.g. 7 = last week).
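
The precedence between the two filter styles can be sketched as follows. This assumes the field names `startDate`, `endDate`, and `daysAgo` from the input schema, that a positive `daysAgo` takes priority over the dates, and that a "last N days" window ends today; `resolve_time_period` is a hypothetical helper, not the Actor's code.

```python
from datetime import date, timedelta
from typing import Any, Optional


def resolve_time_period(
    time_period: dict[str, Any], today: date
) -> Optional[tuple[date, date]]:
    """Return the (start, end) dates to filter by, or None when no filter applies."""
    days_ago = time_period.get("daysAgo") or 0
    if days_ago > 0:
        # Days Ago wins over Start/End dates when it is set.
        return today - timedelta(days=days_ago), today
    start = time_period.get("startDate") or ""
    end = time_period.get("endDate") or ""
    if start and end:
        return date.fromisoformat(start), date.fromisoformat(end)
    return None


print(resolve_time_period({"daysAgo": 7}, date(2024, 3, 10)))
```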

## `maxPagination` (type: `integer`):

Max pages to scrape per query. 0 = no limit (capped at 10). 3 = first 3 pages. More pages = more results but longer run time.
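
Together, `numResults` and `maxPagination` bound how many organic results a single query can return. A minimal worked example, assuming the cap is simply results per page times pages (Baidu may return fewer); the function name is hypothetical.

```python
MAX_PAGES_CAP = 10  # documented ceiling for maxPagination


def max_results_per_query(num_results: int = 10, max_pagination: int = 3) -> int:
    """Upper bound on results per query; maxPagination of 0 means the 10-page cap."""
    pages = MAX_PAGES_CAP if max_pagination == 0 else min(max_pagination, MAX_PAGES_CAP)
    return num_results * pages


print(max_results_per_query())       # 30  (defaults: 10 per page x 3 pages)
print(max_results_per_query(50, 0))  # 500 (50 per page x 10-page cap)
```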

## `outputFile` (type: `string`):

Optional custom key for key-value store. Results are always saved to Apify dataset. If set, also saves to KVS with this name for easy retrieval.
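
If `outputFile` is set, the saved record could be read back roughly as shown below. This sketch assumes the Actor writes to the run's default key-value store under exactly the key given in `outputFile`, which the description does not confirm. `client` is expected to be an `ApifyClient` instance from the Python client, `run` the dict returned by `.call()`, and `fetch_saved_output` is a hypothetical helper.

```python
from typing import Any, Optional


def fetch_saved_output(client: Any, run: dict, output_file: str) -> Optional[Any]:
    """Read the record named by outputFile from the run's default key-value store."""
    # Assumption: the Actor saves to the default store of the run, not a named store.
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record(output_file)
    # get_record returns None when the key does not exist.
    return None if record is None else record["value"]
```

For example, after a run started with `"outputFile": "baidu-results"`, calling `fetch_saved_output(client, run, "baidu-results")` would return the stored value, or `None` if nothing was saved under that key.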

## `proxyConfiguration` (type: `object`):

By default: no proxy. If Baidu blocks → datacenter → residential (3 retries). Enable Apify proxy here if you want to start with proxy. Fallback still applies when blocked.

## Actor input object example

```json
{
  "urls": [
    "python tutorial"
  ],
  "deviceType": "desktop",
  "languageLocalization": 1,
  "startPage": 1,
  "numResults": 10,
  "maxPagination": 3,
  "outputFile": "",
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "urls": [
        "python tutorial"
    ],
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("simpleapi/baidu-search-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "urls": ["python tutorial"],
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("simpleapi/baidu-search-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "urls": [
    "python tutorial"
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call simpleapi/baidu-search-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=simpleapi/baidu-search-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "🔍 Baidu Search Scraper",
        "description": "Scrape Baidu search results at scale. Extract organic listings, answer boxes, related videos, related searches, and top searches. Supports bulk queries, proxy fallback, date filters, and device/language options for SEO and market research.",
        "version": "0.1",
        "x-build-id": "l9k1Z94UGzo7WYIMD"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/simpleapi~baidu-search-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-simpleapi-baidu-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/simpleapi~baidu-search-scraper/runs": {
            "post": {
                "operationId": "runs-sync-simpleapi-baidu-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/simpleapi~baidu-search-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-simpleapi-baidu-search-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "urls"
                ],
                "properties": {
                    "urls": {
                        "title": "🔎 Search Queries (required)",
                        "type": "array",
                        "description": "📝 **What to enter:** Baidu search URLs (e.g. https://www.baidu.com/s?wd=python) OR plain search terms (e.g. python tutorial, machine learning). Add one per line for bulk scraping.\n\n💡 **Tip:** Use plain terms for simplicity. URLs are parsed automatically to extract the query.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "deviceType": {
                        "title": "📱 Device Type",
                        "enum": [
                            "desktop",
                            "mobile",
                            "tablet"
                        ],
                        "type": "string",
                        "description": "🖥️ **Desktop** = www.baidu.com (default). 📱 **Mobile/Tablet** = m.baidu.com. Use to scrape mobile vs desktop SERP (different layouts & results).",
                        "default": "desktop"
                    },
                    "languageLocalization": {
                        "title": "🌐 Language",
                        "minimum": 1,
                        "maximum": 3,
                        "type": "integer",
                        "description": "1 = All languages (default). 2 = Simplified Chinese (简体中文). 3 = Traditional Chinese (繁體中文). Affects search result language/region.",
                        "default": 1
                    },
                    "startPage": {
                        "title": "📄 Starting Page",
                        "minimum": 1,
                        "type": "integer",
                        "description": "Page number to start scraping from. 1 = first page (default). Use 2+ to skip earlier pages (e.g. for pagination continuation).",
                        "default": 1
                    },
                    "numResults": {
                        "title": "📊 Results Per Page",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "Number of results per page (1–50). Baidu typically shows 10. Higher = more results per request. Use with maxPagination for total result cap.",
                        "default": 10
                    },
                    "timePeriod": {
                        "title": "📅 Time Period Filter",
                        "type": "object",
                        "description": "Optional date range filter. Leave empty for no filter (all results). Use Start Date + End Date for custom range, OR Days Ago for \"last N days\" (e.g. 7 = last week).",
                        "properties": {
                            "startDate": {
                                "title": "📆 Start Date",
                                "type": "string",
                                "editor": "datepicker",
                                "description": "From date (YYYY-MM-DD). Used with End Date for custom range. Ignored if Days Ago is set.",
                                "default": ""
                            },
                            "endDate": {
                                "title": "📆 End Date",
                                "type": "string",
                                "editor": "datepicker",
                                "description": "To date (YYYY-MM-DD). Used with Start Date for custom range. Ignored if Days Ago is set.",
                                "default": ""
                            },
                            "daysAgo": {
                                "title": "⏱️ Days Ago",
                                "type": "integer",
                                "description": "Alternative: filter to last N days. e.g. 7 = last week, 30 = last month. Set to 0 to disable (use Start/End dates or no filter).",
                                "minimum": 0,
                                "default": 0
                            }
                        }
                    },
                    "maxPagination": {
                        "title": "📑 Maximum Pages",
                        "minimum": 0,
                        "maximum": 10,
                        "type": "integer",
                        "description": "Max pages to scrape per query. 0 = no limit (capped at 10). 3 = first 3 pages. More pages = more results but longer run time.",
                        "default": 3
                    },
                    "outputFile": {
                        "title": "💾 Output File",
                        "type": "string",
                        "description": "Optional custom key for key-value store. Results are always saved to Apify dataset. If set, also saves to KVS with this name for easy retrieval.",
                        "default": ""
                    },
                    "proxyConfiguration": {
                        "title": "🔒 Proxy Configuration",
                        "type": "object",
                        "description": "By default: no proxy. If Baidu blocks → datacenter → residential (3 retries). Enable Apify proxy here if you want to start with proxy. Fallback still applies when blocked."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
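
For projects that call the REST API directly, the `run-sync-get-dataset-items` path from the specification above can be used with only the Python standard library. The sketch below builds the request without sending it; `build_request` is a hypothetical helper, and the token is passed as the `token` query parameter as the specification describes.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/simpleapi~baidu-search-scraper/run-sync-get-dataset-items"


def build_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build a POST request that runs the Actor and returns its dataset items."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("<YOUR_API_TOKEN>", {"urls": ["python tutorial"]})
# To send it: with urllib.request.urlopen(request) as response: items = json.load(response)
```

Because this endpoint waits for the run to finish, long runs may need a generous client-side timeout.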
