# Baidu News Scraper - Low-cost 低成本💲🔥🇨🇳📰 (`delectable_incubator/baidu-news-scraper---low-cost-di-cheng-ben`) Actor

Scrape Baidu News search results easily 🇨🇳📰 with a powerful news scraper.

Extract article URLs, titles, snippets, sources, publication dates, and thumbnails for any keyword.

Ideal for media monitoring, brand tracking, and Baidu News SERP analysis with structured datasets 📊🚀

- **URL**: https://apify.com/delectable_incubator/baidu-news-scraper---low-cost-di-cheng-ben.md
- **Developed by:** [Prime Scrape](https://apify.com/delectable_incubator) (community)
- **Categories:** News, SEO tools, Lead generation
- **Stats:** 2 total users, 1 monthly user, 100.0% of runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.00005 / actor start

This Actor is paid per event and usage. You are charged both a fixed price for specific events and for Apify platform usage.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
```

```powershell
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

<p align="center"> <img src="https://i.ibb.co/jkNS73wX/readme.png" alt="Baidu News Scraper - PrimeScrape" width="100%"> </p>

---

### Baidu News Scraper 🇨🇳📰🔎 百度新闻爬虫

The Baidu News Scraper is a fast, scalable, and reliable Apify Actor designed to extract structured news search results directly from Baidu News.

It is built for media monitoring, brand reputation tracking, PR analysis, competitor intelligence, and China-focused news SERP tracking.

---

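### Baidu News Scraper 🧠📊🚀

This tool automatically scrapes Baidu News search results for a batch of keywords and returns structured news data, including the source, publication time, summary, and more.

It is well suited to media monitoring, brand sentiment analysis, market research, and building AI datasets.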

---

### 🎯 What This Scraper Does

Provide a list of keywords and a per-keyword result limit; the scraper handles the rest automatically.

✅ Scrapes Baidu News search results at scale

✅ Supports bulk keyword input (multi-search mode)

✅ Extracts structured news article metadata

✅ Handles pagination automatically

✅ Stops at defined limits per keyword

✅ Moves seamlessly between keywords

✅ Clean, structured, analysis-ready output

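The per-keyword behavior listed above (paging through results, stopping at the limit, then moving to the next keyword) can be pictured as a simple loop. This is a hypothetical illustration, not the Actor's source code: `fetch_page` is a stand-in for whatever request the Actor actually makes to Baidu News, and numbering `position` sequentially across pages is an assumption.

```python
from typing import Callable, Iterator

# Hypothetical page fetcher: (keyword, page_index) -> list of result titles.
# An empty list means there are no more pages for that keyword.
PageFetcher = Callable[[str, int], list[str]]


def scrape_keywords(
    keywords: list[str], max_items_per_keyword: int, fetch_page: PageFetcher
) -> Iterator[dict]:
    """Yield at most `max_items_per_keyword` results per keyword, paging as needed."""
    for keyword in keywords:
        collected = 0
        page = 0
        while collected < max_items_per_keyword:
            results = fetch_page(keyword, page)
            if not results:
                break  # out of pages: move on to the next keyword
            for title in results:
                if collected >= max_items_per_keyword:
                    break  # limit reached partway through a page
                collected += 1
                yield {"keyword": keyword, "position": collected, "title": title}
            page += 1


# Example with a fake fetcher that returns three pages of four titles each.
def fake_fetch(keyword: str, page: int) -> list[str]:
    return [f"{keyword}-p{page}-{i}" for i in range(4)] if page < 3 else []


items = list(scrape_keywords(["AI", "互联网"], 5, fake_fetch))
print(len(items))  # 10: five per keyword
```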
---

### 📊 Data Extracted

🧾 News Fields

| Field               | Description                     |
| ------------------- | ------------------------------- |
| 🔎 `keyword`        | Search keyword used             |
| 🔢 `position`       | Ranking position in Baidu News  |
| 📰 `title`          | News article title              |
| 🔗 `url`            | Direct article URL              |
| 🧾 `snippet`        | Article summary / description   |
| 🏢 `source`         | News publisher name             |
| 🌐 `sourceUrl`      | Publisher website URL           |
| 📅 `publishedDate`  | Publication date (if available) |
| 🖼 `thumbnail`      | Article image (if available)    |
| ⏱ `searchTimestamp` | Time of extraction              |

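If you consume the dataset from Python, a typed record makes the fields above explicit. The sketch below infers the types from the output example: every field is a string except `position` (an integer), and `publishedDate` and `thumbnail` are treated as possibly missing, since the table marks them "if available". These types are assumptions, not a published output schema, and `to_news_item` and `REQUIRED_FIELDS` are illustrative names.

```python
from typing import Optional, TypedDict


# One dataset item; field types are inferred from the README's output example.
class BaiduNewsItem(TypedDict):
    keyword: str
    position: int
    title: str
    url: str
    snippet: str
    source: str
    sourceUrl: str
    publishedDate: Optional[str]  # None when no date is available
    thumbnail: Optional[str]      # None when no image is available
    searchTimestamp: str


REQUIRED_FIELDS = ("keyword", "position", "title", "url")  # assumption: always present


def to_news_item(raw: dict) -> BaiduNewsItem:
    """Normalize a raw dataset item, filling optional fields with defaults."""
    missing = [f for f in REQUIRED_FIELDS if f not in raw]
    if missing:
        raise ValueError(f"item is missing fields: {missing}")
    return BaiduNewsItem(
        keyword=raw["keyword"],
        position=int(raw["position"]),
        title=raw["title"],
        url=raw["url"],
        snippet=raw.get("snippet", ""),
        source=raw.get("source", ""),
        sourceUrl=raw.get("sourceUrl", ""),
        publishedDate=raw.get("publishedDate"),
        thumbnail=raw.get("thumbnail"),
        searchTimestamp=raw.get("searchTimestamp", ""),
    )
```

Raising on missing core fields, rather than silently skipping the item, makes schema drift visible early; relax `REQUIRED_FIELDS` if your pipeline prefers to tolerate partial items.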

---

### 🛠 How to Use

1️⃣ Configure Input

Provide one or multiple keywords:

```json
{
  "keywords": [
    "互联网",
    "AI",
    "中国科技"
  ],
  "MAX_ITEMS_PER_KEYWORD": 50
}
```

2️⃣ Run the Actor

• Searches Baidu News for each keyword

• Extracts structured news results

• Collects metadata per keyword

• Stops automatically once the per-keyword limit is reached

3️⃣ Export the Dataset

Download your results in multiple formats:

✅ JSON

✅ CSV

✅ Excel

✅ XML

✅ HTML

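Exports are also available programmatically. The Apify dataset items endpoint (`GET /v2/datasets/{datasetId}/items`) accepts a `format` query parameter; the helper below only builds that URL, assuming "Excel" corresponds to the `xlsx` format value. The `dataset_export_url` name is illustrative. Send your token in an `Authorization: Bearer` header rather than embedding it in the URL.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"

# Formats listed above, mapped to values of the Apify API `format` parameter.
EXPORT_FORMATS = {"JSON": "json", "CSV": "csv", "Excel": "xlsx", "XML": "xml", "HTML": "html"}


def dataset_export_url(dataset_id: str, export_format: str) -> str:
    """Build a download URL for a run's dataset in one of the formats above."""
    try:
        fmt = EXPORT_FORMATS[export_format]
    except KeyError:
        raise ValueError(f"unsupported format {export_format!r}") from None
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode({'format': fmt})}"
```

Pass the run's `defaultDatasetId` (returned by the API examples further down) as `dataset_id`.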

---

### ⚙️ Input Configuration

#### 📥 Input Example

```json
{
  "keywords": ["人工智能"],
  "MAX_ITEMS_PER_KEYWORD": 50
}
```



#### Input Fields

| Field                   | Type    | Description                                   |
| ----------------------- | ------- | --------------------------------------------- |
| `keywords`              | array   | List of search keywords (bulk mode supported) |
| `MAX_ITEMS_PER_KEYWORD` | integer | Maximum number of news articles per keyword   |

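Both fields are marked required in the Actor's input schema, so a small client-side check can catch malformed input before you pay for an Actor start. The `build_input` helper below is a hypothetical convenience, not part of the Actor. Requiring `MAX_ITEMS_PER_KEYWORD` to be at least 1 and dropping blank or duplicate keywords are assumptions, since the schema sets no minimum and each keyword is scraped separately.

```python
import json


def build_input(keywords: list[str], max_items_per_keyword: int = 50) -> dict:
    """Return an Actor input dict with the documented `keywords` / `MAX_ITEMS_PER_KEYWORD` fields."""
    # Strip whitespace, drop blanks and duplicates, keep the original order.
    cleaned = list(dict.fromkeys(k.strip() for k in keywords if k.strip()))
    if not cleaned:
        raise ValueError("keywords must contain at least one non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_items_per_keyword, bool) or not isinstance(max_items_per_keyword, int):
        raise TypeError("MAX_ITEMS_PER_KEYWORD must be an integer")
    if max_items_per_keyword < 1:
        raise ValueError("MAX_ITEMS_PER_KEYWORD must be at least 1")
    return {"keywords": cleaned, "MAX_ITEMS_PER_KEYWORD": max_items_per_keyword}


# ensure_ascii=False keeps Chinese keywords readable in the serialized JSON.
print(json.dumps(build_input(["人工智能", " AI ", "AI"]), ensure_ascii=False))
# {"keywords": ["人工智能", "AI"], "MAX_ITEMS_PER_KEYWORD": 50}
```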
---


### 📤 Output Example

```json
{
  "keyword": "人工智能",
  "position": 1,
  "title": "人工智能行业迎来新突破",
  "url": "https://example.com/news-article",
  "snippet": "人工智能技术正在加速发展...",
  "source": "新华社",
  "sourceUrl": "https://example.com",
  "publishedDate": "2026-02-13",
  "thumbnail": "https://example.com/image.jpg",
  "searchTimestamp": "2026-02-13T14:32:21.000Z"
}
```

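For media monitoring, a common first step is counting how often each publisher appears per keyword. The sketch below works on items shaped like the output example above and uses only the standard library; `source_counts` is an illustrative name and the sample items are made up. Missing publishers are grouped under `"(unknown)"` so that incomplete items are still counted rather than dropped.

```python
from collections import Counter, defaultdict


def source_counts(items: list[dict]) -> dict[str, Counter]:
    """Count articles per publisher (`source`) for each keyword."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for item in items:
        source = item.get("source") or "(unknown)"  # tolerate missing publishers
        counts[item["keyword"]][source] += 1
    return dict(counts)


# Made-up sample items shaped like the output example.
items = [
    {"keyword": "人工智能", "position": 1, "source": "新华社"},
    {"keyword": "人工智能", "position": 2, "source": "Publisher B"},
    {"keyword": "人工智能", "position": 3, "source": "新华社"},
]
print(source_counts(items)["人工智能"].most_common(1))  # [('新华社', 2)]
```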
---

### 📊 Output explanation

| Field             | Description             |
| ----------------- | ----------------------- |
| `keyword`         | Search term used        |
| `position`        | Ranking in news results |
| `title`           | Article title           |
| `url`             | Article URL             |
| `snippet`         | Article summary         |
| `source`          | News publisher          |
| `sourceUrl`       | Publisher website       |
| `publishedDate`   | Publication date        |
| `thumbnail`       | Article image           |
| `searchTimestamp` | Scraping timestamp      |


---


### 🌍 Why Use This Scraper? 

📈 Media Monitoring — track news coverage in China

🕵️ Competitor Intelligence — monitor competitors in Chinese media

🏷 Brand Reputation — analyze visibility and sentiment

📰 PR Analysis — measure impact of campaigns

🤖 Data Collection — structured datasets for analytics & AI

🔄 Automation Ready — schedule recurring scraping runs

📊 SERP Intelligence — understand news ranking behavior

---

### ⚠️ Disclaimer

This tool is an independent solution and is not affiliated with, endorsed by, or sponsored by Baidu or its subsidiaries or partners.

---

### 💸 Pricing

This scraper uses a **pay-per-event** pricing model.

You only pay for **successful runs**.

💳 **Price:** $9.98 / 1000 results

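Because the per-result price is fixed, you can bound the result charges before a run: a run returns at most `len(keywords) × MAX_ITEMS_PER_KEYWORD` results. The sketch below applies the $9.98 / 1,000 results rate stated above. It treats the limit as an upper bound (a keyword may return fewer articles) and ignores the per-start event fee and Apify platform usage, which are charged separately. `max_result_cost` is an illustrative name.

```python
from decimal import Decimal

PRICE_PER_1000_RESULTS = Decimal("9.98")  # USD, from the pricing section above


def max_result_cost(num_keywords: int, max_items_per_keyword: int) -> Decimal:
    """Upper bound on result charges (USD) for one run; excludes start and platform fees."""
    max_results = num_keywords * max_items_per_keyword
    return PRICE_PER_1000_RESULTS * max_results / 1000


# Three keywords at 50 items each, as in the "How to Use" example: at most 150 results.
print(max_result_cost(3, 50))  # 1.497
```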
---

### Related Actors 

If you're interested in SEO scraping solutions, explore more tools:

(Coming soon)

---

### 📬 Support

⭐⭐⭐⭐⭐ Leave a 5-star rating if you like this tool

---

### 🌍 PrimeScrape

Built for scalable web data extraction & automation

Contact us via Apify or by email for custom scraping solutions or enterprise requests.

# Actor input Schema

## `keywords` (type: `array`):

List of keywords or topics to search for. Each keyword will be scraped separately.

## `MAX_ITEMS_PER_KEYWORD` (type: `integer`):

Maximum number of news articles to extract per keyword.


## Actor input object example

```json
{
  "keywords": [
    "网络",
    "Web"
  ],
  "MAX_ITEMS_PER_KEYWORD": 50
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and the Apify CLI, as well as the MCP server setup and the OpenAPI specification.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "keywords": [
        "网络",
        "Web"
    ],
    "MAX_ITEMS_PER_KEYWORD": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("delectable_incubator/baidu-news-scraper---low-cost-di-cheng-ben").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "keywords": [
        "网络",
        "Web",
    ],
    "MAX_ITEMS_PER_KEYWORD": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("delectable_incubator/baidu-news-scraper---low-cost-di-cheng-ben").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "keywords": [
    "网络",
    "Web"
  ],
  "MAX_ITEMS_PER_KEYWORD": 50
}' |
apify call delectable_incubator/baidu-news-scraper---low-cost-di-cheng-ben --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=delectable_incubator/baidu-news-scraper---low-cost-di-cheng-ben",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Baidu News Scraper - Low-cost 低成本💲🔥🇨🇳📰",
        "description": "Scrape Baidu News search results easily 🇨🇳📰 with a powerful news scraper. \n\nExtract article URLs, titles, snippets, sources, publication dates, and thumbnails for any keyword. \n\nIdeal for media monitoring, brand tracking, and Baidu News SERP analysis with structured datasets 📊🚀",
        "version": "0.0",
        "x-build-id": "2l3kJwSH8l3xq1vuZ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/delectable_incubator~baidu-news-scraper---low-cost-di-cheng-ben/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-delectable_incubator-baidu-news-scraper---low-cost-di-cheng-ben",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/delectable_incubator~baidu-news-scraper---low-cost-di-cheng-ben/runs": {
            "post": {
                "operationId": "runs-sync-delectable_incubator-baidu-news-scraper---low-cost-di-cheng-ben",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/delectable_incubator~baidu-news-scraper---low-cost-di-cheng-ben/run-sync": {
            "post": {
                "operationId": "run-sync-delectable_incubator-baidu-news-scraper---low-cost-di-cheng-ben",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "keywords",
                    "MAX_ITEMS_PER_KEYWORD"
                ],
                "properties": {
                    "keywords": {
                        "title": "Keywords or topics 🔍📰 | 关键词或主题 🔍📰",
                        "type": "array",
                        "description": "List of keywords or topics to search for. Each keyword will be scraped separately.\n\n要搜索的关键词或主题列表。每个关键词将分别进行抓取。",
                        "items": {
                            "type": "string"
                        },
                        "default": [
                            "网络",
                            "Web"
                        ]
                    },
                    "MAX_ITEMS_PER_KEYWORD": {
                        "title": "Max items per keyword | 每个关键词的最大条数",
                        "type": "integer",
                        "description": "Maximum number of news articles to extract per keyword.\n\n每个关键词要提取的新闻文章最大数量。",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```

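For languages without an official client, the `run-sync-get-dataset-items` path in the specification above runs the Actor and returns its dataset items in a single POST. The Python sketch below uses only the standard library to show the shape of that request. It sends the token in an `Authorization: Bearer` header instead of the `token` query parameter listed in the spec (the Apify API accepts either), and the client-side `timeout` is an arbitrary choice. Synchronous runs are subject to a server-side time limit, so long scrapes are better started through the `/runs` path and read from the dataset afterwards. `build_request` and `run_and_fetch` are illustrative names.

```python
import json
import urllib.request

ACTOR_ID = "delectable_incubator~baidu-news-scraper---low-cost-di-cheng-ben"
ENDPOINT = f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"


def build_request(token: str, keywords: list[str], max_items: int = 50) -> urllib.request.Request:
    """Prepare the POST request; the JSON body follows `inputSchema` in the spec."""
    body = json.dumps(
        {"keywords": keywords, "MAX_ITEMS_PER_KEYWORD": max_items}, ensure_ascii=False
    ).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json; charset=utf-8",
            "Authorization": f"Bearer {token}",
        },
    )


def run_and_fetch(token: str, keywords: list[str], max_items: int = 50) -> list[dict]:
    """Run the Actor synchronously and return its dataset items."""
    request = build_request(token, keywords, max_items)
    # Arbitrary client-side timeout in seconds.
    with urllib.request.urlopen(request, timeout=360) as response:
        return json.load(response)


# items = run_and_fetch("<YOUR_API_TOKEN>", ["人工智能"])
```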