# Baidu Library Wenku Scraper - Low-cost 低成本💲🔥🇨🇳📚 (`delectable_incubator/baidu-library-wenku-scraper---low-cost-di-cheng-ben`) Actor

Scrape Baidu Wenku document results easily 🇨🇳📚 with a powerful SERP scraper.

Extract titles, document URLs, cover images, authors, page counts, ratings, and views for any keyword.

Ideal for document research, trend analysis, competitor intelligence, and Baidu Wenku tracking 📊🚀

- **URL**: https://apify.com/delectable_incubator/baidu-library-wenku-scraper---low-cost-di-cheng-ben.md
- **Developed by:** [Prime Scrape](https://apify.com/delectable_incubator) (community)
- **Categories:** Lead generation, Developer tools, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $0.00005 / actor start

This Actor is paid per event and usage. You are charged both a fixed price for specific events and for Apify platform usage.
Since this Actor supports Apify Store discounts, the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

<p align="center"> <img src="https://i.ibb.co/jkNS73wX/readme.png" alt="Baidu Wenku Scraper - PrimeScrape" width="100%"> </p>

---

### Baidu Wenku Scraper 🇨🇳📚🔎

The Baidu Wenku Scraper is a fast, scalable, and reliable Apify Actor designed to extract structured document search results directly from Baidu Wenku.

It is built for China market intelligence, academic research, document analysis, competitor benchmarking, industry reports collection, and SERP tracking of Baidu Wenku content.

---

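### Baidu Wenku Scraper 🧠📊🚀

This tool automatically scrapes Baidu Wenku search results for a batch of keywords and returns structured document data, including author, page count, rating, and view count.

It is well suited to researchers, data analysts, business intelligence teams, and AI dataset building.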

---

### 🎯 What This Scraper Does

Provide a list of keywords and a limit per keyword; the scraper handles searching, scrolling, and pagination for you.

✅ Scrapes Baidu Wenku search results at scale

✅ Supports bulk keyword input (multi-search mode)

✅ Extracts structured document metadata

✅ Handles scrolling & pagination automatically

✅ Stops at defined limits per keyword

✅ Moves seamlessly between keywords

✅ Clean, structured, analysis-ready output
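
To check after a run that the per-keyword limit was respected, the dataset items can be grouped by their `keyword` field. The snippet below is only a sketch: `results_per_keyword` and `keywords_over_limit` are hypothetical helper names, and `items` is assumed to be the list of dataset items already fetched with one of the API clients.

```python
from collections import Counter


def results_per_keyword(items: list[dict]) -> Counter:
    """Count dataset items for each value of the `keyword` field."""
    return Counter(item.get("keyword") for item in items)


def keywords_over_limit(items: list[dict], limit: int) -> list[str]:
    """Return keywords whose result count exceeds the configured limit."""
    counts = results_per_keyword(items)
    return [keyword for keyword, count in counts.items() if count > limit]
```

An empty list from `keywords_over_limit(items, 50)` would suggest that no keyword returned more than 50 documents.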

---

### 📊 Data Extracted

🧾 Document Fields

| Field               | Description                             |
| ------------------- | --------------------------------------- |
| 🔎 `keyword`        | Search keyword used                     |
| 🔢 `position`       | Ranking position in Baidu Wenku results |
| 📄 `title`          | Document title                          |
| 🔗 `url`            | Direct document URL                     |
| 🖼 `coverImage`     | Document cover image                    |
| 📁 `fileType`       | File format (PDF, DOC, PPT, etc.)       |
| 🧾 `content`        | Preview text / description snippet      |
| 👤 `author`         | Document author/uploader                |
| 🖼️ `authorAvatar`  | Author profile image                    |
| ⭐ `rating`          | Document rating (if available)          |
| 📑 `pageCount`      | Number of pages                         |
| 👁 `viewCount`      | Total number of views                   |
| ⏱ `searchTimestamp` | Time of extraction                      |
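
As a rough sketch, the fields above could be modelled in Python as a typed dictionary. The types are an assumption inferred from the sample output in this README, where `rating`, `pageCount`, and `viewCount` appear as strings. `WenkuDocument` and `missing_fields` are hypothetical names, not part of the Actor.

```python
from typing import TypedDict


class WenkuDocument(TypedDict, total=False):
    """One dataset item; types are assumed from the README's sample output."""

    keyword: str
    position: int
    title: str
    url: str
    coverImage: str
    fileType: str
    content: str
    author: str
    authorAvatar: str
    rating: str
    pageCount: str
    viewCount: str
    searchTimestamp: str


# Field names documented in the table above, in the same order.
EXPECTED_FIELDS = tuple(WenkuDocument.__annotations__)


def missing_fields(item: dict) -> list[str]:
    """Return documented field names that are absent from a dataset item."""
    return [name for name in EXPECTED_FIELDS if name not in item]
```

Since fields such as `rating` are only present "if available", a check like `missing_fields(item)` may help spot incomplete records before analysis.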

---

### 🛠 How to Use

1️⃣ Configure Input

Provide one or multiple keywords:

```json
{
  "keywords": [
    "AI",
    "网络",
    "人工智能"
  ],
  "MAX_ITEMS_PER_KEYWORD": 50
}
```

2️⃣ Run the Actor

• Performs Baidu Wenku document search

• Extracts structured document results

• Collects metadata per keyword

• Stops automatically at limit

3️⃣ Export the Dataset

Download your results in multiple formats:

✅ JSON

✅ CSV

✅ Excel

✅ XML

✅ HTML
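
The same formats can also be requested over the Apify API by setting the `format` query parameter on the dataset items endpoint. The helper below is a minimal sketch that only builds the URL; `dataset_export_url` and `EXPORT_FORMATS` are hypothetical names, and a private dataset would additionally need your API token when the URL is fetched.

```python
from urllib.parse import urlencode

# Export formats listed above, mapped to the `format` query value of the
# Apify dataset items endpoint.
EXPORT_FORMATS = {
    "JSON": "json",
    "CSV": "csv",
    "Excel": "xlsx",
    "XML": "xml",
    "HTML": "html",
}


def dataset_export_url(dataset_id: str, export_format: str = "JSON") -> str:
    """Return the URL that serves a dataset's items in the requested format."""
    try:
        fmt = EXPORT_FORMATS[export_format]
    except KeyError:
        raise ValueError(f"unsupported export format: {export_format!r}") from None
    query = urlencode({"format": fmt})
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{query}"
```

The dataset ID is the `defaultDatasetId` of the run, as used in the API examples at the end of this document.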


---

### ⚙️ Input Configuration

#### 📥 Input Example

```json
{
  "keywords": ["人工智能"],
  "MAX_ITEMS_PER_KEYWORD": 50
}
```



#### Input Fields

| Field                   | Type    | Description                                        |
| ----------------------- | ------- | -------------------------------------------------- |
| `keywords`              | array   | List of search keywords (bulk mode supported)      |
| `MAX_ITEMS_PER_KEYWORD` | integer | Maximum number of documents to extract per keyword |
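
When the input is assembled in code, a small helper can catch obvious mistakes before a run is started. This is a sketch under stated assumptions: `build_input` is a hypothetical name, the default of 20 mirrors the default in the Actor's input schema, and treating a limit below 1 as invalid is a guess rather than documented behaviour.

```python
def build_input(keywords: list[str], max_items_per_keyword: int = 20) -> dict:
    """Build an Actor input object with the two documented fields."""
    # Drop blank entries and duplicates while keeping the original order.
    cleaned = list(dict.fromkeys(k.strip() for k in keywords if k.strip()))
    if not cleaned:
        raise ValueError("at least one non-empty keyword is required")
    # Assumption: a limit below 1 is not meaningful for this Actor.
    if max_items_per_keyword < 1:
        raise ValueError("max_items_per_keyword must be a positive integer")
    return {"keywords": cleaned, "MAX_ITEMS_PER_KEYWORD": max_items_per_keyword}
```

The returned dictionary can be passed as `run_input` to the Python client or serialized to JSON for the CLI and REST API.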

---


### 📤 Output Example

```json
{
  "keyword": "人工智能",
  "position": 1,
  "title": "人工智能行业发展研究报告",
  "url": "https://example.com/document",
  "coverImage": "https://example.com/cover.jpg",
  "fileType": "PDF",
  "content": "本报告分析了人工智能行业的发展趋势...",
  "author": "行业研究中心",
  "authorAvatar": "https://example.com/avatar.jpg",
  "rating": "4.8",
  "pageCount": "32",
  "viewCount": "12,530",
  "searchTimestamp": "2026-02-15T10:21:34.000Z"
}
```

---

### 📊 Output explanation

| Field             | Description              |
| ----------------- | ------------------------ |
| `keyword`         | Search term used         |
| `position`        | Ranking in Wenku results |
| `title`           | Document title           |
| `url`             | Document URL             |
| `coverImage`      | Cover image              |
| `fileType`        | File type                |
| `content`         | Text preview             |
| `author`          | Uploader name            |
| `authorAvatar`    | Author profile image     |
| `rating`          | Document rating          |
| `pageCount`       | Number of pages          |
| `viewCount`       | Total views              |
| `searchTimestamp` | Scraping time            |
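
In the sample output, `rating`, `pageCount`, and `viewCount` are strings (for example `"12,530"`), so numeric analysis needs a conversion step. The sketch below assumes real items follow the same formatting as the sample; values in any other form are returned as `None` rather than guessed. `to_int`, `to_float`, and `normalize` are hypothetical helper names.

```python
from typing import Optional


def to_int(value: object) -> Optional[int]:
    """Parse strings such as "12,530" or "32"; return None otherwise."""
    if not isinstance(value, str):
        return None
    try:
        return int(value.replace(",", "").strip())
    except ValueError:
        return None


def to_float(value: object) -> Optional[float]:
    """Parse strings such as "4.8"; return None otherwise."""
    if not isinstance(value, str):
        return None
    try:
        return float(value.strip())
    except ValueError:
        return None


def normalize(item: dict) -> dict:
    """Return a copy of a dataset item with numeric fields converted."""
    result = dict(item)
    result["rating"] = to_float(item.get("rating"))
    result["pageCount"] = to_int(item.get("pageCount"))
    result["viewCount"] = to_int(item.get("viewCount"))
    return result
```

Applied to the sample item, `normalize` would yield `4.8`, `32`, and `12530` for the three fields and leave all other fields unchanged.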


---


### 🌍 Why Use This Scraper? 

📚 Academic Research — collect papers, reports, and studies

📊 Market Intelligence — analyze industry documents in China

🕵️ Competitor Analysis — track competitor whitepapers

🏷 Brand Monitoring — detect brand-related publications

🤖 AI Dataset Building — structured document datasets for NLP

🔄 Automation Ready — schedule recurring extraction

---

### ⚠️ Disclaimer

This tool is an independent solution and is not affiliated with, endorsed by, or sponsored by Baidu or its subsidiaries or partners.

---

### 💸 Pricing

This scraper uses a **pay-per-event pricing model**.

You only pay for **successful runs**.

💳 **Price:** $9.98 / 1000 results
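
For budgeting, an upper bound on the event charges of a single run can be estimated from the number of keywords and the per-keyword limit. This is an illustrative sketch only: it assumes every returned document is billed as one result at the price above, adds the Actor start price quoted in the Pricing section at the top, and ignores Apify platform usage and plan discounts. `estimate_max_event_cost` is a hypothetical name.

```python
from decimal import Decimal

# Price per 1000 results, from the price line above.
PRICE_PER_1000_RESULTS = Decimal("9.98")
# Price per Actor start, from the Pricing section at the top.
PRICE_PER_ACTOR_START = Decimal("0.00005")


def estimate_max_event_cost(num_keywords: int, max_items_per_keyword: int) -> Decimal:
    """Upper bound on event charges for one run, in USD."""
    max_results = num_keywords * max_items_per_keyword
    return PRICE_PER_ACTOR_START + PRICE_PER_1000_RESULTS * max_results / 1000
```

For example, 3 keywords with a limit of 50 give at most 150 results, which under these assumptions comes to roughly $1.50.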

---

### Related Actors 

If you're interested in SEO scraping solutions, explore more tools:

(Coming soon)

---

### 📬 Support

⭐⭐⭐⭐⭐ Leave a 5-star rating if you like this tool

---

### 🌍 PrimeScrape

Built for scalable web data extraction & automation

Contact us via Apify or by email for custom scraping solutions or enterprise requests.

# Actor input Schema

## `keywords` (type: `array`):

List of keywords or topics to search for documents. Each keyword will be scraped separately.

## `MAX_ITEMS_PER_KEYWORD` (type: `integer`):

Maximum number of documents to extract per keyword.


## Actor input object example

```json
{
  "keywords": [
    "Web",
    "网络"
  ],
  "MAX_ITEMS_PER_KEYWORD": 20
}
```

# Actor output Schema

## `overview` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "keywords": [
        "Web",
        "网络"
    ],
    "MAX_ITEMS_PER_KEYWORD": 20
};

// Run the Actor and wait for it to finish
const run = await client.actor("delectable_incubator/baidu-library-wenku-scraper---low-cost-di-cheng-ben").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "keywords": [
        "Web",
        "网络",
    ],
    "MAX_ITEMS_PER_KEYWORD": 20,
}

# Run the Actor and wait for it to finish
run = client.actor("delectable_incubator/baidu-library-wenku-scraper---low-cost-di-cheng-ben").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "keywords": [
    "Web",
    "网络"
  ],
  "MAX_ITEMS_PER_KEYWORD": 20
}' |
apify call delectable_incubator/baidu-library-wenku-scraper---low-cost-di-cheng-ben --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=delectable_incubator/baidu-library-wenku-scraper---low-cost-di-cheng-ben",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Baidu Library Wenku Scraper - Low-cost 低成本💲🔥🇨🇳📚",
        "description": "Scrape Baidu Wenku document results easily 🇨🇳📚 with a powerful SERP scraper. \n\nExtract titles, document URLs, cover images, authors, page counts, ratings, and views for any keyword. \n\nIdeal for document research, trend analysis, competitor intelligence, and Baidu Wenku tracking 📊🚀",
        "version": "0.0",
        "x-build-id": "nig5i6m00sXgV93EQ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/delectable_incubator~baidu-library-wenku-scraper---low-cost-di-cheng-ben/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-delectable_incubator-baidu-library-wenku-scraper---low-cost-di-cheng-ben",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/delectable_incubator~baidu-library-wenku-scraper---low-cost-di-cheng-ben/runs": {
            "post": {
                "operationId": "runs-sync-delectable_incubator-baidu-library-wenku-scraper---low-cost-di-cheng-ben",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/delectable_incubator~baidu-library-wenku-scraper---low-cost-di-cheng-ben/run-sync": {
            "post": {
                "operationId": "run-sync-delectable_incubator-baidu-library-wenku-scraper---low-cost-di-cheng-ben",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "keywords",
                    "MAX_ITEMS_PER_KEYWORD"
                ],
                "properties": {
                    "keywords": {
                        "title": "Keywords or topics 🔍📚 | 关键词或主题 🔍📚",
                        "type": "array",
                        "description": "List of keywords or topics to search for documents. Each keyword will be scraped separately.\n\n用于搜索文档的关键词或主题列表。每个关键词将分别进行抓取。",
                        "items": {
                            "type": "string"
                        },
                        "default": [
                            "Web",
                            "网络"
                        ]
                    },
                    "MAX_ITEMS_PER_KEYWORD": {
                        "title": "Max documents per keyword | 每个关键词的最大文档数",
                        "type": "integer",
                        "description": "Maximum number of documents to extract per keyword.\n\n每个关键词要提取的文档最大数量。",
                        "default": 20
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
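
For projects without an Apify client library, the `run-sync-get-dataset-items` path from the specification above can be called directly. The sketch below uses only the Python standard library and prepares the request without sending it; `build_run_sync_request` is a hypothetical helper name, and the token is passed as the `token` query parameter as described in the specification.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ACTOR_ID = "delectable_incubator~baidu-library-wenku-scraper---low-cost-di-cheng-ben"


def build_run_sync_request(token: str, run_input: dict) -> Request:
    """Prepare (but do not send) a POST to run-sync-get-dataset-items."""
    url = (
        f"https://api.apify.com/v2/acts/{ACTOR_ID}/run-sync-get-dataset-items"
        f"?{urlencode({'token': token})}"
    )
    # Keep non-ASCII keywords readable in the body and encode it as UTF-8.
    body = json.dumps(run_input, ensure_ascii=False).encode("utf-8")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would start the run and block until the Actor finishes, so a generous timeout is advisable for large keyword lists.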
