# Zhihu Scraper — Q\&A, Answers, Articles, Columns (`sian.agency/zhihu-scraper`) Actor

Zhihu scraper — extract long-form Mandarin Q\&A, expert answers, articles & column posts. Keyword search, question answer threads, article detail, column article list. China market research, LLM training data, competitive intel. Four operations, one clean dataset per run. No API key.

- **URL**: https://apify.com/sian.agency/zhihu-scraper.md
- **Developed by:** [SIÁN OÜ](https://apify.com/sian.agency) (community)
- **Categories:** Social media, AI
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 1 bookmark
- **User rating**: No ratings yet

## Pricing

from $2.00 / 1,000 search results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier goes up.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Zhihu Scraper — Q&A, Answers, Articles & Columns 🚀

[![SIÁN Agency Store](https://img.shields.io/badge/Store-SI%C3%81N%20Agency-1AE392)](https://apify.com/sian.agency?fpr=sian) [![SIÁN Weibo](https://img.shields.io/badge/SI%C3%81N-Weibo-E6162D)](https://apify.com/sian.agency/weibo-scraper?fpr=sian) [![SIÁN Xiaohongshu RedNote](https://img.shields.io/badge/SI%C3%81N-Xiaohongshu%20RedNote-FF2442)](https://apify.com/sian.agency/xiaohongshu-rednote-scraper?fpr=sian) [![SIÁN Taobao & Tmall](https://img.shields.io/badge/SI%C3%81N-Taobao%20%26%20Tmall-FF4F00)](https://apify.com/sian.agency/taobao-tmall-product-scraper?fpr=sian)

#### 🎉 The richest Mandarin Q&A corpus on the web — full HTML answer bodies, expert credentials, vote signals
##### Built for AI/LLM training teams, China market researchers, and B2B KOL outreach

---

### 📋 Overview

**Zhihu (知乎) is China's expert-driven Q&A platform** — the closest thing to a Mandarin Stack Overflow + Quora + Medium rolled into one. This scraper pulls **complete answer threads, full-HTML articles, keyword search across the platform, and column (zhuanlan) post lists** — clean, structured, ready for analysis or model training.

**Why AI teams, market researchers, and agencies choose us:**
- 🧠 **Best-in-class LLM training data** — full HTML answer bodies (not snippets) with author credentials, vote/comment signals, and badge verification — gold-standard SFT/RAG corpus material for Chinese-language models
- 📚 **Long-form depth, not shallow snippets** — competitors return excerpts; we return the entire answer + article body including embedded images, headings, and inline references
- 🔀 **Mixed-type search in one call** — keyword searches return answers, questions, articles, AND people in a single dataset, each row dispatched to the correct ID schema (`answerId`, `questionId`, `articleId`, `peopleId`)
- 🎖️ **KOL discovery built in** — every row carries `authorId`, `authorName`, `authorHeadline`, `authorFollowerCount`, `authorVoteupCount`, `authorBadges[]`, `authorIsOrg` — ready to dedupe and shortlist Zhihu blue/gold-badge experts
- 💰 **Pay-per-result pricing** — $0.004/search row, $0.040/article detail. Generous FREE tier. No subscription, no minimums, no surprise bills
- ✨ **No account, no API key, no proxy setup** — paste an ID or keyword, click run, get clean JSON

---

### ✨ Features

- 🔍 **Keyword Search** — search across all Zhihu content types in one call, ~20 mixed results per page
- 💬 **Question Answer Threads** — pull every answer to a Zhihu question with full HTML body, vote counts, and reply counts
- 📰 **Article Detail Extraction** — full article HTML body, author profile, topic tags, and parent column reference in a single row
- 📚 **Column (Zhuanlan) Article Lists** — paginate the complete catalog of any Zhihu column, ~10 articles per page
- 🏷️ **Author + Badge Data on Every Row** — Zhihu blue/gold badges, follower counts, vote tallies, headline bios baked in
- 🆔 **18–19-digit ID Precision** — IDs preserved as strings, so values above JavaScript's 2^53 safe-integer limit are never silently rounded
- 🖼️ **Image URL Normalization** — all Zhihu CDN URLs upgraded to HTTPS automatically
- 📊 **Clean Structured JSON** — flat camelCase aliases on every entity, ready for BigQuery, Pinecone, pandas, or Airtable
- 🌐 **Mandarin-Aware Error Translation** — upstream Chinese error strings (`问题不存在`, `专栏不存在`) translated to plain English in the dataset
- ⚡ **Resilient Pagination** — built-in retry on transient upstream errors, no manual cursor management

---

### 🎬 Quick Start

Pick one of four operations, drop in a keyword or ID, and run. One operation per run, one clean dataset out.

```bash
curl -X POST "https://api.apify.com/v2/acts/sian.agency~zhihu-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"operation":"search","keyword":"人工智能","maxPages":3}'
````

***

### 🚀 Getting Started (3 Simple Steps)

#### Step 1: Pick your operation

Choose one: `search` (keyword), `answerList` (question thread), `articleDetail` (single article), or `columnArticleList` (column posts).

#### Step 2: Provide the input

A keyword for search, or a Zhihu ID (questionId, articleId, columnId) for the targeted operations.

#### Step 3: Click Run

The Actor handles pagination, retries, and ID precision automatically. Results land in the Apify dataset as flat JSON.

**That's it! In under a minute, you'll have:**

- Clean, flat JSON rows with the right ID/URL schema per type
- Full HTML content bodies (not snippets) for answers and articles
- Author + badge metadata on every row for KOL workflows

***

### 📥 Input Configuration

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `operation` | enum | Yes | One of: `search`, `answerList`, `articleDetail`, `columnArticleList` |
| `keyword` | string | If `search` | Search term (Chinese or English) |
| `questionId` | string | If `answerList` | Zhihu question ID (numeric string) |
| `articleId` | string | If `articleDetail` | Zhihu article ID (numeric string) |
| `columnId` | string | If `columnArticleList` | Zhihu column slug (e.g. `xuehy`) |
| `maxPages` | integer | No | Pagination cap (default 1, hard cap 50; ignored for `articleDetail`) |

**Example — Keyword Search:**

```json
{
  "operation": "search",
  "keyword": "人工智能",
  "maxPages": 5
}
```

**Example — Question Answer Thread:**

```json
{
  "operation": "answerList",
  "questionId": "660962845",
  "maxPages": 10
}
```

**Example — Article Detail:**

```json
{
  "operation": "articleDetail",
  "articleId": "2032860336215307118"
}
```

**Example — Column Article List:**

```json
{
  "operation": "columnArticleList",
  "columnId": "xuehy",
  "maxPages": 5
}
```
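
The required-field rules in the table above can be checked locally before a run is started. The following is a minimal sketch: `build_input` and `REQUIRED_FIELD` are hypothetical names, the 50-page ceiling is taken from the `maxPages` schema description, and the Actor's own validation may behave differently.

```python
# Hypothetical helper: mirrors the "Required" column of the input table.
REQUIRED_FIELD = {
    "search": "keyword",
    "answerList": "questionId",
    "articleDetail": "articleId",
    "columnArticleList": "columnId",
}
MAX_PAGES_CAP = 50  # hard cap stated in the maxPages schema description


def build_input(operation, value, max_pages=1):
    """Return an Actor input dict for one operation, or raise ValueError."""
    if operation not in REQUIRED_FIELD:
        raise ValueError(f"unknown operation: {operation!r}")
    if not isinstance(value, str) or not value.strip():
        raise ValueError(f"{REQUIRED_FIELD[operation]} must be a non-empty string")
    run_input = {"operation": operation, REQUIRED_FIELD[operation]: value.strip()}
    if operation != "articleDetail":  # maxPages is ignored for single-article runs
        run_input["maxPages"] = max(1, min(int(max_pages), MAX_PAGES_CAP))
    return run_input


print(build_input("answerList", "660962845", max_pages=10))
```

IDs are passed as strings on purpose, so long article IDs never go through a numeric type.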

***

### 📤 Output

Results are saved to the Apify dataset with **40+ fields** including full HTML bodies, author profiles, and engagement metrics.

| Field | Type | Description |
|-------|------|-------------|
| `operation` | string | Which operation produced the row |
| `entityType` | string | `answer` / `question` / `article` / `people` / `column-article` |
| `answerId` / `questionId` / `articleId` | string | Type-appropriate Zhihu entity ID (up to 19 digits, preserved as string) |
| `title` | string | Question / article title |
| `excerpt` | string | Short summary text |
| `content` | string | **Full HTML body** for answers and articles |
| `voteupCount` | number | Upvote count |
| `commentCount` | number | Comment count |
| `authorId` | string | Author's Zhihu ID (kept as a string) |
| `authorName` | string | Display name |
| `authorHeadline` | string | One-line bio |
| `authorFollowerCount` | number | Author follower count |
| `authorVoteupCount` | number | Lifetime upvotes received by author |
| `authorBadges` | array | Verified-expert badges (blue/gold) |
| `authorIsOrg` | boolean | Whether the author is a verified organization |
| `itemPageUrl` | string | Canonical Zhihu URL for the entity |
| `createdTime` / `updatedTime` | number | Unix timestamps |
| `topics` | array | Topic tags (article ops) |
| `column` | object | Parent column reference (article ops) |

**Example row (search result, `entityType: "answer"`):**

```json
{
  "operation": "search",
  "entityType": "answer",
  "answerId": "3654812345678901234",
  "questionId": "660962845",
  "title": "未来 10 年人工智能会让哪些行业彻底消失？",
  "excerpt": "从我的实际经验来看，AI 替代的不是行业，而是行业里...",
  "content": "<p>从我的实际经验来看...</p><img src=\"https://pic1.zhimg.com/...\">",
  "voteupCount": 1842,
  "commentCount": 327,
  "authorId": "abc-123-def",
  "authorName": "张三",
  "authorHeadline": "AI Researcher | Tsinghua University",
  "authorFollowerCount": 124300,
  "authorVoteupCount": 982401,
  "authorBadges": ["identity_blue"],
  "itemPageUrl": "https://www.zhihu.com/question/660962845/answer/3654812345678901234"
}
```
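
Because a search run mixes entity types in one dataset, a common first step is to split rows by `entityType` and read the matching ID field. The sketch below assumes the `ID_FIELD` mapping shown in the code: `peopleId` is taken from the feature list, and `column-article` rows are assumed to carry `articleId`. Check the mapping against real rows before relying on it.

```python
from collections import defaultdict

# Assumed mapping from entityType to the ID field each row carries.
ID_FIELD = {
    "answer": "answerId",
    "question": "questionId",
    "article": "articleId",
    "people": "peopleId",
    "column-article": "articleId",
}


def split_by_entity_type(rows):
    """Group dataset rows by entityType, keyed by each row's own ID."""
    groups = defaultdict(dict)
    for row in rows:
        entity_type = row.get("entityType")
        id_field = ID_FIELD.get(entity_type)
        entity_id = row.get(id_field) if id_field else None
        if entity_id is None:
            continue  # skip error rows and unknown types
        groups[entity_type][str(entity_id)] = row  # duplicates collapse by ID
    return dict(groups)


rows = [
    {"entityType": "answer", "answerId": "3654812345678901234", "questionId": "660962845"},
    {"entityType": "question", "questionId": "660962845"},
    {"entityType": "answer", "answerId": "3654812345678901234", "questionId": "660962845"},
]
groups = split_by_entity_type(rows)
print({kind: list(items) for kind, items in groups.items()})
```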

***

### 💼 Use Cases & Examples

#### 1. AI / LLM Training Corpus Building

**Wei, ML Engineer at a Beijing AI lab** pulls 100K+ Mandarin answer threads per month for SFT and RAG fine-tuning datasets.

**Input:** A list of question IDs covering broad topics (technology, finance, medicine, philosophy).
**Output:** Full HTML answer bodies with author credentials and vote signals for quality filtering.
**Use:** Bootstrap a domain-balanced Chinese-language instruction-tuning dataset. Filter by `voteupCount > 100` and `authorBadges` to keep high-signal answers only.
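
The quality filter described in this use case can be sketched as follows. The function name `keep_high_signal` is hypothetical, and "has at least one badge" is one assumed reading of filtering by `authorBadges`.

```python
def keep_high_signal(rows, min_votes=100):
    """Keep rows with more than min_votes upvotes and at least one author badge."""
    return [
        row
        for row in rows
        if (row.get("voteupCount") or 0) > min_votes and row.get("authorBadges")
    ]


# Illustrative rows only; real rows carry many more fields.
rows = [
    {"answerId": "1", "voteupCount": 1842, "authorBadges": ["identity_blue"]},
    {"answerId": "2", "voteupCount": 1842, "authorBadges": []},
    {"answerId": "3", "voteupCount": 12, "authorBadges": ["identity_blue"]},
    {"answerId": "4", "voteupCount": None, "authorBadges": None},
]
kept = keep_high_signal(rows)
print([row["answerId"] for row in kept])
```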

#### 2. China Market & Consumer Research

**Lin, Insights Lead at a Shanghai research agency** keyword-tracks branded questions weekly to surface unfiltered consumer sentiment.

**Input:** Brand or product keyword (`"特斯拉"`, `"iPhone 17"`, `"小米汽车"`).
**Output:** Top-voted questions and answers mentioning the brand, with vote/comment counts.
**Use:** Build a weekly brand-perception report grounded in real Chinese consumer language — not survey-mediated.

#### 3. Competitive Intelligence & Brand Monitoring

**Anya, PM at a B2B SaaS company** monitors competitor mentions in Q\&A threads to catch comparison content early.

**Input:** Competitor names + product category keywords.
**Output:** Questions, answers, and articles mentioning competitors, sorted by recency and engagement.
**Use:** Surface "X vs. Y" threads before they go viral; respond proactively where buyers are asking real questions.

#### 4. B2B Influencer / KOL Outreach

**Marcus, Marketing Lead at a B2B firm targeting China** shortlists Zhihu KOLs for sponsored long-form content.

**Input:** Topic keyword (`"AI 创业"`, `"SaaS 出海"`).
**Output:** Top-voted answers with author follower counts, badge verification, and headline bios.
**Use:** Dedupe authors across thousands of answers, sort by `authorFollowerCount` and badge level, hand off to outreach.
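
A minimal sketch of the dedupe-and-rank step follows. `shortlist_authors` is a hypothetical helper, and because this README does not define an ordering between badge levels, the sketch only ranks badged authors ahead of unbadged ones and then sorts by `authorFollowerCount`.

```python
def shortlist_authors(rows, top_n=10):
    """Return one record per authorId, badged authors first, then by followers."""
    authors = {}
    for row in rows:
        author_id = row.get("authorId")
        if not author_id:
            continue
        candidate = {
            "authorId": author_id,
            "authorName": row.get("authorName"),
            "authorFollowerCount": row.get("authorFollowerCount") or 0,
            "authorBadges": row.get("authorBadges") or [],
        }
        known = authors.get(author_id)
        # Keep the snapshot with the highest follower count for each author.
        if known is None or candidate["authorFollowerCount"] > known["authorFollowerCount"]:
            authors[author_id] = candidate
    return sorted(
        authors.values(),
        key=lambda a: (bool(a["authorBadges"]), a["authorFollowerCount"]),
        reverse=True,
    )[:top_n]


# Illustrative rows only.
rows = [
    {"authorId": "a1", "authorName": "张三", "authorFollowerCount": 124300, "authorBadges": ["identity_blue"]},
    {"authorId": "a1", "authorName": "张三", "authorFollowerCount": 124300, "authorBadges": ["identity_blue"]},
    {"authorId": "b2", "authorName": "李四", "authorFollowerCount": 500000, "authorBadges": []},
    {"authorId": "c3", "authorName": "王五", "authorFollowerCount": 900, "authorBadges": ["identity_blue"]},
]
shortlist = shortlist_authors(rows)
print([author["authorId"] for author in shortlist])
```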

#### 5. Trend & Topic Early-Signal Detection

**Chen, Data Scientist at a hedge fund** runs daily keyword searches to spot emerging questions before mainstream pickup.

**Input:** Industry watchlist (semiconductors, energy, biotech) refreshed daily.
**Output:** New questions and rising answers, time-stamped with engagement velocity signals.
**Use:** Feed into an alpha-generation pipeline that flags breakout topics for analyst review.

#### 6. Academic & Sociolinguistic Research

**Dr. Park, Stanford computational linguist** builds Mandarin discourse corpora for academic NLP research.

**Input:** Topic clusters via keyword search and column article lists.
**Output:** Full HTML article bodies and answer threads with author demographics where available.
**Use:** Train discourse-level classifiers, study Chinese internet argumentation patterns, publish reproducible datasets.

***

### 🔗 Integration Examples

#### JavaScript/Node.js

```javascript
import { ApifyClient } from 'apify-client';
const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('sian.agency/zhihu-scraper').call({
  operation: 'search',
  keyword: '人工智能',
  maxPages: 5,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items[0]);
```

#### Python

```python
from apify_client import ApifyClient
client = ApifyClient('YOUR_TOKEN')

run = client.actor('sian.agency/zhihu-scraper').call(
    run_input={
        'operation': 'answerList',
        'questionId': '660962845',
        'maxPages': 10,
    }
)

for item in client.dataset(run['defaultDatasetId']).iterate_items():
    print(item['authorName'], item['voteupCount'])
```

#### cURL

```bash
curl -X POST "https://api.apify.com/v2/acts/sian.agency~zhihu-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"operation":"articleDetail","articleId":"2032860336215307118"}'
```

#### Automation Workflows (N8N / Zapier / Make)

1. **Trigger**: Daily schedule or webhook from your watchlist tool
2. **HTTP Request**: Call the Actor with a keyword or column ID
3. **Process**: Filter rows by `voteupCount` / `authorBadges` / `authorFollowerCount`
4. **Action**: Push to BigQuery, Pinecone, Airtable, or trigger a Slack alert

***

### 📊 Performance & Pricing

#### FREE Tier (Try It Now)

- Full feature access on all four operations — same data quality as PAID
- Generous evaluation allowance under the Apify FREE plan
- No credit card required

#### PAID Tier (Production Ready)

- Pay-per-result: charged only for successful rows, plus the per-run Actor Start event
- Volume discounts auto-applied at SILVER, GOLD, PLATINUM, DIAMOND tiers
- No subscription, no minimums, no commitments

**Live BRONZE per-result pricing:**

| Event | Price | Triggered by |
|---|---|---|
| Actor Start | $0.014 | Once per run |
| **Search Result** | **$0.004** (PRIMARY) | Per row from keyword search |
| Question Answer | $0.005 | Per answer in a question thread |
| Article Detail | $0.040 | Per article (full HTML body) |
| Column Article | $0.004 | Per article in a column listing |

💰 **Best price on the market for full-HTML Zhihu extraction** — competitors charge 3–5× more for snippet-only output.

🔗 [View current pricing](https://apify.com/sian.agency/zhihu-scraper?fpr=sian)
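
To budget a run, the BRONZE prices in the table can be combined with the approximate page sizes quoted in this README. This is a rough estimate under those assumptions: `estimate_cost` is a hypothetical helper, real pages may hold fewer rows, and higher plan tiers are discounted.

```python
from decimal import Decimal

# BRONZE prices copied from the table above.
ACTOR_START = Decimal("0.014")
PRICE_PER_ROW = {
    "search": Decimal("0.004"),
    "answerList": Decimal("0.005"),
    "articleDetail": Decimal("0.040"),
    "columnArticleList": Decimal("0.004"),
}
# Approximate page sizes quoted in this README.
ROWS_PER_PAGE = {"search": 20, "answerList": 5, "columnArticleList": 10}


def estimate_cost(operation, max_pages=1):
    """Rough cost in USD for one run at BRONZE prices."""
    if operation == "articleDetail":
        rows = 1  # single record, maxPages is ignored
    else:
        rows = ROWS_PER_PAGE[operation] * max_pages
    return ACTOR_START + PRICE_PER_ROW[operation] * rows


print(estimate_cost("search", max_pages=3))      # 60 rows plus the start event
print(estimate_cost("articleDetail"))            # one article plus the start event
```

`Decimal` is used so that fractions of a cent add up exactly instead of picking up binary floating-point error.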

***

### ❓ Frequently Asked Questions

**Q: How many results can I pull per run?**
A: Up to 50 pages per run: `maxPages` is hard-capped at 50, which is roughly 1,000 search rows, 250 answers, or 500 column articles. The Actor handles pagination and retries automatically.

**Q: Do I need a Zhihu account or API key?**
A: No. Just an Apify account. We handle everything upstream.

**Q: Does it support private answers or paid-content articles?**
A: No — only publicly accessible content. Paywalled "盐选" articles return excerpt-only content per Zhihu's public surface.

**Q: What output formats are available?**
A: JSON, CSV, Excel, XML, JSONL — export directly from the Apify dataset UI or API.
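
For API exports, the Apify dataset items endpoint accepts a `format` query parameter. The sketch below only builds the URL; `dataset_export_url` is a hypothetical helper, `abc123` is a placeholder dataset ID, and `xlsx` is assumed to be the value that corresponds to Excel.

```python
from urllib.parse import urlencode

# Formats listed in the answer above, as format values for the API.
EXPORT_FORMATS = {"json", "jsonl", "csv", "xlsx", "xml"}


def dataset_export_url(dataset_id, fmt="json", token=None):
    """Build a download URL for a dataset in the requested format."""
    if fmt not in EXPORT_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    params = {"format": fmt}
    if token:  # needed when the dataset is not publicly readable
        params["token"] = token
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"


print(dataset_export_url("abc123", "csv"))
```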

**Q: How accurate are the 18–19-digit IDs?**
A: IDs are preserved as strings end-to-end. JavaScript's default `JSON.parse` silently rounds integers above 2^53 to the nearest representable double; we intercept the parse and keep full precision.
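
The precision problem can be reproduced in a few lines. This sketch uses Python's `float`, which is the same IEEE 754 double that a JavaScript Number uses, to show what happens to a 19-digit ID once it passes through a numeric type. It illustrates the failure mode only and is not the Actor's internal code.

```python
import json

article_id = "2032860336215307118"  # 19-digit article ID from the examples in this README

# Push the ID through a double, as a default JavaScript JSON parse would.
as_double = float(int(article_id))
round_tripped = str(int(as_double))

print(article_id)
print(round_tripped)
print(round_tripped == article_id)  # False: the low-order digits changed

# Keeping the ID as a JSON string avoids the problem in any language.
payload = json.dumps({"operation": "articleDetail", "articleId": article_id})
restored = json.loads(payload)["articleId"]
print(restored == article_id)
```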

**Q: Can I get full HTML article bodies, not just summaries?**
A: Yes — `articleDetail` and `answerList` return the full HTML `content` field with embedded images and formatting intact.

**Q: Does the search return answers, questions, and articles together?**
A: Yes — one search call returns mixed types in a single dataset. Each row carries an `entityType` field so you can split downstream.

**Q: Is this legal?**
A: Yes — only publicly available data. See the [legal section](#-is-it-legal-to-scrape-data) below.

***

### 🐛 Troubleshooting

**`code:301 — FAILED, RETRY` errors on a specific question ID**

- A small number of historical Zhihu IDs are permanently flagged by upstream anti-bot. Try a different question; most modern IDs work fine. The Actor already retries with backoff before surfacing the error.

**Empty results on a column ID**

- Double-check the column slug (the part after `zhuanlan.zhihu.com/`). Example: for `https://zhuanlan.zhihu.com/xuehy`, use `columnId: "xuehy"`.
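
Slug and ID mistakes are easier to avoid when the input is derived from the URL itself. The sketch below covers the three URL shapes documented for `questionId`, `articleId`, and `columnId`; `input_from_url` is a hypothetical helper, and other Zhihu URL variants are not handled.

```python
import re
from urllib.parse import urlparse


def input_from_url(url):
    """Map a Zhihu URL to a partial Actor input dict, or raise ValueError."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    path = parsed.path.strip("/")
    if host == "zhuanlan.zhihu.com":
        match = re.fullmatch(r"p/(\d+)", path)
        if match:  # https://zhuanlan.zhihu.com/p/{ID}
            return {"operation": "articleDetail", "articleId": match.group(1)}
        if path and "/" not in path:  # https://zhuanlan.zhihu.com/{slug}
            return {"operation": "columnArticleList", "columnId": path}
    elif host in ("www.zhihu.com", "zhihu.com"):
        match = re.match(r"question/(\d+)", path)
        if match:  # https://www.zhihu.com/question/{ID}
            return {"operation": "answerList", "questionId": match.group(1)}
    raise ValueError(f"unrecognized Zhihu URL: {url}")


print(input_from_url("https://zhuanlan.zhihu.com/xuehy"))
```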

**Search returns fewer results than expected**

- Increase `maxPages`. Zhihu paginates ~20 mixed results per page; deep pagination beyond 10 pages may return diminishing fresh content.

**Article body looks truncated**

- "盐选" (paywalled) Zhihu Plus articles return only excerpts on the public surface. The Actor surfaces only what Zhihu exposes; there is no premium-content backdoor.

**Author follower / voteup counts show 0**

- Some authors disable public stats. The fields are present but Zhihu returns 0 for these users.

***

### ⚠️ Trademark Disclaimer

This is an **independent scraping tool**. It is not affiliated with, endorsed by, or sponsored by Zhihu Inc. (知乎). The Zhihu® and 知乎® names appear under nominative fair use solely to describe the platform this tool reads from. All trademarks are the property of their respective owners.

***

### ⚖️ Is it legal to scrape data?

Our Actors are ethical and do not extract any private user data, such as email addresses, gender, or location. They only extract what the user has chosen to share publicly. We therefore believe that our Actors, when used for ethical purposes by Apify users, are safe.

However, you should be aware that your results could contain personal data. Personal data is protected by the **GDPR** in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers.

You can also read Apify's blog post on the [legality of web scraping](https://blog.apify.com/is-web-scraping-legal/).

***

### ⭐ Love this actor?

[Leave a 5-star review](https://apify.com/sian.agency/zhihu-scraper/reviews) — it helps us build more features for you and keeps the SIÁN portfolio growing.

***

### 🤝 Support

[![Telegram Support](https://img.shields.io/badge/Telegram-Support%20Group-0088cc?logo=telegram)](https://t.me/+vyh1sRE08sAxMGRi)

**Join our active support community**

- For issues or questions, open an issue in the Actor's repository
- Check the [SIÁN Agency Store](https://apify.com/sian.agency?fpr=sian) for more China-market automation tools
- 📧 <apify@sian-agency.online>

***

**Built by [SIÁN Agency](https://www.sian-agency.online)** | **[More Tools](https://apify.com/sian.agency?fpr=sian)**

# Actor input Schema

## `operation` (type: `string`):

🎯 **PICK ONE OPERATION PER RUN.** Each run produces one clean dataset matching the chosen mode.

- **🔍 Search Zhihu** — keyword search across answers, questions, articles, people (~20 mixed results/page)
- **❓ Question Answers** — paginated answers for a single question, ranked by Zhihu's algorithm (~5 answers/page)
- **📰 Article Detail** — single Zhihu column article by ID (full content, author, vote/comment counts, topics)
- **📚 Column Articles** — paginated article list from a Zhihu column (zhuanlan) by slug (~10 articles/page)

💡 **TIP:** To combine operations, run the actor multiple times with different configurations.

## `keyword` (type: `string`):

🔍 **Required for the `Search Zhihu` operation.**

Any Zhihu search query. Mixed Chinese / English supported:

- `Python`
- `人工智能` (artificial intelligence)
- `投资理财` (investment)
- `品牌营销` (brand marketing)

💡 **TIP:** Chinese-language queries return native Mandarin results; English queries surface bilingual / cross-cultural threads. Mixed results include answers, questions, articles, and people; filter the dataset by `entityType` to split them.

⚠️ **Ignored** for the Question Answers, Article Detail, and Column Articles operations.

## `questionId` (type: `string`):

❓ **Required for the `Question Answers` operation.**

The numeric Zhihu question ID. You can find it:

- In any Zhihu question URL: `https://www.zhihu.com/question/{ID}` → the trailing numeric segment
- In the `questionId` field of any answer or search result row

💡 **TIP:** To pull a question's full answer thread, start with `maxPages: 5` (~25 answers) and increase as needed. Answers are returned in Zhihu's ranking order — top-voted first by default.

⚠️ **Ignored** for Search, Article Detail, and Column Articles operations.

## `articleId` (type: `string`):

📰 **Required for the `Article Detail` operation.**

The numeric Zhihu column article ID. You can find it:

- In any Zhuanlan URL: `https://zhuanlan.zhihu.com/p/{ID}` → the trailing numeric segment
- In the `articleId` field of any search or column-article-list result row

💡 **TIP:** Article Detail returns the full HTML content body, author profile, vote/comment counts, topics, and the parent column reference — ideal for in-depth scraping of a known article.

⚠️ **Ignored** for Search, Question Answers, and Column Articles operations.

## `columnId` (type: `string`):

📚 **Required for the `Column Articles` operation.**

The Zhihu column (zhuanlan) slug. Found in the column URL: `https://zhuanlan.zhihu.com/{slug}` → the trailing path segment.

Examples:

- `xuehy` — Xue Hongyan's investment column
- `qingreading` — popular reading-recommendation column
- `kaiyuan` — open-source / tech column

💡 **TIP:** Use Column Articles to pull a curator's entire article history in reverse chronological order. Combine with Article Detail to enrich the top-N articles with full content.

⚠️ **Ignored** for Search, Question Answers, and Article Detail operations.
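
The two-step pattern from the tip above (list a column, then enrich the best articles) can be sketched as a pure function that turns column rows into follow-up inputs. `article_detail_inputs` is a hypothetical helper, and ranking by `voteupCount` is one assumed definition of "top-N". Since each run performs one operation, every returned input would go to its own Article Detail run.

```python
def article_detail_inputs(column_rows, top_n=3):
    """Build articleDetail inputs for the top_n most-upvoted column rows."""
    ranked = sorted(
        (row for row in column_rows if row.get("articleId")),
        key=lambda row: row.get("voteupCount") or 0,
        reverse=True,
    )
    return [
        {"operation": "articleDetail", "articleId": str(row["articleId"])}
        for row in ranked[:top_n]
    ]


# Illustrative rows only; IDs are placeholders.
column_rows = [
    {"articleId": "111", "voteupCount": 40},
    {"articleId": "222", "voteupCount": 950},
    {"articleId": "333", "voteupCount": 310},
    {"title": "row without an articleId"},
]
follow_ups = article_detail_inputs(column_rows, top_n=2)
print(follow_ups)
```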

## `maxPages` (type: `integer`):

📄 **Applies to paginated operations** (Search Zhihu, Question Answers, Column Articles). Ignored for Article Detail (single record).

- **Search Zhihu:** ~20 mixed results per page
- **Question Answers:** ~5 answers per page
- **Column Articles:** ~10 articles per page

💡 **TIP:** Start small (1–3 pages) to preview results before scaling up.

⚠️ Hard cap: 50 pages to prevent runaway runs.

## Actor input object example

```json
{
  "operation": "search",
  "keyword": "Python",
  "questionId": "20791060",
  "articleId": "2032860336215307118",
  "columnId": "xuehy",
  "maxPages": 5
}
```

# Actor output Schema

## `output` (type: `string`):

Answers, articles, questions, or search results — one flat row per upstream item with curated camelCase aliases (answerId, questionId, articleId, columnId, authorName, voteupCount, commentCount, content, excerpt, createdTime, articlePageUrl, …) plus the raw upstream fields spread alongside.

## `report` (type: `string`):

HTML report with run status, success/error row counts, success rate, pages fetched, duration, and the inputs used — written even on fatal crash.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "keyword": "Python",
    "questionId": "20791060",
    "articleId": "2032860336215307118",
    "columnId": "xuehy"
};

// Run the Actor and wait for it to finish
const run = await client.actor("sian.agency/zhihu-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "keyword": "Python",
    "questionId": "20791060",
    "articleId": "2032860336215307118",
    "columnId": "xuehy",
}

# Run the Actor and wait for it to finish
run = client.actor("sian.agency/zhihu-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "keyword": "Python",
  "questionId": "20791060",
  "articleId": "2032860336215307118",
  "columnId": "xuehy"
}' |
apify call sian.agency/zhihu-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=sian.agency/zhihu-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Zhihu Scraper — Q&A, Answers, Articles, Columns",
        "description": "Zhihu scraper — extract long-form Mandarin Q&A, expert answers, articles & column posts. Keyword search, question answer threads, article detail, column article list. China market research, LLM training data, competitive intel. Four operations, one clean dataset per run. No API key.",
        "version": "1.1",
        "x-build-id": "dNeXR1MAeIdTEIgRY"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/sian.agency~zhihu-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-sian.agency-zhihu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/sian.agency~zhihu-scraper/runs": {
            "post": {
                "operationId": "runs-sync-sian.agency-zhihu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/sian.agency~zhihu-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-sian.agency-zhihu-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "operation"
                ],
                "properties": {
                    "operation": {
                        "title": "🎯 Operation — what do you want to scrape?",
                        "enum": [
                            "search",
                            "answerList",
                            "articleDetail",
                            "columnArticleList"
                        ],
                        "type": "string",
                        "description": "🎯 **PICK ONE OPERATION PER RUN.** Each run produces one clean dataset matching the chosen mode.\n\n- **🔍 Search Zhihu** — keyword search across answers, questions, articles, people (~20 mixed results/page)\n- **❓ Question Answers** — paginated answers for a single question, ranked by Zhihu's algorithm (~5 answers/page)\n- **📰 Article Detail** — single Zhihu column article by ID (full content, author, vote/comment counts, topics)\n- **📚 Column Articles** — paginated article list from a Zhihu column (zhuanlan) by slug (~10 articles/page)\n\n💡 **TIP:** To combine operations, run the Actor multiple times with different configurations.",
                        "default": "search"
                    },
                    "keyword": {
                        "title": "🔍 Search Keyword (for Search Zhihu)",
                        "type": "string",
                        "description": "🔍 **Required for the `Search Zhihu` operation.**\n\nAny Zhihu search query. Mixed Chinese / English supported:\n- `Python`\n- `人工智能` (artificial intelligence)\n- `投资理财` (investment)\n- `品牌营销` (brand marketing)\n\n💡 **TIP:** Chinese-language queries return native Mandarin results; English queries surface bilingual / cross-cultural threads. Mixed results include answers, questions, articles, and people — filter the dataset by `resultType` to split modes.\n\n⚠️ **Ignored** for the Question Answers, Article Detail, and Column Articles operations."
                    },
                    "questionId": {
                        "title": "❓ Question ID (for Question Answers)",
                        "type": "string",
                        "description": "❓ **Required for the `Question Answers` operation.**\n\nThe numeric Zhihu question ID. You can find it:\n- In any Zhihu question URL: `https://www.zhihu.com/question/{ID}` → the trailing numeric segment\n- In the `questionId` field of any answer or search result row\n\n💡 **TIP:** To pull a question's full answer thread, start with `maxPages: 5` (~25 answers) and increase as needed. Answers are returned in Zhihu's ranking order — top-voted first by default.\n\n⚠️ **Ignored** for Search, Article Detail, and Column Articles operations."
                    },
                    "articleId": {
                        "title": "📰 Article ID (for Article Detail)",
                        "type": "string",
                        "description": "📰 **Required for the `Article Detail` operation.**\n\nThe numeric Zhihu column article ID. You can find it:\n- In any Zhuanlan URL: `https://zhuanlan.zhihu.com/p/{ID}` → the trailing numeric segment\n- In the `articleId` field of any search or column-article-list result row\n\n💡 **TIP:** Article Detail returns the full HTML content body, author profile, vote/comment counts, topics, and the parent column reference — ideal for in-depth scraping of a known article.\n\n⚠️ **Ignored** for Search, Question Answers, and Column Articles operations."
                    },
                    "columnId": {
                        "title": "📚 Column ID / Slug (for Column Articles)",
                        "type": "string",
                        "description": "📚 **Required for the `Column Articles` operation.**\n\nThe Zhihu column (zhuanlan) slug. Found in the column URL: `https://zhuanlan.zhihu.com/{slug}` → the trailing path segment.\n\nExamples:\n- `xuehy` — Xue Hongyan's investment column\n- `qingreading` — popular reading-recommendation column\n- `kaiyuan` — open-source / tech column\n\n💡 **TIP:** Use Column Articles to pull a curator's entire article history in reverse chronological order. Combine with Article Detail to enrich the top-N articles with full content.\n\n⚠️ **Ignored** for Search, Question Answers, and Article Detail operations."
                    },
                    "maxPages": {
                        "title": "📄 Max pages to fetch",
                        "minimum": 1,
                        "maximum": 50,
                        "type": "integer",
                        "description": "📄 **Applies to paginated operations** (Search Zhihu, Question Answers, Column Articles). Ignored for Article Detail (single record).\n\n- **Search Zhihu:** ~20 mixed results per page\n- **Question Answers:** ~5 answers per page\n- **Column Articles:** ~10 articles per page\n\n💡 **TIP:** Start small (1–3 pages) to preview results before scaling up.\n\n⚠️ Hard cap: 50 pages to prevent runaway runs.",
                        "default": 5
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
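
The `inputSchema` above can be checked on the client side before a run is started. The following is a minimal sketch, not part of the Actor itself. It builds a request body for one operation and prepares a `POST` to the `run-sync` path from the definition. The helper names (`build_input`, `build_request`) are hypothetical, the base URL `https://api.apify.com/v2` is an assumption, and the mapping from each `operation` value to its required field is inferred from the field descriptions.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public Apify API base URL. The path is taken from the definition above.
API_BASE = "https://api.apify.com/v2"
RUN_SYNC_PATH = "/acts/sian.agency~zhihu-scraper/run-sync"

# Inferred from the field descriptions: the input field each operation needs.
REQUIRED_FIELD = {
    "search": "keyword",
    "answerList": "questionId",
    "articleDetail": "articleId",
    "columnArticleList": "columnId",
}


def build_input(operation, value, max_pages=5):
    """Return a request body that follows inputSchema (hypothetical helper)."""
    if operation not in REQUIRED_FIELD:
        raise ValueError(f"unknown operation: {operation!r}")
    if not isinstance(value, str) or not value:
        raise ValueError("the operation's ID or keyword must be a non-empty string")
    body = {"operation": operation, REQUIRED_FIELD[operation]: value}
    # maxPages only applies to the paginated operations; the schema allows 1 to 50.
    if operation != "articleDetail":
        if not 1 <= max_pages <= 50:
            raise ValueError("maxPages must be between 1 and 50")
        body["maxPages"] = max_pages
    return body


def build_request(body, token):
    """Prepare (but do not send) the POST request for the run-sync endpoint."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{API_BASE}{RUN_SYNC_PATH}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_request(build_input("search", "Python", max_pages=2), "<YOUR_APIFY_TOKEN>")
    print(request.get_method(), request.full_url)
    # To actually start a run, pass a real token and send the request:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

Keeping validation separate from sending makes it possible to reject an out-of-range `maxPages` or a missing ID locally, before a paid event is triggered.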
