❓ Zhihu Question Answers Scraper avatar

❓ Zhihu Question Answers Scraper

Pricing

$3.00 / 1,000 results

Go to Apify Store
❓ Zhihu Question Answers Scraper

❓ Zhihu Question Answers Scraper

Extract Zhihu question answers data — title, author, engagement, and more. Scrape by keyword, URL or ID. Export to JSON, CSV & Excel, use the API, schedule runs and integrate. No code required.

Pricing

$3.00 / 1,000 results

Rating

0.0

(0)

Developer

Jackie Chen

Jackie Chen

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

a day ago

Last modified

Categories

Share

Zhihu Question Answers Scraper

zhihu-question-answers-scraper

Scrape every answer on a Zhihu (知乎) question — with the full answer body, not just a snippet. Each answer becomes one structured item: complete HTML content, a plain-text excerpt, the author's name / handle / follower count, upvotes, comment count, and timestamps. This is ideal raw material for LLM training data, Q&A-pair datasets, and content research.

Unofficial. This Actor is not affiliated with, authorized, or endorsed by Zhihu (知乎 / Zhihu Inc.). It is an independent tool that retrieves publicly available data via a third-party API. Use it in compliance with Zhihu's terms and all applicable laws; you are responsible for how you use the retrieved data.

What it does

  • Question answers (primary) — give one or more Zhihu question IDs; the Actor paginates through every answer on each question and emits one item per answer with the full body text.
  • Answer comments (optional) — give answer IDs to also pull their top-level comments, one item per comment.

Input

FieldTypeDefaultDescription
questionIdsstring[]["19550517"]Zhihu question IDs (numeric id from zhihu.com/question/<id>). Each is paginated independently.
orderenumdefaultdefault (Zhihu ranking) or updated (most recently updated).
answerIdsstring[][]Optional. Answer IDs to pull top-level comments for.
maxItemsinteger50Max total items (answers + comments) across all inputs.
maxCommentsPerAnswerinteger50Cap on comments fetched per answer ID.

Example input

{
"questionIds": ["19550517", "20278289"],
"order": "default",
"maxItems": 200
}

Output

One dataset item per answer:

{
"answerId": "12202199",
"type": "answer",
"url": "https://www.zhihu.com/question/19550517/answer/12202199",
"questionId": 19550517,
"questionTitle": "Instagram 有多少用户?",
"excerpt": "1500 万。今年 3 月份报出来的数字 ...",
"content": "<p>... full answer body as HTML ...</p>",
"voteupCount": 2,
"commentCount": 7,
"createdTime": 1334057938,
"updatedTime": 1334057938,
"authorName": "黄继新",
"authorUrlToken": "jixin",
"authorId": "...",
"authorFollowerCount": 1006585,
"authorHeadline": "知乎 联合创始人",
"authorUrl": "https://www.zhihu.com/people/jixin",
"source": "question:19550517"
}

When answerIds is supplied, comment items are also emitted with type: "comment", commentId, answerId, content, voteupCount, childCommentCount, and author fields.

Notes

  • Data is sourced live; the upstream API occasionally returns a transient retry signal, so the Actor retries with exponential backoff and a browser User-Agent.
  • Items are de-duplicated by id within a run.