Douban Pro Scraper — Reviews, Discussions & Subject Data

Pricing

from $30.00 / 1,000 reviews scraped

Scrape long-form reviews, comments, and group discussions from Douban (豆瓣) — China's leading reviews + interest community. Movies, books, music, plus subject search. Built for Chinese-LLM training corpus, sentiment analysis, and academic NLP research. Pure HTTP, no auth.

Rating: 0.0 (0)

Developer: Sami (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 13 hours ago

Douban Scraper — Reviews, Comments & Group Discussions

Extract long-form reviews, ratings, comments, and group discussions from Douban (豆瓣) — China's leading reviews + interest community. Movies, books, and music. No API key, no browser, no VPN. Best Douban data source for Chinese AI training corpora and consumer research in 2026.

How to scrape Douban in 3 easy steps

  1. Go to the Douban Scraper page on Apify and click "Try for free"
  2. Configure your input — choose a mode (subject_reviews, subject_comments, group_topic, or subject_search), enter your Douban URLs or query, and set the number of results
  3. Click "Run", wait for the scraper to finish, then download your data in JSON, CSV, or Excel format

No coding required. No API key. Works with Apify's free plan.

🏢 Production pipeline running 1,000+ items per week?

I offer custom output schemas matched to your data warehouse, dedicated proxy infrastructure for sustained throughput, schema stability SLA (no breaking changes without 30-day notice), and volume pricing above 50K items/month.

DM me on Apify, open an Issue with subject "Enterprise inquiry", or email samimassis2002@gmail.com with subject "Douban enterprise".

Part of the Chinese Digital Intelligence Suite

Built by Zhorex — the only developer on Apify specializing in Chinese platforms:

Together, these cover the five pillars of Chinese digital intelligence: microblogging, video, social commerce, e-commerce, and reviews.

What is Douban?

Douban (豆瓣) is China's reviews and interest-community platform — Goodreads + Letterboxd + Rate Your Music + niche-Reddit fused into one site, with 200M+ monthly users. It's where Chinese readers, cinephiles, music fans, and hobby communities post the longest-form opinion content on the Chinese internet. Movies, books, music, TV shows, and tens of thousands of user-run discussion groups.

For anyone building a Chinese-language LLM, sentiment classifier, or consumer research dataset, Douban is the densest source of opinion-rich long-form Chinese text outside of Zhihu.

Modes

| Mode | What it does | Records |
| --- | --- | --- |
| subject_reviews | Long-form reviews (500-5,000+ Chinese chars each) for a movie, book, or music album | One per review |
| subject_comments | Short comments + star ratings under a subject's discussion page | One per comment |
| subject_search | Search Douban for movies / books / music by keyword | One per result |
| group_topic ⚠️ Beta | Pull a discussion thread + its replies from a Douban Group | One per topic (with nested replies) |
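Each mode expects a different primary input field. The sketch below is one way to catch an incomplete input before starting a paid run. The field names come from the input examples on this page; the `build_input` helper and its validation rules are hypothetical and are not the Actor's own schema.

```python
from typing import Any

# Field names are taken from the input examples on this page.
# The "required" mapping is an assumption, not the Actor's published schema.
REQUIRED_FIELD = {
    "subject_reviews": "subjectUrls",
    "subject_comments": "subjectUrls",
    "subject_search": "searchQuery",
    "group_topic": "topicUrls",
}


def build_input(mode: str, **fields: Any) -> dict[str, Any]:
    """Return a run input dict, failing early if the mode's main field is missing."""
    if mode not in REQUIRED_FIELD:
        raise ValueError(f"unknown mode: {mode!r}")
    required = REQUIRED_FIELD[mode]
    if not fields.get(required):
        raise ValueError(f"mode {mode!r} needs a non-empty {required!r}")
    return {"mode": mode, **fields}


run_input = build_input(
    "subject_reviews",
    subjectUrls=["https://book.douban.com/subject/1084336/"],
    maxResults=50,
)
```

The resulting `run_input` can be passed as-is to the Apify client's `call(run_input=...)`.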

v1.0 Known Limitations (read before buying)

  • Movie comments require browser rendering. Douban serves movie short-comments through a JS-only mobile widget that v1.0 cannot extract — subject_comments for movies returns a diagnostic record explaining the limitation. Use subject_reviews for movie data instead — long-form movie reviews are richer for AI training anyway. Books and music short-comments work normally.
  • Movie review list pages expose only excerpts. Mobile movie list pages don't include the author, publication date, or full body, only review IDs, titles, and ratings. Keep fetchFullReviewBody: true (the default) so the Actor fetches each review's detail page and fills in the full markdown body.
  • Book / music comment coverage varies by subject. Popular subjects (rating count > 10K — e.g. 三体, OK Computer) reliably serve inline short-comments; some less-popular books have begun moving comment lists to AJAX-only rendering and will return 0 records. If subject_comments returns 0 records for a URL, fall back to subject_reviews (which works on all subjects).
  • Movie search returns Douban's tag-matched feed. For precise targeting of a specific film, use subject_reviews with the movie's subject URL directly.
  • Book search caps at ~10 discovery results per query. Douban's book suggestion endpoint doesn't paginate. For bulk book review extraction, supply multiple subject URLs to subject_reviews mode.
  • group_topic mode is Beta. Works on most current public topics; some IDs return 403 (moderated) or 404 (deleted). When a topic fails, the run logs a warning and continues — you are not charged for failed topics.
  • Residential proxies are strongly recommended (default in input). Datacenter IPs degrade movie-mode and may trigger generic anti-bot challenges.
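The fallback suggested above (try subject_comments, switch to subject_reviews when nothing comes back) can be sketched as follows. `run_mode` is a hypothetical callable that you would implement around the Apify client; the sketch also assumes that diagnostic records carry a `type` other than `"comment"`, which this page does not state explicitly.

```python
from typing import Callable, Iterable

# Hypothetical signature: (mode, subject_url) -> dataset items for that run.
RunMode = Callable[[str, str], Iterable[dict]]


def comments_or_reviews(run_mode: RunMode, subject_url: str) -> list[dict]:
    """Prefer short comments; fall back to long-form reviews if none are usable."""
    comments = [
        item
        for item in run_mode("subject_comments", subject_url)
        # Assumption: diagnostic/limitation records are not typed "comment".
        if item.get("type") == "comment"
    ]
    if comments:
        return comments
    return list(run_mode("subject_reviews", subject_url))
```

Injecting the runner keeps the fallback logic testable without network access or charges.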

Use Cases

| Who | Why |
| --- | --- |
| AI / LLM training data buyers | Densest source of Chinese long-form opinion text outside Zhihu; key for Chinese-language model fine-tuning |
| Sentiment analysis researchers | Star-rating-labelled Chinese review text, ideal for supervised sentiment classifiers |
| Brand monitoring teams | Find Chinese consumer reviews mentioning your product, competitor films, or book titles |
| Cultural trend analysts | Track which films / books / albums are gaining traction in Chinese-speaking markets |
| Academic NLP researchers | Pre-built corpus of opinion text with engagement metrics, citable in cross-cultural studies |
| Localization / translation teams | Real Chinese phrasing patterns for entertainment vocabulary |

Scrape Douban with Python, JavaScript, or no code

You can use the Douban Scraper directly from the Apify Console (no code), or integrate it into your scripts.

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the Actor and wait for the run to finish.
run = client.actor("zhorex/douban-scraper").call(run_input={
    "mode": "subject_reviews",
    "subjectUrls": ["https://book.douban.com/subject/1084336/"],
    "maxResults": 50,
})

# Iterate over the records in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the Actor and wait for the run to finish.
const run = await client.actor('zhorex/douban-scraper').call({
  mode: 'subject_reviews',
  subjectUrls: ['https://book.douban.com/subject/1084336/'],
  maxResults: 50,
});

// Read the records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => console.log(item));

Input examples

1. Subject reviews (long-form)

Pull long-form reviews for one or more movies, books, or music albums. Provide subject URLs or numeric subject IDs.

{
  "mode": "subject_reviews",
  "subjectUrls": [
    "https://movie.douban.com/subject/1292052/",
    "https://book.douban.com/subject/1084336/",
    "https://music.douban.com/subject/1419463/"
  ],
  "maxResults": 100,
  "fetchFullReviewBody": true
}
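When a list of subject URLs comes from a spreadsheet or an earlier crawl, it can help to check them before a run. The sketch below pulls the subject type and numeric ID out of URLs shaped like the ones in the example above. The regular expression is an assumption based only on those three URL patterns; `parse_subject_url` is a hypothetical helper, not part of the Actor.

```python
import re
from typing import NamedTuple, Optional

# Matches the movie/book/music subject URL shapes shown in the input example.
SUBJECT_URL = re.compile(
    r"^https?://(movie|book|music)\.douban\.com/subject/(\d+)/?$"
)


class Subject(NamedTuple):
    subject_type: str
    subject_id: str


def parse_subject_url(url: str) -> Optional[Subject]:
    """Return (type, id) for a recognised subject URL, else None."""
    match = SUBJECT_URL.match(url.strip())
    if match is None:
        return None
    return Subject(match.group(1), match.group(2))


print(parse_subject_url("https://book.douban.com/subject/1084336/"))
```

URLs that return `None` (for example a group topic URL) can be filtered out or routed to a different mode.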

2. Subject comments + ratings

Short comments + star ratings for a book or music album. Movie comments are not supported in v1.0 — use subject_reviews mode for movies.

{
  "mode": "subject_comments",
  "subjectUrls": ["https://book.douban.com/subject/1084336/"],
  "maxResults": 200
}

3. Subject search

Search Douban for movies, books, or music by keyword. Returns Douban's discovery feed for that query.

{
  "mode": "subject_search",
  "searchQuery": "三体",
  "searchType": "all",
  "maxResults": 30
}

4. Group topic (Beta)

Pull one or more group discussion threads with embedded replies.

{
  "mode": "group_topic",
  "topicUrls": ["https://www.douban.com/group/topic/319929381/"],
  "maxRepliesPerTopic": 100
}

Output examples

Review record

{
  "type": "review",
  "reviewId": "1000104",
  "subjectId": "1084336",
  "subjectName": "小王子",
  "subjectType": "book",
  "title": "长大就笨了",
  "content": "(Full review body in markdown — Chinese long-form text)",
  "rating": 5,
  "ratingLabel": "力荐",
  "authorUsername": "大头绿豆",
  "authorUrl": "https://www.douban.com/people/bighead/",
  "authorAvatarUrl": "https://img3.doubanio.com/icon/u1000152-23.jpg",
  "publishedAt": "2005-04-06 11:51:52",
  "publishedAtIso": "2005-04-06T03:51:52Z",
  "stats": { "replyCount": 444 },
  "reviewUrl": "https://book.douban.com/review/1000104/",
  "scrapedAt": "2026-05-13T01:39:22Z"
}
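In the sample record, publishedAt and publishedAtIso differ by exactly eight hours, which is consistent with publishedAt being China Standard Time (UTC+8) and publishedAtIso being UTC. That reading is an inference from the sample, not a documented guarantee. Under that assumption, the relationship can be reproduced like this:

```python
from datetime import datetime, timedelta, timezone

# Assumption: publishedAt is in China Standard Time, a fixed UTC+8 offset.
CST = timezone(timedelta(hours=8))


def to_iso_utc(published_at: str) -> str:
    """Convert a 'YYYY-MM-DD HH:MM:SS' CST timestamp to ISO 8601 UTC."""
    local = datetime.strptime(published_at, "%Y-%m-%d %H:%M:%S").replace(tzinfo=CST)
    return local.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


print(to_iso_utc("2005-04-06 11:51:52"))  # 2005-04-06T03:51:52Z
```

For time-series work, sorting and joining on publishedAtIso avoids mixing time zones with scrapedAt, which is already in UTC.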

Comment record

{
  "type": "comment",
  "commentId": "10287387",
  "subjectId": "1084336",
  "subjectName": "小王子",
  "subjectType": "book",
  "content": "十几岁的时候渴慕着小王子,一天之间可以看四十四次日落。",
  "rating": 5,
  "ratingLabel": "力荐",
  "authorUsername": "眠去",
  "authorUrl": "https://www.douban.com/people/rebekah/",
  "publishedAt": "2007-02-08 11:16:40",
  "publishedAtIso": "2007-02-08T03:16:40Z",
  "stats": { "votesCount": 9232 },
  "scrapedAt": "2026-05-13T01:39:22Z"
}
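Because review and comment records carry a star rating next to the text, they can be turned into labelled pairs for a supervised sentiment classifier. The sketch below assumes a 1-5 star scale and uses a common but arbitrary bucketing (4-5 positive, 3 neutral, 1-2 negative); both the thresholds and the helper names are illustrative choices, not something the Actor produces.

```python
from typing import Iterable, Iterator, Optional


def sentiment_label(record: dict) -> Optional[str]:
    """Map a star rating to a coarse label; None when the record has no rating."""
    rating = record.get("rating")
    if rating is None:
        return None
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def labelled_pairs(records: Iterable[dict]) -> Iterator[tuple[str, str]]:
    """Yield (text, label) pairs, skipping records without text or rating."""
    for record in records:
        label = sentiment_label(record)
        text = record.get("content")
        if label and text:
            yield text, label
```

Unrated records are skipped instead of being given a default label, so the training set contains only labels backed by a rating.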

Subject (search result)

{
  "type": "subject",
  "subjectId": "2567698",
  "subjectName": "三体",
  "subjectType": "book",
  "year": "2008",
  "author": "刘慈欣",
  "rating": null,
  "cover": "https://img1.doubanio.com/view/subject/s/public/s2768378.jpg",
  "subjectUrl": "https://book.douban.com/subject/2567698/",
  "scrapedAt": "2026-05-13T01:39:22Z"
}

Group topic record (Beta)

{
  "type": "group_topic",
  "topicId": "319929381",
  "groupName": "(Group name)",
  "title": "(Discussion title)",
  "content": "(Topic body in markdown — Chinese long-form text)",
  "authorUsername": "(Author handle)",
  "publishedAt": "2026-04-01 10:20:30",
  "publishedAtIso": "2026-04-01T02:20:30Z",
  "stats": { "replyCount": 50 },
  "replies": [
    {
      "replyId": "...",
      "authorUsername": "...",
      "content": "...",
      "publishedAt": "...",
      "votesCount": 12
    }
  ],
  "topicUrl": "https://www.douban.com/group/topic/319929381/",
  "scrapedAt": "2026-05-13T01:39:22Z"
}
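Group topic records nest their replies, which is awkward for CSV export or row-oriented warehouses. A minimal sketch of flattening one topic into one row per reply follows. The field names match the sample record above; `flatten_replies` and the `topicTitle` column are hypothetical additions for illustration.

```python
def flatten_replies(topic: dict) -> list[dict]:
    """Return one flat row per reply, each tagged with its parent topic."""
    rows = []
    for reply in topic.get("replies") or []:
        rows.append({
            "topicId": topic.get("topicId"),
            "topicTitle": topic.get("title"),  # hypothetical column name
            **reply,
        })
    return rows


sample_topic = {
    "topicId": "319929381",
    "title": "(Discussion title)",
    "replies": [{"replyId": "r1", "content": "...", "votesCount": 12}],
}
print(flatten_replies(sample_topic))
```

A topic with no replies yields an empty list, so the topic record itself should be stored separately if it needs to be kept.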

Pricing

Pay per result — no monthly fee, no minimum, free trial included.

| Event | Price | When charged |
| --- | --- | --- |
| review-scraped | $0.030 | Per long-form review record extracted |
| comment-scraped | $0.005 | Per short comment record extracted |
| group-topic-scraped | $0.030 | Per group topic (with embedded replies) |
| subject-search-result | $0.005 | Per search result row |

Concrete cost examples:

  • 100 long-form reviews from one popular movie's reviews page: $3.00
  • 1,000 short comments across multiple books: $5.00
  • 50 group discussions with replies: $1.50
  • 200 search results to seed a crawl: $1.00

Diagnostic / log records (e.g. movie comment limitation notices) are NEVER charged.
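The per-event prices make it easy to budget a run in advance. The sketch below reproduces the cost examples above using the prices from the pricing table; `estimate_cost` is a hypothetical helper, and the figures should be re-checked against the live pricing before relying on them.

```python
from decimal import Decimal

# Prices per event, copied from the pricing table on this page.
PRICE_PER_EVENT = {
    "review-scraped": Decimal("0.030"),
    "comment-scraped": Decimal("0.005"),
    "group-topic-scraped": Decimal("0.030"),
    "subject-search-result": Decimal("0.005"),
}


def estimate_cost(counts: dict[str, int]) -> Decimal:
    """Sum price * count over chargeable events, rounded to cents."""
    total = sum(
        (PRICE_PER_EVENT[event] * n for event, n in counts.items()),
        Decimal("0"),
    )
    return total.quantize(Decimal("0.01"))


print(estimate_cost({"review-scraped": 100}))  # 3.00
```

Decimal is used instead of float so that sums of sub-cent prices do not pick up binary rounding noise.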

Content is in Chinese

All content is returned in the original Simplified Chinese. Douban is a Chinese-language platform — reviews, comments, group discussions, and user names are in Chinese.

If you need English translations, pipe the output through a translation API (Google Translate, DeepL, or Claude).

Technical Details

  • No browser — pure HTTP, runs in 512MB RAM
  • No login required — works against publicly accessible content only
  • Built-in rate limiting — exponential backoff on 429 / 503
  • Globally accessible — residential proxy recommended (default in input)
  • UTF-8 throughout — Chinese text round-trips cleanly
  • Markdown review bodies: <p>, <a>, <strong> etc. are converted to lightweight markdown for downstream LLM ingestion
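For readers unfamiliar with the term, exponential backoff on 429 / 503 means waiting progressively longer between retries of a throttled request. The sketch below illustrates the general technique with full jitter; it is not the Actor's internal implementation, and the retry count, base delay, and cap are assumed values.

```python
import random
from typing import Callable, Iterator

# Status codes this page says trigger a retry.
RETRY_STATUSES = {429, 503}


def should_retry(status: int) -> bool:
    return status in RETRY_STATUSES


def backoff_delays(
    max_retries: int = 5,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield sleep durations: the ceiling doubles each attempt, up to `cap`.

    Each delay is drawn uniformly from [0, ceiling) ("full jitter") so that
    many clients do not retry in lockstep.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)
        yield ceiling * rng()
```

A caller would sleep for each yielded delay between attempts and stop retrying once the generator is exhausted.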

FAQ

Is there a Douban API?

Douban's official developer API has been deprecated for several years. There is no working public Douban API for international developers. This Douban Scraper is the best Douban data source in 2026 — it extracts reviews, comments, ratings, and group discussions from publicly accessible web endpoints.

Do I need a Douban login or cookies?

No. All four modes work against publicly accessible content. Login-walled content (private groups, blocked users) is not in scope.

Why are movie comments not supported?

Douban serves movie short-comments through a JavaScript-only widget on the mobile site that requires headless browser execution. v1.0 returns long-form movie reviews instead, which contain richer opinion text and are the primary value for AI training data. Books and music short-comments work normally.

Can I scrape Douban in Python?

Yes. Install the Apify Python client (pip install apify-client), then call the zhorex/douban-scraper actor. See the Python code example above.

How much does it cost to scrape Douban?

Each record type has its own price (see the Pricing table). A typical research run extracting 100 movie reviews costs about $3. There is no monthly fee or minimum spend — pay only for what you extract. Diagnostic records (e.g. movie-comment-mode limitation notices) are never charged.

Is it legal to scrape Douban?

This scraper accesses publicly available content through Douban's public web endpoints. It does not bypass authentication and does not access private/locked content. Always review your local laws and Douban's terms of service before scraping.

What if a group topic URL fails?

Group topics are marked Beta in v1.0. Most public group topic URLs work; some may fail (private group, moderated topic, deleted post). When a topic fails, the run logs a warning and continues with the next URL — you are not charged for failed topics.

What is the best Douban scraper in 2026?

The Douban Scraper by Zhorex — covers reviews, comments, group discussions, and search across movies, books, and music. Built specifically for Chinese AI training data buyers and sentiment research teams. Part of the Chinese Digital Intelligence Suite (Weibo, Bilibili, RedNote, Douban).

Integrations & data export

The Douban Scraper integrates with your existing workflow:

  • Google Sheets — Send scraped reviews + ratings directly to a spreadsheet
  • Zapier / Make / n8n — Automate workflows triggered by new Douban records
  • REST API — Call the actor programmatically and retrieve data via Apify's REST API
  • Webhooks — Get notified when a run finishes
  • Data formats — Download as JSON, CSV, Excel, XML, or RSS
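For the REST API route, a minimal sketch using only the Python standard library is shown below. It builds a request for Apify's synchronous "run and return dataset items" endpoint without sending it. The endpoint path and the `~` separator in the Actor ID follow Apify's API v2 conventions as I understand them, so check them against the current API reference; `build_sync_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_sync_request(actor: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the Actor and returns its items."""
    actor_id = actor.replace("/", "~")  # API paths use username~actor-name
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_sync_request(
    "zhorex/douban-scraper",
    "YOUR_API_TOKEN",
    {"mode": "subject_search", "searchQuery": "三体", "maxResults": 30},
)
# To execute the run (this incurs charges):
#   with urllib.request.urlopen(req) as resp:
#       items = json.load(resp)
```

The synchronous endpoint suits short runs; for long crawls, starting the run asynchronously and using a webhook, as listed above, is likely the better fit.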

Support

Found a bug or want a new field? Open an issue on the Actor's Issues page — typical response within 48 hours.


💡 Found this Actor useful? Please leave a star rating — it helps other users discover this tool.