Pricing

from $25.00 / 1,000 long reviews (full body + sentiment)

Douban Reviews Scraper

Scrape Douban (豆瓣) ratings, reviews & comments with sentiment tags for movies, TV, books, music & groups. Clean JSON for NLP/LLM training & analysis.

Rating: 0.0 (0 ratings)

Developer: Tony

Last modified: a day ago

What you get

Paste one or more Douban subject URLs and get a clean dataset with three record types:

subject — the title, year, overall rating, rating count and genres.
comment — short user comments (high volume) with star rating + sentiment.
review — long-form reviews with full body text, star rating + sentiment.

Group URLs produce group_topic records (discussion topics with reply counts).

Sentiment is derived from the author's own Douban star rating — no guesswork, no ML black box: 5–4★ = positive, 3★ = neutral, 2–1★ = negative, unrated = null.
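The star-to-sentiment rule is simple enough to reproduce on your side, for example to double-check labels or re-derive them from rating_stars. Here is a minimal Python sketch. It assumes rating_stars is an integer from 1 to 5, or None when unrated. The name star_to_sentiment is hypothetical and not part of the actor.

```python
from typing import Optional


def star_to_sentiment(rating_stars: Optional[int]) -> Optional[str]:
    """Map a Douban star rating to the sentiment label described above.

    Hypothetical helper: mirrors the documented rule, not the actor's code.
    """
    if rating_stars is None:
        return None  # unrated comment: no star signal to map
    if not 1 <= rating_stars <= 5:
        raise ValueError(f"rating_stars out of range: {rating_stars}")
    if rating_stars >= 4:
        return "positive"  # 5 or 4 stars
    if rating_stars == 3:
        return "neutral"
    return "negative"  # 2 or 1 stars
```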

Supported URLs

Type	Example
Movie / TV	`https://movie.douban.com/subject/1292052/`
Book	`https://book.douban.com/subject/1084336/`
Music	`https://music.douban.com/subject/1407217/`
Group	`https://www.douban.com/group/beethoven/`
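You can sort a list of URLs by type before a run with a couple of regular expressions. The sketch below is minimal. It only recognizes the URL shapes in the table above, so mobile m.douban.com links, group topic links and other variants are not handled. The name classify_douban_url is hypothetical. TV shows also live on movie.douban.com, so this sketch reports them as "movie". The actor's own subject_type for TV may differ.

```python
import re
from typing import Optional, Tuple

# Patterns mirror the example URLs in the table (an assumption, not the actor's parser).
SUBJECT_RE = re.compile(r"^https?://(movie|book|music)\.douban\.com/subject/(\d+)/?")
# Group home pages only; group topic URLs (/group/topic/...) are excluded.
GROUP_RE = re.compile(r"^https?://www\.douban\.com/group/(?!topic/)([^/?#]+)/?")


def classify_douban_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (kind, id) for a supported URL, or None if it is not recognized.

    kind is "movie", "book", "music" or "group"; TV subjects report as "movie".
    """
    m = SUBJECT_RE.match(url)
    if m:
        return m.group(1), m.group(2)
    g = GROUP_RE.match(url)
    if g:
        return "group", g.group(1)
    return None
```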

Example input

{
    "startUrls": [
        { "url": "https://movie.douban.com/subject/1292052/" }
    ],
    "scrapeShortComments": true,
    "scrapeLongReviews": true,
    "maxCommentsPerSubject": 200,
    "maxReviewsPerSubject": 50,
    "fetchFullReviewText": true,
    "tagSentiment": true,
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
}
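The same input can be built in code, for example when the list of subject URLs comes from another system. This Python sketch mirrors the example above field for field. The helper build_run_input is hypothetical, and its defaults are simply the example's values.

```python
import json
from typing import List


def build_run_input(urls: List[str], max_comments: int = 200, max_reviews: int = 50) -> dict:
    """Assemble an input object with the same fields as the example input."""
    return {
        "startUrls": [{"url": u} for u in urls],
        "scrapeShortComments": True,
        "scrapeLongReviews": True,
        "maxCommentsPerSubject": max_comments,
        "maxReviewsPerSubject": max_reviews,
        "fetchFullReviewText": True,
        "tagSentiment": True,
        "proxyConfiguration": {
            "useApifyProxy": True,
            "apifyProxyGroups": ["RESIDENTIAL"],
        },
    }


if __name__ == "__main__":
    payload = build_run_input(["https://movie.douban.com/subject/1292052/"])
    # ensure_ascii=False keeps any Chinese text readable in the output
    print(json.dumps(payload, indent=4, ensure_ascii=False))
```

You could pass the resulting dict as run_input to the Apify Python client's actor(...).call(...). The actor's ID is not given on this page, so it is left out here.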

Example output

A short comment record:

{
    "record_type": "comment",
    "id": "comment:1234567890",
    "subject_id": "1292052",
    "subject_type": "movie",
    "subject_title": "肖申克的救赎 The Shawshank Redemption",
    "author": "影迷小王",
    "author_url": "https://www.douban.com/people/12345/",
    "rating_stars": 5,
    "rating_label": "力荐",
    "sentiment": "positive",
    "content": "希望让人自由。每看一次都有新的感动。",
    "useful_count": 1842,
    "created_at": "2021-03-14T21:05:00+08:00",
    "source_url": "https://movie.douban.com/subject/1292052/comments?status=P&start=0&limit=20&sort=new_score",
    "scraped_at": "2026-06-15T09:12:00.000Z"
}

Field notes:

id is stable across runs (built from Douban's comment/review id) — use it to deduplicate on your side.
rating_stars / sentiment are null when the user left a comment without a rating.
created_at is the original Douban timestamp in China Standard Time (UTC+8); scraped_at is ISO-8601 UTC.
Long-review records (not shown above) also include content_truncated; false means the full essay body was captured (requires fetchFullReviewText).
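The id field is stable across runs, and created_at carries an explicit +08:00 offset. That means you can merge scheduled runs and normalize timestamps with the standard library alone. This sketch assumes records are loaded as Python dicts. The names merge_new_records and created_at_utc are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def merge_new_records(seen: Dict[str, dict], records: Iterable[dict]) -> List[dict]:
    """Store records whose id has not been seen yet and return only those new ones."""
    new = []
    for rec in records:
        if rec["id"] not in seen:
            seen[rec["id"]] = rec
            new.append(rec)
    return new


def created_at_utc(rec: dict) -> datetime:
    """Parse created_at (which includes its UTC+8 offset) and convert it to UTC."""
    return datetime.fromisoformat(rec["created_at"]).astimezone(timezone.utc)
```

For the sample comment above, created_at_utc returns 2021-03-14 13:05:00 UTC.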

Pricing

Pay-per-event: you pay for each item actually extracted (plus a small fee per actor start), so cost scales with the volume of data you get back:

Event	Price
Long review (full body + sentiment)	$0.025
Short comment	$0.005
Subject info	$0.006
Group topic	$0.012
Actor start	$0.00005
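To budget a run, multiply the expected item counts by these per-event prices. This sketch hard-codes the prices from the table, which may not match the final Store prices. It also assumes the actor-start event is charged once per run. The name estimate_cost is hypothetical.

```python
# Per-event prices in USD, copied from the table above (final prices may differ).
PRICES = {
    "long_review": 0.025,
    "short_comment": 0.005,
    "subject_info": 0.006,
    "group_topic": 0.012,
    "actor_start": 0.00005,
}


def estimate_cost(subjects: int = 0, comments: int = 0, reviews: int = 0,
                  group_topics: int = 0, runs: int = 1) -> float:
    """Estimate the total charge in USD, assuming one actor-start event per run."""
    total = (
        subjects * PRICES["subject_info"]
        + comments * PRICES["short_comment"]
        + reviews * PRICES["long_review"]
        + group_topics * PRICES["group_topic"]
        + runs * PRICES["actor_start"]
    )
    return round(total, 5)
```

For one movie with 200 short comments and 50 long reviews, estimate_cost(subjects=1, comments=200, reviews=50) gives 0.006 + 200 × 0.005 + 50 × 0.025 + 0.00005 = $2.25605, or about $2.26.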

(Final prices are shown on the Apify Store listing.)

Limitations

Douban serves a JavaScript proof-of-work anti-bot challenge, so this actor runs a real headless browser (Playwright + Chromium) to clear it. Recommended run settings: 4 GB memory and residential Apify Proxy. The browser solves the challenge automatically and retries on a fresh session if it doesn't clear.
Douban limits how far logged-out visitors can page through short comments (typically the first few hundred), so set maxCommentsPerSubject realistically; higher values won't return more comments.
Some titles (often sensitive ones) hide short comments from logged-out visitors entirely. For those, the actor still returns the subject info and long reviews, but short comments come back empty. Long reviews and ratings are not gated.
Keep maxConcurrency modest (default 3). Under heavy concurrency Douban occasionally soft-throttles a page, which can make a single comment page come back empty; lower concurrency avoids this.
Some music and book subject pages expose fewer fields (e.g. no genres); missing fields come back as null.
Public data only — the actor never logs in or scrapes login-walled content.
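The example source_url suggests short comments are fetched 20 at a time through start and limit query parameters. This is inferred from one movie URL and is not documented. If the pattern holds, the sketch below shows how many page requests a given maxCommentsPerSubject implies, which is why realistic limits and modest concurrency matter. comment_page_urls is hypothetical and is not the actor's internal code.

```python
from typing import List
from urllib.parse import urlencode


def comment_page_urls(subject_id: str, max_comments: int, page_size: int = 20) -> List[str]:
    """List the comment-page URLs needed to reach max_comments for a movie subject.

    Assumes the start/limit paging seen in the example source_url.
    """
    urls = []
    for start in range(0, max_comments, page_size):
        # dict order is preserved, matching the query string in the example output
        query = urlencode({"status": "P", "start": start, "limit": page_size, "sort": "new_score"})
        urls.append(f"https://movie.douban.com/subject/{subject_id}/comments?{query}")
    return urls
```

With maxCommentsPerSubject set to 200, that is 10 pages (start=0 through start=180). Past the logged-out depth limit described above, further pages are unlikely to yield more comments.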

FAQ

Which content should I scrape? Toggle scrapeShortComments, scrapeLongReviews and scrapeSubjectInfo independently. Short comments are cheapest and highest-volume; long reviews are richer for sentiment / NLP work.

Can I run this on a schedule? Yes — use Apify Schedules. Reviews are evergreen, so weekly is usually plenty.

How do I export to my DB / Google Sheets? Use Apify Integrations or the Dataset API — every field above is available via /items?format=json|csv|xlsx. The dataset also ships pre-built Short comments and Long reviews table views.
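The export path follows the items endpoint of the Apify Dataset API. This small sketch builds the download URL. It assumes you already have the dataset ID from your run, for example the run's defaultDatasetId. dataset_items_url is a hypothetical helper, and it only accepts the three formats named above, although the API supports others.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json", token: str = "") -> str:
    """Build a Dataset API items URL for one of the formats named in the FAQ."""
    if fmt not in ("json", "csv", "xlsx"):
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if token:
        params["token"] = token  # only needed for private datasets
    return f"{API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"
```

Keep the token out of logs. The Apify API also accepts it in an Authorization: Bearer header, which keeps it out of the URL.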

Why is sentiment sometimes null? The user left a comment without a star rating, so there is no star signal to map. The raw content is still captured.

Similar actors

Douban Pro Scraper — Reviews, Discussions & Subject Data

zhorex/douban-scraper

Scrape long-form reviews, comments, and group discussions from Douban (豆瓣) — China's leading reviews + interest community. Movies, books, music, plus subject search. Built for Chinese-LLM training corpus, sentiment analysis, and academic NLP research. Pure HTTP, no auth.

Sami

douban book pro

kuaima/douban-book-pro

An actor that gets data from the Douban Books site with more detailed information. For simple use cases, see the free version: https://apify.com/kuaima/douban.

kuai ma

douban

kuaima/douban

This actor crawls data from Douban. It returns the top 10 books from [豆瓣读书](https://book.douban.com/) (Douban Books). For a more powerful actor, see https://apify.com/kuaima/douban-book-pro

kuai ma

Douban Movie, Book & Music Top List Scraper

jungle_synthesizer/douban-movie-book-music-top-list-scraper

Scrape Douban Top 250 movies, top 250 books, and music charts. Returns ranked items with ratings, cast, authors, genres, IMDb cross-references, and snapshot timestamps across all three content types.

BowTiedRaccoon

Douban Movie Scraper — Ratings, Reviews & Hot Lists

sian.agency/douban-movie-scraper

Scrape Douban (豆瓣电影) into clean datasets — movie & TV ratings, cast and crew, long-form reviews, viewer comments with province geo, IMDb cross-IDs, and the live Recent Hot Movie & Hot TV trending lists. Six operations, one actor. No account or API key needed.

SIÁN OÜ

Chinese AI Training Corpus Engine

zhorex/chinese-corpus-engine

Turn China's public web into AI-training-ready text. Pulls Weibo, Bilibili, Xueqiu, Douban & RedNote, then deduplicates, quality-scores, PII-scrubs and provenance-stamps every document. From $0.025/doc, pay-as-you-go. For LLM training-data teams, data vendors & academic NLP researchers.

Sami

Metacritic Scraper - Games, Movies, TV & Music Reviews

lulzasaur/metacritic-scraper

Scrape Metacritic review scores for games, movies, TV shows, and music. Extract Metascores, user scores, critic review counts, genres, ratings, and more. Search by title or browse top-rated media.

lulz bot

Chinese Brand Monitor — Weibo+RedNote+Bilibili+Douban+Xueqiu

zhorex/chinese-brand-monitor

Track brand mentions across Weibo, Xiaohongshu (RedNote), Bilibili, Douban and Xueqiu in one normalized API call. Sentiment-tagged, cross-platform deduplicated. $0.045 per mention, pay-as-you-go. Synthesio/Brandwatch alternative for brand monitoring agencies, DTC China teams, and hedge funds.

Sami

5.0

Metacritic Scraper - Game and Movie Reviews

parseforge/metacritic-scraper

Scrape Metacritic metascores, critic reviews, and user ratings for games, movies, TV shows, and music. Extract release dates, platform availability, and developer details.