Pricing

Pay per event

Douban Scraper

Scrape public Douban movie, book, music, search, top-list, review, and comment data for China media intelligence workflows.

Developer

Stas Persiianenko

Last modified

19 days ago

What does Douban Scraper do?

Douban Scraper extracts structured records from public Douban pages. It can collect Movie Top 250 entries, hot movie JSON records, Douban search results, and metadata from public start URLs.

Use it to turn Douban pages into clean JSON, CSV, Excel, or API-ready datasets.

Who is it for?

🧭 China market researchers tracking audience sentiment and cultural trends.
🎬 Film, TV, music, and publishing analysts comparing ratings and rankings.
🤖 LLM and NLP teams building Chinese-language media corpora from public pages.
📊 Social listening teams monitoring public reviews, comments, and group topics.
🧪 Data journalists and academics studying public Douban lists and search results.

Why use this actor?

Douban is a high-signal source for Chinese media, entertainment, books, music, and community discussion. Manual copy-paste is slow, inconsistent, and hard to repeat. This actor gives you repeatable public extraction with normalized fields and Apify platform integrations.

Public data only

The actor is scoped to public pages. It does not log in, bypass paywalls, use cookies, or attempt to defeat captcha/security flows. If Douban returns a security verification page for a specific detail URL, the actor skips it and continues with other public sources.

Supported sources

Public search pages (https://www.douban.com/search).
Movie Top 250 pages (https://movie.douban.com/top250).
Public hot movie JSON endpoint (https://movie.douban.com/j/search_subjects).
Public Douban movie, book, music, review, and group-topic URLs supplied as start URLs.

Input modes

Add Douban URLs in startUrls.
Enter a searchQuery and choose a section, such as movie.
Select topLists such as movie-top250 or movie-hot.
Set maxItems to control dataset size and cost.
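
The input modes above can be combined into one input object. The following is a minimal Python sketch, assuming only the key names shown in this README (startUrls, searchQuery, section, topLists, maxItems). The `build_input` helper and its defaults are hypothetical and are not part of the actor.

```python
from typing import Any, Dict, List, Optional


def build_input(
    start_urls: Optional[List[str]] = None,
    search_query: Optional[str] = None,
    section: str = "movie",
    top_lists: Optional[List[str]] = None,
    max_items: int = 25,
) -> Dict[str, Any]:
    """Assemble an actor input dict from the documented input modes.

    Hypothetical helper: only the key names come from the README.
    """
    if max_items < 1:
        raise ValueError("max_items must be a positive integer")
    if not (start_urls or search_query or top_lists):
        raise ValueError("provide start_urls, search_query, or top_lists")

    run_input: Dict[str, Any] = {"maxItems": max_items}
    if start_urls:
        # startUrls is a list of objects with a "url" key, as in the URL example.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    if search_query:
        run_input["searchQuery"] = search_query
        run_input["section"] = section
    # The search example sets topLists to [] explicitly, so this sketch
    # always sends the key to keep the list selection unambiguous.
    run_input["topLists"] = list(top_lists or [])
    return run_input
```

The returned dict can be passed as `run_input` to the Python client call shown later in this README.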

Example input

{
  "topLists": ["movie-top250"],
  "maxItems": 25,
  "proxyConfiguration": { "useApifyProxy": false }
}

Search example input

{
  "searchQuery": "科幻",
  "section": "movie",
  "topLists": [],
  "maxItems": 20
}

URL example input

{
  "startUrls": [
    { "url": "https://movie.douban.com/top250" }
  ],
  "maxItems": 50
}

Output data

Each dataset row is a normalized Douban record. Fields are populated when Douban exposes the value publicly.

Field	Description
`url`	Source record URL
`canonicalUrl`	Canonical URL when available
`id`	Douban numeric identifier when detected
`type`	Record type: `subject`, `review`, `topic`, `profile`, or `page`
`section`	Douban section: `movie`, `book`, `music`, `group`, or `all`
`source`	Input source such as `movie-top250`, `movie-hot`, `search`, or `startUrl`
`query`	Search query for search records
`title`	Public title
`originalTitle`	Original title when visible
`description`	Public description, quote, intro, or snippet
`rating`	Numeric Douban rating when visible
`ratingCount`	Public rating count when visible
`rank`	List rank, such as Movie Top 250 rank
`year`	Release/publication year when parsed
`genres`	Public genres/tags
`directors`	Directors or creators when visible
`authors`	Authors when visible
`cast`	Cast when visible
`region`	Region/country when parsed
`language`	Language when visible
`duration`	Runtime when visible
`pages`	Book page count when visible
`imageUrl`	Main image/poster URL
`mediaUrls`	List of media/image URLs
`authorName`	Review/comment/topic author when visible
`date`	Public date when visible
`upvoteCount`	Upvotes/helpful count when visible
`commentCount`	Comment count when visible
`scrapedAt`	ISO timestamp of extraction

Example output

{
  "url": "https://movie.douban.com/subject/1292052/",
  "id": "1292052",
  "type": "subject",
  "section": "movie",
  "source": "movie-top250",
  "title": "肖申克的救赎",
  "originalTitle": "The Shawshank Redemption",
  "rating": 9.7,
  "rank": 1,
  "year": "1994",
  "region": "美国",
  "scrapedAt": "2026-06-20T00:00:00.000Z"
}
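
Because fields are only populated when Douban exposes them, downstream code should treat every field as optional. The sketch below shows one way to read a record defensively in Python. The `DoubanRecord` type lists only the keys from the example record above, with types inferred from that example (an assumption), and `describe` is a hypothetical helper.

```python
from typing import TypedDict


class DoubanRecord(TypedDict, total=False):
    # Keys and types inferred from the example output; all are optional.
    url: str
    id: str
    type: str
    section: str
    source: str
    title: str
    originalTitle: str
    rating: float
    rank: int
    year: str
    region: str
    scrapedAt: str


def describe(record: DoubanRecord) -> str:
    """Build a one-line summary, skipping fields that are missing or empty."""
    parts = []
    rank = record.get("rank")
    if rank not in (None, ""):
        parts.append(f"#{rank}")
    parts.append(record.get("title") or "(untitled)")
    if record.get("year"):
        parts.append(f"({record['year']})")
    rating = record.get("rating")
    if rating not in (None, ""):
        parts.append(f"rating {rating}")
    return " ".join(parts)
```

Whether the actor represents an unavailable field as a missing key, null, or an empty string is not stated here, so the helper accepts all three.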

How much does it cost to scrape Douban?

The actor uses pay-per-event pricing: a small start fee plus a per-record event for each saved Douban record. Keep maxItems low for trial runs, then scale once the output matches your workflow.
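
The pricing model above reduces to simple arithmetic: one start fee plus a per-record fee for each saved record. The Python sketch below illustrates that calculation. The actual prices are not given in this document, so both fees are parameters, and the function name is hypothetical.

```python
def estimate_cost(records: int, start_fee_usd: float, per_record_usd: float) -> float:
    """Estimate the cost of one run: start fee + records * per-record fee.

    Both fee values are placeholders supplied by the caller; check the
    actor's pricing tab for the real numbers.
    """
    if records < 0:
        raise ValueError("records must not be negative")
    if start_fee_usd < 0 or per_record_usd < 0:
        raise ValueError("fees must not be negative")
    return start_fee_usd + records * per_record_usd
```

Since maxItems caps the number of saved records, calling `estimate_cost(max_items, ...)` gives an upper bound on the event charges for a run under this model.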

Tips for reliable Douban scraping

Start with Movie Top 250 or search pages because they are publicly rendered.
Keep maxItems small while testing your workflow.
Use a datacenter proxy first if you need proxying.
Use a residential proxy only when your workload is blocked and the economics still make sense.
Avoid repeatedly requesting login-only or security-check pages.

Reviews and comments

Douban may expose some review, comment, and group-topic content publicly. The actor treats these as best-effort start URL records in v0.1. It does not log in or expand private content. Future versions can add deeper public review/comment pagination if it remains commercially reliable.

Integrations

Use Douban Scraper with:

Google Sheets exports for analyst review.
Apify datasets API for data pipelines.
Webhooks to notify when a scheduled monitor finishes.
LLM enrichment actors for translation, classification, sentiment, or entity extraction.
BI tools that consume CSV, Excel, JSON, or NDJSON.
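
Apify can export datasets as CSV directly. If you post-process records in your own pipeline and need a fixed column order, a sketch like the following may help. It uses only the Python standard library; the `records_to_csv` helper is hypothetical, and joining list values with "; " is an arbitrary choice made here.

```python
import csv
import io
from typing import Any, Dict, Iterable, List


def records_to_csv(records: Iterable[Dict[str, Any]], columns: List[str]) -> str:
    """Serialize records to CSV text with a fixed column order.

    Missing or null fields become empty cells; list values such as
    genres are joined into a single cell.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        row = {}
        for column in columns:
            value = record.get(column)
            if isinstance(value, list):
                value = "; ".join(str(item) for item in value)
            row[column] = "" if value is None else value
        writer.writerow(row)
    return buffer.getvalue()
```

Pass the `items` list returned by the API client together with the columns your analysts need, for example `["rank", "title", "rating", "year"]`.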

API usage with Node.js

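// Top-level await requires an ES module (a .mjs file or "type": "module").
import { ApifyClient } from 'apify-client';

// Authenticate with the API token from the APIFY_TOKEN environment variable.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/douban-scraper').call({
  topLists: ['movie-top250'],
  maxItems: 25,
});

// Read the records from the run's default dataset and print the first three.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items.slice(0, 3));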

API usage with Python

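import os

from apify_client import ApifyClient

# Authenticate with the API token from the APIFY_TOKEN environment variable.
client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the actor with a search input and wait for the run to finish.
run = client.actor('automation-lab/douban-scraper').call(run_input={
    'searchQuery': '科幻',
    'section': 'movie',
    'topLists': [],
    'maxItems': 20,
})

# Read the records from the run's default dataset and print the first three.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[:3])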

API usage with cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~douban-scraper/runs?token=$APIFY_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"topLists":["movie-top250"],"maxItems":25}'

MCP usage

Connect through Apify MCP using:

https://mcp.apify.com/?tools=automation-lab/douban-scraper

Add it to Claude Code:

$ claude mcp add apify-douban "https://mcp.apify.com/?tools=automation-lab/douban-scraper"

Desktop MCP JSON configuration:

{
  "mcpServers": {
    "apify-douban": {
      "url": "https://mcp.apify.com/?tools=automation-lab/douban-scraper"
    }
  }
}

Example prompts:

"Run Douban Scraper for Movie Top 250 and summarize the highest-rated films."
"Search Douban for 科幻 movies and group the results by rating."
"Extract public Douban records and prepare a sentiment-analysis input table."

Scheduling

Schedule the actor daily, weekly, or monthly to monitor public Douban list or search changes. Use Apify webhooks to send the dataset to your warehouse or notification system.
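
To turn scheduled runs into a change monitor, you can compare the dataset of the latest run with the previous one. The Python sketch below shows one possible comparison for list records, assuming each record carries the `id` and `rank` fields described in the output table. The `diff_ranks` helper is hypothetical.

```python
from typing import Any, Dict, List


def diff_ranks(
    previous: List[Dict[str, Any]], current: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Compare two list snapshots keyed by Douban id.

    Returns ids that entered the list, ids that left it, and
    (old_rank, new_rank) pairs for ids whose rank changed.
    """
    prev = {r["id"]: r.get("rank") for r in previous if r.get("id")}
    curr = {r["id"]: r.get("rank") for r in current if r.get("id")}
    added = sorted(curr.keys() - prev.keys())
    removed = sorted(prev.keys() - curr.keys())
    moved = {
        record_id: (prev[record_id], curr[record_id])
        for record_id in prev.keys() & curr.keys()
        if prev[record_id] != curr[record_id]
    }
    return {"added": added, "removed": removed, "moved": moved}
```

Records without an `id` are ignored, because the actor only sets that field when an identifier is detected.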

Proxy settings

A proxy is optional. Start without a proxy or with an Apify datacenter proxy. A residential proxy can improve access for some workloads but costs more, so test with a small maxItems first.

Data quality notes

Douban page layouts vary by section and anti-bot state. The actor normalizes visible values and leaves unavailable fields empty. Public list/search extraction is more reliable than individual subject pages that trigger security checks.
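
Since unavailable fields are left empty, it can be useful to measure how well each field is populated before relying on it. The following Python sketch computes a per-field fill rate for a list of records. The `field_coverage` helper is hypothetical, and treating null, empty strings, and empty lists as "empty" is an assumption about how the actor represents missing values.

```python
from typing import Any, Dict, List


def field_coverage(records: List[Dict[str, Any]], fields: List[str]) -> Dict[str, float]:
    """Return the share of records (0.0 to 1.0) with a non-empty value per field."""
    total = len(records)
    coverage = {}
    for field in fields:
        filled = sum(
            1 for record in records if record.get(field) not in (None, "", [], {})
        )
        coverage[field] = filled / total if total else 0.0
    return coverage
```

A low fill rate for a field on one source, such as detail pages that trigger security checks, suggests switching to list or search sources for that field.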

FAQ

Can I scrape private Douban pages?

No. The actor is designed for public data only and skips login-only or security-check pages.

Does it support reviews and comments?

It supports public review, comment, and topic URLs on a best-effort basis. Deep login-only review expansion is intentionally excluded from v0.1.

Troubleshooting: why did I get fewer results?

You may have hit maxItems, supplied a URL that requires security verification, or selected a search query with few public results. Try Movie Top 250 or a broader query to validate your setup.

Troubleshooting: why are some fields empty?

Douban does not expose every field on every page type. For example, Top 250 cards expose ratings and ranks but not full star distributions. Empty optional fields mean the value was not visible publicly on that page.

Legality

This actor extracts publicly available information. You are responsible for using the data lawfully and for respecting Douban's terms, privacy rights, copyright rules, and the applicable laws in your jurisdiction.

Changelog

0.1

Initial public-data Douban scraper with Top 250, hot movies, search, and start URL extraction.

Development notes

This actor is built with HTTP requests and Cheerio. It is intentionally lightweight and configured for 256 MB memory by default.

Field coverage roadmap

Add deeper public review pagination if target behavior remains stable.
Add public book and music list shortcuts.
Add section-specific parsers for book pages, music pages, and group topics.
Add optional translation/enrichment workflows through companion actors.

Support

If a public Douban page returns no data, include the run ID and the input URL when reporting the issue. That makes it possible to distinguish parser changes from Douban security responses.
