Douban Scraper
Pricing
Pay per event
Developer
Stas Persiianenko
Maintained by Community
Scrape public Douban movie, book, music, search, top-list, review, comment, and group-topic pages for China media intelligence workflows. This Apify Actor focuses on publicly accessible Douban data and does not require login, cookies, or private accounts.
What does Douban Scraper do?
Douban Scraper extracts structured records from public Douban pages. It can collect Movie Top 250 entries, hot movie JSON records, Douban search results, and metadata from public start URLs.
Use it to turn Douban pages into clean JSON, CSV, Excel, or API-ready datasets.
Who is it for?
- 🧭 China market researchers tracking audience sentiment and cultural trends.
- 🎬 Film, TV, music, and publishing analysts comparing ratings and rankings.
- 🤖 LLM and NLP teams building Chinese-language media corpora from public pages.
- 📊 Social listening teams monitoring public reviews, comments, and group topics.
- 🧪 Data journalists and academics studying public Douban lists and search results.
Why use this actor?
Douban is a high-signal source for Chinese media, entertainment, books, music, and community discussion. Manual copy-paste is slow, inconsistent, and hard to repeat. This actor gives you repeatable public extraction with normalized fields and Apify platform integrations.
Public data only
The actor is scoped to public pages. It does not log in, bypass paywalls, use cookies, or attempt to defeat CAPTCHA or security flows. If Douban returns a security verification page for a specific detail URL, the actor skips that URL and continues with the other public sources.
Supported sources
- `https://www.douban.com/search`: public search pages.
- `https://movie.douban.com/top250`: Movie Top 250 pages.
- `https://movie.douban.com/j/search_subjects`: public hot movie JSON endpoint.
- Public Douban movie, book, music, review, and group-topic URLs supplied as start URLs.
Input modes
- Add Douban URLs in `startUrls`.
- Enter a `searchQuery` and choose `section`.
- Select `topLists` such as `movie-top250` or `movie-hot`.
- Set `maxItems` to control dataset size and cost.
Example input
{
  "topLists": ["movie-top250"],
  "maxItems": 25,
  "proxyConfiguration": { "useApifyProxy": false }
}
Search example input
{
  "searchQuery": "科幻",
  "section": "movie",
  "topLists": [],
  "maxItems": 20
}
URL example input
{
  "startUrls": [{ "url": "https://movie.douban.com/top250" }],
  "maxItems": 50
}
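The example inputs share a small set of fields, so one helper can assemble them. The sketch below uses only the field names documented in this README. `build_input` is a hypothetical helper, not part of the actor or the Apify client, and its validation rules are assumptions: the actor's real input schema may accept other combinations.

```python
from typing import Any, Optional


def build_input(
    start_urls: Optional[list[str]] = None,
    search_query: Optional[str] = None,
    section: str = "movie",
    top_lists: Optional[list[str]] = None,
    max_items: int = 25,
) -> dict[str, Any]:
    """Assemble a run input dict from the documented input fields."""
    if max_items < 1:
        raise ValueError("max_items must be at least 1")
    # Assumption: a run needs at least one of the three input modes.
    if not (start_urls or search_query or top_lists):
        raise ValueError("provide start_urls, search_query, or top_lists")

    run_input: dict[str, Any] = {"maxItems": max_items}
    if start_urls:
        # startUrls entries are objects with a "url" key, as in the URL example.
        run_input["startUrls"] = [{"url": u} for u in start_urls]
    if search_query:
        run_input["searchQuery"] = search_query
        run_input["section"] = section
    if top_lists:
        run_input["topLists"] = list(top_lists)
    elif search_query:
        # The search example passes an empty topLists; mirror that here.
        run_input["topLists"] = []
    return run_input
```

Calling `build_input(search_query="科幻", max_items=20)` produces the same fields as the search example input, and the result can be passed as `run_input` to the Python client call shown under "API usage with Python".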
Output data
Each dataset row is a normalized Douban record. Fields are populated when Douban exposes the value publicly.
| Field | Description |
|---|---|
| url | Source record URL |
| canonicalUrl | Canonical URL when available |
| id | Douban numeric identifier when detected |
| type | Subject, review, topic, profile, or page |
| section | Movie, book, music, group, or all |
| source | Input source such as movie-top250, movie-hot, search, or startUrl |
| query | Search query for search records |
| title | Public title |
| originalTitle | Original title when visible |
| description | Public description, quote, intro, or snippet |
| rating | Numeric Douban rating when visible |
| ratingCount | Public rating count when visible |
| rank | List rank, such as Movie Top 250 rank |
| year | Release/publication year when parsed |
| genres | Public genres/tags |
| directors | Directors or creators when visible |
| authors | Authors when visible |
| cast | Cast when visible |
| region | Region/country when parsed |
| language | Language when visible |
| duration | Runtime when visible |
| pages | Book page count when visible |
| imageUrl | Main image/poster URL |
| mediaUrls | List of media/image URLs |
| authorName | Review/comment/topic author when visible |
| date | Public date when visible |
| upvoteCount | Upvotes/helpful count when visible |
| commentCount | Comment count when visible |
| scrapedAt | ISO timestamp of extraction |
Example output
{
  "url": "https://movie.douban.com/subject/1292052/",
  "id": "1292052",
  "type": "subject",
  "section": "movie",
  "source": "movie-top250",
  "title": "肖申克的救赎",
  "originalTitle": "The Shawshank Redemption",
  "rating": 9.7,
  "rank": 1,
  "year": "1994",
  "region": "美国",
  "scrapedAt": "2026-06-20T00:00:00.000Z"
}
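Because most fields are optional, downstream code usually needs to tolerate missing keys. The following is a minimal sketch of loading a dataset row into a typed structure. `DoubanRecord` and `parse_record` are hypothetical names, only a subset of the documented fields is modeled, and the field types are assumptions based on the example record.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class DoubanRecord:
    # Only url is required here; every other field may be absent.
    url: str
    id: Optional[str] = None
    type: Optional[str] = None
    section: Optional[str] = None
    source: Optional[str] = None
    title: Optional[str] = None
    rating: Optional[float] = None
    rank: Optional[int] = None
    year: Optional[str] = None


def parse_record(row: dict[str, Any]) -> DoubanRecord:
    """Build a DoubanRecord from a dataset row, ignoring fields not modeled here."""
    known = {f.name for f in fields(DoubanRecord)}
    return DoubanRecord(**{k: v for k, v in row.items() if k in known})
```

Unmodeled keys such as `region` or `scrapedAt` are dropped rather than rejected, so the same parser keeps working if the actor adds fields later.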
How much does it cost to scrape Douban?
The actor uses pay-per-event pricing: a small start fee plus a per-record event for each saved Douban record. Keep maxItems low for trial runs, then scale once the output matches your workflow.
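The pricing model described above is a fixed start fee plus a per-record charge, so the cost of a run grows linearly with the number of saved records. The sketch below only illustrates that arithmetic. `estimate_run_cost` is a hypothetical helper, and the prices in the example call are placeholders, not the actor's real rates.

```python
from decimal import Decimal


def estimate_run_cost(records: int, start_fee: Decimal, per_record_fee: Decimal) -> Decimal:
    """Return start_fee + records * per_record_fee for a single run."""
    if records < 0:
        raise ValueError("records must be non-negative")
    return start_fee + per_record_fee * records


# Placeholder prices for illustration only; check the actor's pricing for real values.
trial = estimate_run_cost(25, Decimal("0.005"), Decimal("0.001"))
full = estimate_run_cost(250, Decimal("0.005"), Decimal("0.001"))
print(trial, full)
```

Since `maxItems` caps the number of saved records, passing it as `records` gives an upper bound on the per-record part of the bill.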
Tips for reliable Douban scraping
- Start with Movie Top 250 or search pages because they are publicly rendered.
- Keep `maxItems` small while testing your workflow.
- Use datacenter proxy first if you need proxying.
- Use residential proxy only when your workload is blocked and the economics still make sense.
- Avoid repeatedly requesting login-only or security-check pages.
Reviews and comments
Douban may expose some review, comment, and group-topic content publicly. The actor treats these as best-effort start URL records in v0.1. It does not log in or expand private content. Future versions can add deeper public review/comment pagination if it remains commercially reliable.
Integrations
Use Douban Scraper with:
- Google Sheets exports for analyst review.
- Apify datasets API for data pipelines.
- Webhooks to notify when a scheduled monitor finishes.
- LLM enrichment actors for translation, classification, sentiment, or entity extraction.
- BI tools that consume CSV, Excel, JSON, or NDJSON.
API usage with Node.js
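import { ApifyClient } from 'apify-client';

// Authenticate with the API token from the environment.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/douban-scraper').call({
    topLists: ['movie-top250'],
    maxItems: 25,
});

// Read the records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items.slice(0, 3));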
API usage with Python
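import os

from apify_client import ApifyClient

# Authenticate with the API token from the environment.
client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/douban-scraper').call(run_input={
    'searchQuery': '科幻',
    'section': 'movie',
    'topLists': [],
    'maxItems': 20,
})

# Read the records from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[:3])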
API usage with cURL
curl -X POST "https://api.apify.com/v2/acts/automation-lab~douban-scraper/runs?token=$APIFY_TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{"topLists":["movie-top250"],"maxItems":25}'
MCP usage
Connect through Apify MCP using:
https://mcp.apify.com/?tools=automation-lab/douban-scraper
Add it to Claude Code:
$ claude mcp add apify-douban "https://mcp.apify.com/?tools=automation-lab/douban-scraper"
Desktop MCP JSON configuration:
{
  "mcpServers": {
    "apify-douban": {
      "url": "https://mcp.apify.com/?tools=automation-lab/douban-scraper"
    }
  }
}
Example prompts:
- "Run Douban Scraper for Movie Top 250 and summarize the highest-rated films."
- "Search Douban for 科幻 movies and group the results by rating."
- "Extract public Douban records and prepare a sentiment-analysis input table."
Scheduling
Schedule the actor daily, weekly, or monthly to monitor public Douban list or search changes. Use Apify webhooks to send the dataset to your warehouse or notification system.
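One way to use scheduled runs for monitoring is to compare the latest dataset with the previous one. This is a minimal sketch, assuming both runs have already been downloaded as lists of records and that `id` and `rank` are populated as in the example output. `diff_runs` is a hypothetical helper, not something the actor provides.

```python
from typing import Any


def diff_runs(previous: list[dict[str, Any]], current: list[dict[str, Any]]) -> dict[str, list]:
    """Compare two runs keyed by record id: added, removed, and re-ranked items."""
    # Records without an id cannot be matched across runs, so they are skipped.
    prev_by_id = {r["id"]: r for r in previous if r.get("id")}
    curr_by_id = {r["id"]: r for r in current if r.get("id")}

    added = sorted(curr_by_id.keys() - prev_by_id.keys())
    removed = sorted(prev_by_id.keys() - curr_by_id.keys())
    # Each entry is (id, old rank, new rank) for records present in both runs.
    rank_changes = [
        (rid, prev_by_id[rid].get("rank"), curr_by_id[rid].get("rank"))
        for rid in sorted(curr_by_id.keys() & prev_by_id.keys())
        if prev_by_id[rid].get("rank") != curr_by_id[rid].get("rank")
    ]
    return {"added": added, "removed": removed, "rank_changes": rank_changes}
```

A webhook handler could call this after each scheduled run and send a notification only when one of the three lists is non-empty.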
Proxy settings
Proxy is optional. Start without a proxy or with Apify datacenter proxy. Residential proxy can improve access for some workloads but costs more, so test it with a small maxItems first.
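The escalation path above (no proxy, then datacenter, then residential) can be expressed as three values for the `proxyConfiguration` input field. The sketch below is hedged: `useApifyProxy` appears in this README's example input, while `apifyProxyGroups` with the `RESIDENTIAL` group follows the general Apify proxy input convention. Whether this actor honors it, and whether your plan includes residential proxy, are assumptions to verify. `proxy_configuration` is a hypothetical helper.

```python
from typing import Any


def proxy_configuration(level: str) -> dict[str, Any]:
    """Return a proxyConfiguration value for 'none', 'datacenter', or 'residential'."""
    if level == "none":
        return {"useApifyProxy": False}
    if level == "datacenter":
        # Assumption: with no groups listed, Apify Proxy picks datacenter proxies.
        return {"useApifyProxy": True}
    if level == "residential":
        # Assumption: the actor passes this group through to Apify Proxy.
        return {"useApifyProxy": True, "apifyProxyGroups": ["RESIDENTIAL"]}
    raise ValueError(f"unknown proxy level: {level!r}")


# Trial input that escalates to datacenter proxy while keeping the run small.
trial_input = {
    "topLists": ["movie-top250"],
    "maxItems": 10,
    "proxyConfiguration": proxy_configuration("datacenter"),
}
```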
Data quality notes
Douban page layouts vary by section and anti-bot state. The actor normalizes visible values and leaves unavailable fields empty. Public list/search extraction is more reliable than individual subject pages that trigger security checks.
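Since unavailable fields are left empty, it can be useful to measure how often each field is actually filled before relying on it. This is a small sketch, assuming "empty" means a missing key, `None`, an empty string, or an empty list or object; the actor's exact representation of empty values is not specified here. `field_coverage` is a hypothetical helper.

```python
from typing import Any

# Values treated as "empty"; note that 0 and 0.0 count as real values.
EMPTY_VALUES: tuple = (None, "", [], {})


def field_coverage(rows: list[dict[str, Any]], field_names: list[str]) -> dict[str, float]:
    """Return the fraction of rows with a non-empty value for each field."""
    total = len(rows)
    coverage: dict[str, float] = {}
    for name in field_names:
        filled = sum(1 for row in rows if row.get(name) not in EMPTY_VALUES)
        coverage[name] = filled / total if total else 0.0
    return coverage
```

Running it over a Top 250 dataset and a start-URL dataset separately shows which fields each source fills reliably.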
FAQ
Can I scrape private Douban pages?
No. The actor is designed for public data only and skips login-only or security-check pages.
Does it support reviews and comments?
It supports public review, comment, and topic URLs on a best-effort basis. Deep login-only review expansion is intentionally excluded from v0.1.
Troubleshooting: why did I get fewer results?
You may have hit maxItems, supplied a URL that requires security verification, or selected a search query with few public results. Try Movie Top 250 or a broader query to validate your setup.
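When a run mixes several inputs, tallying the dataset by the `source` field shows which input contributed fewer records than expected. This is a minimal sketch over already-downloaded records. `count_by_source` is a hypothetical helper, and the `"unknown"` label for rows without a `source` is an assumption.

```python
from collections import Counter
from typing import Any


def count_by_source(rows: list[dict[str, Any]]) -> Counter:
    """Count records per input source, labelling rows without one as 'unknown'."""
    return Counter(row.get("source") or "unknown" for row in rows)


def hit_max_items(rows: list[dict[str, Any]], max_items: int) -> bool:
    """Return True when the dataset size suggests the run stopped at maxItems."""
    return len(rows) >= max_items
```

If `hit_max_items` is true, raising `maxItems` is the first thing to try. If one source has a count of zero, that input is the likely candidate for a security-verification page or a query with few public results.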
Troubleshooting: why are some fields empty?
Douban does not expose every field on every page type. For example, Top 250 cards expose ratings and ranks but not full star distributions. Empty optional fields mean the value was not visible publicly on that page.
Legality
This actor extracts publicly available information. You are responsible for using the data lawfully, which includes respecting Douban's terms, privacy rights, copyright rules, and the applicable laws in your jurisdiction.
Related scrapers
- https://apify.com/automation-lab/bilibili-scraper
- https://apify.com/automation-lab/rednote-xiaohongshu-scraper
- https://apify.com/automation-lab/weibo-scraper
- https://apify.com/automation-lab/website-content-crawler
Changelog
0.1
Initial public-data Douban scraper with Top 250, hot movies, search, and start URL extraction.
Development notes
This actor is built with HTTP requests and Cheerio. It is intentionally lightweight and configured for 256 MB memory by default.
Field coverage roadmap
- Add deeper public review pagination if target behavior remains stable.
- Add public book and music list shortcuts.
- Add section-specific parsers for book pages, music pages, and group topics.
- Add optional translation/enrichment workflows through companion actors.
Support
If a public Douban page returns no data, include the run ID and the input URL when reporting the issue. That makes it possible to distinguish parser changes from Douban security responses.