Douban Scraper

Pricing

Pay per event

Scrape public Douban movie, book, music, search, top-list, review, and comment data for China media intelligence workflows.

Rating: 0.0 (0)

Developer: Stas Persiianenko (Maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user

Last modified: 2 days ago

Scrape public Douban movie, book, music, search, top-list, review, comment, and group-topic pages for China media intelligence workflows. This Apify Actor focuses on publicly accessible Douban data and does not require login, cookies, or private accounts.

What does Douban Scraper do?

Douban Scraper extracts structured records from public Douban pages. It can collect Movie Top 250 entries, hot movie JSON records, Douban search results, and metadata from public start URLs.

Use it to turn Douban pages into clean JSON, CSV, Excel, or API-ready datasets.

Who is it for?

  • 🧭 China market researchers tracking audience sentiment and cultural trends.
  • 🎬 Film, TV, music, and publishing analysts comparing ratings and rankings.
  • 🤖 LLM and NLP teams building Chinese-language media corpora from public pages.
  • 📊 Social listening teams monitoring public reviews, comments, and group topics.
  • 🧪 Data journalists and academics studying public Douban lists and search results.

Why use this actor?

Douban is a high-signal source for Chinese media, entertainment, books, music, and community discussion. Manual copy-paste is slow, inconsistent, and hard to repeat. This actor gives you repeatable public extraction with normalized fields and Apify platform integrations.

Public data only

The actor is scoped to public pages. It does not log in, bypass paywalls, use cookies, or attempt to defeat CAPTCHA or other security checks. If Douban returns a security verification page for a specific detail URL, the actor skips that URL and continues with the other public sources.

Supported sources

  • https://www.douban.com/search: public search pages.
  • https://movie.douban.com/top250: Movie Top 250 pages.
  • https://movie.douban.com/j/search_subjects: public hot movie JSON endpoint.
  • Public Douban movie, book, music, review, and group-topic URLs supplied as start URLs.

Input modes

  1. Add Douban URLs in startUrls.
  2. Enter a searchQuery and choose section.
  3. Select topLists such as movie-top250 or movie-hot.
  4. Set maxItems to control dataset size and cost.
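
The four input modes above can be combined into one input object. The helper below is a minimal sketch of that assembly, not part of the actor: `build_input` and its parameter names are hypothetical, and always sending `topLists` (empty when unused) is an assumption based on the search example in this README.

```python
from typing import Any, Optional


def build_input(
    start_urls: Optional[list] = None,
    search_query: Optional[str] = None,
    section: str = "movie",
    top_lists: Optional[list] = None,
    max_items: int = 25,
) -> dict:
    """Assemble an actor input dict from the input modes listed above."""
    if not (start_urls or search_query or top_lists):
        raise ValueError("Provide startUrls, a searchQuery, or topLists")
    if max_items < 1:
        raise ValueError("maxItems must be a positive integer")

    run_input: dict = {"maxItems": max_items}
    if start_urls:
        # startUrls entries are objects with a "url" key, as in the URL example.
        run_input["startUrls"] = [{"url": url} for url in start_urls]
    if search_query:
        run_input["searchQuery"] = search_query
        run_input["section"] = section
    # Assumption: an explicit (possibly empty) topLists is always acceptable,
    # mirroring the search example, which passes "topLists": [].
    run_input["topLists"] = list(top_lists or [])
    return run_input


search_input = build_input(search_query="科幻", section="movie", max_items=20)
```

The resulting dict can be passed as the run input in the API examples further down.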

Example input

{
    "topLists": ["movie-top250"],
    "maxItems": 25,
    "proxyConfiguration": { "useApifyProxy": false }
}

Search example input

{
    "searchQuery": "科幻",
    "section": "movie",
    "topLists": [],
    "maxItems": 20
}

URL example input

{
    "startUrls": [
        { "url": "https://movie.douban.com/top250" }
    ],
    "maxItems": 50
}

Output data

Each dataset row is a normalized Douban record. Fields are populated when Douban exposes the value publicly.

Field           Description
url             Source record URL
canonicalUrl    Canonical URL when available
id              Douban numeric identifier when detected
type            Subject, review, topic, profile, or page
section         Movie, book, music, group, or all
source          Input source such as movie-top250, movie-hot, search, or startUrl
query           Search query for search records
title           Public title
originalTitle   Original title when visible
description     Public description, quote, intro, or snippet
rating          Numeric Douban rating when visible
ratingCount     Public rating count when visible
rank            List rank, such as Movie Top 250 rank
year            Release/publication year when parsed
genres          Public genres/tags
directors       Directors or creators when visible
authors         Authors when visible
cast            Cast when visible
region          Region/country when parsed
language        Language when visible
duration        Runtime when visible
pages           Book page count when visible
imageUrl        Main image/poster URL
mediaUrls       List of media/image URLs
authorName      Review/comment/topic author when visible
date            Public date when visible
upvoteCount     Upvotes/helpful count when visible
commentCount    Comment count when visible
scrapedAt       ISO timestamp of extraction

Example output

{
    "url": "https://movie.douban.com/subject/1292052/",
    "id": "1292052",
    "type": "subject",
    "section": "movie",
    "source": "movie-top250",
    "title": "肖申克的救赎",
    "originalTitle": "The Shawshank Redemption",
    "rating": 9.7,
    "rank": 1,
    "year": "1994",
    "region": "美国",
    "scrapedAt": "2026-06-20T00:00:00.000Z"
}
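
Because fields are only populated when Douban exposes them, downstream code should treat every field except the ones it requires as optional. The following is a minimal sketch of that pattern; `to_float` and `top_rated` are hypothetical helper names, and accepting ratings that arrive as strings is a defensive assumption rather than documented actor behavior.

```python
from typing import Any, Optional


def to_float(value: Any) -> Optional[float]:
    """Return a float, or None when the value is missing or not numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def top_rated(records: list, min_rating: float) -> list:
    """Keep records rated at least min_rating, highest rating first."""
    scored = [(to_float(record.get("rating")), record) for record in records]
    kept = [(score, record) for score, record in scored
            if score is not None and score >= min_rating]
    # Sort on the numeric score only; records themselves are not comparable.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in kept]


sample = [
    {"title": "a", "rating": 9.7},
    {"title": "b"},                    # rating not exposed on this page
    {"title": "c", "rating": "8.9"},   # assumed: rating could arrive as text
    {"title": "d", "rating": ""},
    {"title": "e", "rating": 7.0},
]
best = top_rated(sample, min_rating=8.0)
```

Records without a usable rating are dropped instead of being treated as zero, so missing data does not skew the ranking.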

How much does it cost to scrape Douban?

The actor uses pay-per-event pricing: a small start fee plus a per-record event for each saved Douban record. Keep maxItems low for trial runs, then scale once the output matches your workflow.
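
The pricing model above is a flat start fee plus a per-record charge, so a run's cost grows linearly with the number of saved records. The sketch below only illustrates that arithmetic: `estimate_cost` is a hypothetical helper, and the fee values are placeholders, since this page does not state actual prices. Check the actor's pricing tab for the real numbers.

```python
def estimate_cost(records: int, start_fee: float, per_record_fee: float) -> float:
    """Estimated run cost: one start event plus one event per saved record."""
    if records < 0:
        raise ValueError("records must not be negative")
    return start_fee + records * per_record_fee


# Placeholder fees for illustration only; they are NOT the actor's real prices.
EXAMPLE_START_FEE = 0.01
EXAMPLE_PER_RECORD_FEE = 0.002

trial_cost = estimate_cost(25, EXAMPLE_START_FEE, EXAMPLE_PER_RECORD_FEE)
full_cost = estimate_cost(250, EXAMPLE_START_FEE, EXAMPLE_PER_RECORD_FEE)
print(f"trial run: {trial_cost:.3f}, full Top 250 run: {full_cost:.3f}")
```

Since `maxItems` caps the number of saved records, it also caps the per-record part of the bill.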

Tips for reliable Douban scraping

  • Start with Movie Top 250 or search pages because they are publicly rendered.
  • Keep maxItems small while testing your workflow.
  • Use a datacenter proxy first if you need proxying.
  • Use a residential proxy only when your workload is blocked and the extra cost is still justified.
  • Avoid repeatedly requesting login-only or security-check pages.

Reviews and comments

Douban may expose some review, comment, and group-topic content publicly. In v0.1 the actor handles these pages on a best-effort basis when they are supplied as start URLs. It does not log in or expand private content. Future versions may add deeper pagination of public reviews and comments if that proves commercially reliable.

Integrations

Use Douban Scraper with:

  • Google Sheets exports for analyst review.
  • Apify datasets API for data pipelines.
  • Webhooks to notify when a scheduled monitor finishes.
  • LLM enrichment actors for translation, classification, sentiment, or entity extraction.
  • BI tools that consume CSV, Excel, JSON, or NDJSON.
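
Apify can export datasets in these formats directly. When records have already been fetched through the API, a local conversion is sometimes convenient. The sketch below flattens records to CSV text with the standard library. `records_to_csv` is a hypothetical helper, and joining list-valued fields (such as mediaUrls) with "; " is an arbitrary choice for this example.

```python
import csv
import io


def records_to_csv(records: list, fields: list) -> str:
    """Serialize records to CSV text, one column per requested field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for record in records:
        row = {}
        for name in fields:
            value = record.get(name)
            if value is None:
                row[name] = ""  # field not exposed on this page
            elif isinstance(value, list):
                row[name] = "; ".join(str(item) for item in value)
            else:
                row[name] = value
        writer.writerow(row)
    return buffer.getvalue()


csv_text = records_to_csv(
    [{"title": "肖申克的救赎", "rating": 9.7, "mediaUrls": ["u1", "u2"]}],
    fields=["title", "rating", "mediaUrls", "year"],
)
```

Passing an explicit field list keeps the column order stable across runs, even when some records lack optional fields.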

API usage with Node.js

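import { ApifyClient } from 'apify-client';

// Reads the API token from the APIFY_TOKEN environment variable.
const client = new ApifyClient({ token: process.env.APIFY_TOKEN });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/douban-scraper').call({
    topLists: ['movie-top250'],
    maxItems: 25,
});

// Fetch the saved records from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items.slice(0, 3));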

API usage with Python

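import os

from apify_client import ApifyClient

# Reads the API token from the APIFY_TOKEN environment variable.
client = ApifyClient(os.environ['APIFY_TOKEN'])

# Start the actor and wait for the run to finish.
run = client.actor('automation-lab/douban-scraper').call(run_input={
    'searchQuery': '科幻',
    'section': 'movie',
    'topLists': [],
    'maxItems': 20,
})

# Fetch the saved records from the run's default dataset.
items = client.dataset(run['defaultDatasetId']).list_items().items
print(items[:3])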

API usage with cURL

curl -X POST "https://api.apify.com/v2/acts/automation-lab~douban-scraper/runs?token=$APIFY_TOKEN" \
    -H 'Content-Type: application/json' \
    -d '{"topLists":["movie-top250"],"maxItems":25}'

MCP usage

Connect through Apify MCP using:

https://mcp.apify.com/?tools=automation-lab/douban-scraper

Add it to Claude Code:

claude mcp add apify-douban "https://mcp.apify.com/?tools=automation-lab/douban-scraper"

Desktop MCP JSON configuration:

{
    "mcpServers": {
        "apify-douban": {
            "url": "https://mcp.apify.com/?tools=automation-lab/douban-scraper"
        }
    }
}

Example prompts:

  • "Run Douban Scraper for Movie Top 250 and summarize the highest-rated films."
  • "Search Douban for 科幻 movies and group the results by rating."
  • "Extract public Douban records and prepare a sentiment-analysis input table."

Scheduling

Schedule the actor daily, weekly, or monthly to monitor public Douban list or search changes. Use Apify webhooks to send the dataset to your warehouse or notification system.
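
To monitor changes between scheduled runs, one option is to compare the datasets of two runs by record id. The following is a minimal sketch, assuming both runs scraped the same list and that records carry the id and rank fields described under Output data. `rank_changes` is a hypothetical helper, not something the actor provides.

```python
def rank_changes(previous: list, current: list) -> dict:
    """Compare two snapshots of a ranked list, matching records by id."""
    # Records without an id cannot be matched across runs, so skip them.
    prev_rank = {r["id"]: r.get("rank") for r in previous if r.get("id")}
    curr_rank = {r["id"]: r.get("rank") for r in current if r.get("id")}

    entered = sorted(set(curr_rank) - set(prev_rank))
    dropped = sorted(set(prev_rank) - set(curr_rank))
    moved = [
        (record_id, prev_rank[record_id], curr_rank[record_id])
        for record_id in sorted(set(prev_rank) & set(curr_rank))
        if prev_rank[record_id] != curr_rank[record_id]
    ]
    return {"entered": entered, "dropped": dropped, "moved": moved}


last_week = [{"id": "1", "rank": 1}, {"id": "2", "rank": 2}, {"id": "3", "rank": 3}]
this_week = [{"id": "2", "rank": 1}, {"id": "1", "rank": 2}, {"id": "4", "rank": 3}]
changes = rank_changes(last_week, this_week)
```

A webhook handler could run this comparison after each scheduled run and forward only the non-empty changes to a notification system.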

Proxy settings

A proxy is optional. Start without one, or with an Apify datacenter proxy. A residential proxy can improve access for some workloads but costs more, so test with a small maxItems first.

Data quality notes

Douban page layouts vary by section and anti-bot state. The actor normalizes visible values and leaves unavailable fields empty. Public list/search extraction is more reliable than individual subject pages that trigger security checks.
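
Since unavailable fields are left empty, it can help to measure how often each field is actually filled before building on a dataset. The sketch below computes per-field coverage. `field_coverage` is a hypothetical helper, and treating None, empty strings, and empty lists or dicts as "empty" is an assumption about how missing values appear in the output.

```python
from collections import Counter


def field_coverage(records: list) -> dict:
    """Fraction of records in which each field holds a non-empty value."""
    if not records:
        return {}
    filled = Counter()
    names = set()
    for record in records:
        for name, value in record.items():
            names.add(name)
            # Assumed empty markers; 0 and False still count as real values.
            if value not in (None, "", [], {}):
                filled[name] += 1
    return {name: filled[name] / len(records) for name in sorted(names)}


coverage = field_coverage([
    {"title": "a", "rating": 9.7, "year": ""},
    {"title": "b", "rating": None},
])
```

Low coverage for a field on one source (for example, individual subject pages) suggests switching to a list or search source that exposes it more reliably.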

FAQ

Can I scrape private Douban pages?

No. The actor is designed for public data only and skips login-only or security-check pages.

Does it support reviews and comments?

It supports public review, comment, and topic URLs on a best-effort basis. Deep login-only review expansion is intentionally excluded from v0.1.

Troubleshooting: why did I get fewer results?

You may have hit maxItems, supplied a URL that requires security verification, or selected a search query with few public results. Try Movie Top 250 or a broader query to validate your setup.

Troubleshooting: why are some fields empty?

Douban does not expose every field on every page type. For example, Top 250 cards expose ratings and ranks but not full star distributions. Empty optional fields mean the value was not visible publicly on that page.

Legality

This actor extracts publicly available information. You are responsible for using the data lawfully, respecting Douban terms, privacy rights, copyright rules, and applicable laws in your jurisdiction.

Changelog

0.1

Initial public-data Douban scraper with Top 250, hot movies, search, and start URL extraction.

Development notes

This actor is built with HTTP requests and Cheerio. It is intentionally lightweight and configured for 256 MB memory by default.

Field coverage roadmap

  • Add deeper public review pagination if target behavior remains stable.
  • Add public book and music list shortcuts.
  • Add section-specific parsers for book pages, music pages, and group topics.
  • Add optional translation/enrichment workflows through companion actors.

Support

If a public Douban page returns no data, include the run ID and the input URL when reporting the issue. That makes it possible to distinguish parser changes from Douban security responses.