CUC China Media University — Dance & Performing Arts Scraper
Pricing
Pay per event
Scrapes faculty rosters, admissions announcements, news, and program pages from Communication University of China (中国传媒大学 / CUC). Covers the dance and performing-arts pipeline that feeds CCTV, Mango TV, and provincial state broadcasters.
Developer
BowTiedRaccoon
Scrapes faculty rosters, admissions announcements, program pages, and news from Communication University of China (中国传媒大学 / CUC). CUC is the school that trains CCTV anchors, Mango TV hosts, and the performers you see at the CCTV Spring Festival Gala.
CUC runs one of China's top performing-arts programs. Its alumni pipeline feeds state broadcasters, provincial TV networks, and the national competition circuit. This actor pulls that pipeline data as structured, UTF-8 records and handles pagination for you, so you don't have to navigate a WebPlus CMS manually.
What It Scrapes
Four configurable section categories, all from www.cuc.edu.cn:
| Category | Content |
|---|---|
| admissions | 招生就业 (Admissions and Employment): enrollment notices, admission policies, exam requirements |
| faculty | Leadership rosters, special collections, departmental faculty pages |
| programs | Academic affairs notices, curriculum docs, departmental announcements |
| news | Main news feed, school of arts culture network, academic exchanges |
Articles follow a predictable URL pattern (/YYYY/MMDD/c<channel>a<id>/page.htm). Pagination uses numbered .psp pages. No JavaScript rendering required.
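The URL pattern above can be decomposed with a regular expression. The following is a minimal sketch, not the actor's own code: it assumes that `<channel>` and `<id>` are runs of digits, and the names `ARTICLE_PATH`, `ArticleRef`, and `parse_article_url` are hypothetical.

```python
import re
from datetime import date
from typing import NamedTuple, Optional
from urllib.parse import urlparse

# Assumed reading of /YYYY/MMDD/c<channel>a<id>/page.htm, with <channel>
# and <id> taken to be digit runs (an assumption, not confirmed by the source).
ARTICLE_PATH = re.compile(r"^/(\d{4})/(\d{2})(\d{2})/c(\d+)a(\d+)/page\.htm$")


class ArticleRef(NamedTuple):
    published: date
    channel: str
    article_id: str


def parse_article_url(url: str) -> Optional[ArticleRef]:
    """Return the date, channel, and id encoded in an article URL, or None."""
    match = ARTICLE_PATH.match(urlparse(url).path)
    if match is None:
        return None
    year, month, day, channel, article_id = match.groups()
    try:
        published = date(int(year), int(month), int(day))
    except ValueError:
        # Path looked right but did not encode a real calendar date.
        return None
    return ArticleRef(published, channel, article_id)
```

Section entry points such as /zsjy/list.htm do not match the pattern and return `None`, which gives a cheap way to tell list pages from article pages.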
Output Fields
| Field | Type | Description |
|---|---|---|
| page_url | String | Canonical URL of the scraped page |
| title | String | Article or page title (Chinese characters preserved) |
| title_zh | String | Chinese title, identical to title for CUC pages |
| category | String | Section: admissions, faculty, programs, or news |
| publish_date | String | Publication date as shown on the page (e.g. 2024-05-10) |
| body_html | String | Full article HTML including embedded content references |
| body_text | String | Plain-text article body |
| department | String | Channel code identifying the originating department |
| attachments | String | PDF/DOC attachment URLs, pipe-separated (admissions docs, curriculum PDFs) |
| source_url | String | Originating article URL |
| scrapedAt | String | ISO-8601 timestamp of the scrape |
On faculty name-card pages, body_html and body_text are empty by design, because the page contains only a name and a date. The title and department fields are always populated.
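When consuming the dataset, the pipe-separated attachments field and the empty-body faculty pages are the two cases that usually need handling. The sketch below types a record after the field table; `CucRecord`, `attachment_urls`, and `is_name_card` are hypothetical helper names, and the check for name-card pages is an assumption based on the description above.

```python
from typing import List, TypedDict


class CucRecord(TypedDict):
    """One dataset item, with every field a string as listed in the table."""
    page_url: str
    title: str
    title_zh: str
    category: str
    publish_date: str
    body_html: str
    body_text: str
    department: str
    attachments: str
    source_url: str
    scrapedAt: str


def attachment_urls(record: CucRecord) -> List[str]:
    """Split the pipe-separated attachments field into a list of URLs."""
    return [url.strip() for url in record["attachments"].split("|") if url.strip()]


def is_name_card(record: CucRecord) -> bool:
    """Guess whether a record is a faculty name card (both body fields empty)."""
    return (
        record["category"] == "faculty"
        and not record["body_html"].strip()
        and not record["body_text"].strip()
    )
```

A record with no attachments yields an empty list rather than a list holding one empty string.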
Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| maxItems | Integer | 10 | Maximum number of article pages to scrape (0 = unlimited) |
| categories | Array | all four | Which sections to crawl: admissions, faculty, programs, news |
Run with maxItems: 0 and all four categories for a full archive crawl. The main news channel alone has hundreds of pages going back several years.
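A run input can be assembled and checked on the client side before it is submitted. This is a sketch under stated assumptions: `build_run_input` is a hypothetical helper, and the rejection of negative values and unknown category names is a local sanity check, not documented actor behavior.

```python
from typing import Any, Dict, Iterable, Optional

# The four category names listed in the input table.
VALID_CATEGORIES = ("admissions", "faculty", "programs", "news")


def build_run_input(
    max_items: int = 10,
    categories: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Build the actor input; max_items=0 means unlimited, None means all four."""
    if max_items < 0:
        raise ValueError("max_items must be 0 (unlimited) or a positive integer")
    chosen = list(categories) if categories is not None else list(VALID_CATEGORIES)
    unknown = [name for name in chosen if name not in VALID_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return {"maxItems": max_items, "categories": chosen}


# Full archive crawl: no item limit, all four sections.
full_archive = build_run_input(max_items=0)
```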
How It Works
The actor uses a hierarchical crawl. It seeds from the section entry points (/zsjy/list.htm, /9996/list.htm, etc.), follows pagination forward, and enqueues every article URL it finds. Article pages get a full extraction pass — title, date, body content, and any PDF attachments.
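The crawl described above can be sketched as a queue of list pages that yields article URLs. This is an illustration only and not the actor's source: the `fetch` callable is injected, the article-link regex assumes digit-only channel and id values, and the `class="next"` pagination marker is a hypothetical stand-in for whatever markup the site actually uses.

```python
import re
from collections import deque
from typing import Callable, Iterable, List, Set
from urllib.parse import urljoin

# Assumed link shapes; the real selectors are not given in this README.
ARTICLE_HREF = re.compile(r'href="([^"]*/c\d+a\d+/page\.htm)"')
NEXT_HREF = re.compile(r'<a class="next" href="([^"]+)"')  # hypothetical markup


def crawl(
    seeds: Iterable[str],
    fetch: Callable[[str], str],
    max_items: int = 10,
) -> List[str]:
    """Walk list pages from the seeds and collect article URLs in discovery order."""
    queue = deque(seeds)
    seen_lists: Set[str] = set(queue)
    seen_articles: Set[str] = set()
    articles: List[str] = []
    while queue:
        list_url = queue.popleft()
        html = fetch(list_url)
        for href in ARTICLE_HREF.findall(html):
            url = urljoin(list_url, href)
            if url in seen_articles:
                continue
            seen_articles.add(url)
            articles.append(url)
            # max_items == 0 is treated as unlimited, matching the input table.
            if max_items and len(articles) >= max_items:
                return articles
        next_link = NEXT_HREF.search(html)
        if next_link:
            next_url = urljoin(list_url, next_link.group(1))
            if next_url not in seen_lists:
                seen_lists.add(next_url)
                queue.append(next_url)
    return articles
```

Injecting `fetch` keeps the traversal logic testable against an in-memory dictionary of pages before it is pointed at a live server.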
No proxy required. CUC's servers respond cleanly to datacenter IPs. No Cloudflare. No anti-bot. Concurrency is kept at 5 to stay polite with a university web server.
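A concurrency cap of 5 can be expressed with a semaphore. The sketch below shows the general technique with Python's asyncio and is not the actor's implementation; `fetch_all` and the injected `fetch` coroutine are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence

MAX_CONCURRENCY = 5  # the limit quoted above


async def fetch_all(
    urls: Sequence[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = MAX_CONCURRENCY,
) -> List[str]:
    """Fetch every URL with at most `limit` requests in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url: str) -> str:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return list(await asyncio.gather(*(guarded(url) for url in urls)))
```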
Use Cases
- Chinese-language NLP training corpora (performing arts domain)
- Talent-pipeline research tracking which departments feed CCTV and Mango TV
- Competitive analysis for Chinese broadcasting education programs
- Admissions document archives for research on Chinese university policy
Notes
The site CMS (WebPlus) uses numeric channel codes for some departments. The department field preserves the raw channel code. Map to human-readable names using the CUC department directory as needed.
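One way to apply such a mapping is a lookup that falls back to the raw code. In this sketch the entry in `DEPARTMENT_NAMES` is a made-up placeholder, not a real CUC channel code, and `department_label` is a hypothetical helper.

```python
from typing import Mapping

# Placeholder entry only; fill in from the CUC department directory.
DEPARTMENT_NAMES: Mapping[str, str] = {
    "0000": "Example department (placeholder)",
}


def department_label(code: str, names: Mapping[str, str] = DEPARTMENT_NAMES) -> str:
    """Return a readable name for a channel code, or the raw code if unmapped."""
    return names.get(code, code)
```

Falling back to the raw code keeps unmapped departments visible in downstream reports instead of silently dropping them.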
CUC's admissions section includes PDFs of curriculum plans and judging panel documents for performance programs. These appear as pipe-separated URLs in the attachments field.
Part of the OrbTop Chinese media education dataset — companion to the BDA Beijing Dance Academy Scraper.