CUC China Media University — Dance & Performing Arts Scraper

Pricing

Pay per event

Scrapes faculty rosters, admissions announcements, news, and program pages from Communication University of China (中国传媒大学 / CUC). Covers the dance and performing-arts pipeline that feeds CCTV, Mango TV, and provincial state broadcasters.

Developer

BowTiedRaccoon

Maintained by Community

Actor stats

Bookmarked: 0
Total users: 2
Monthly active users: 1
Last modified: 5 days ago

This actor scrapes faculty rosters, admissions announcements, program pages, and news from Communication University of China (中国传媒大学 / CUC), the school that trains CCTV anchors, Mango TV hosts, and the performers you see at the CCTV Spring Festival Gala.

CUC runs one of China's top performing-arts programs. Its alumni pipeline feeds state broadcasters, provincial TV networks, and the national competition circuit. This actor pulls that pipeline data — structured, paginated, and in UTF-8 — so you don't have to navigate a WebPlus CMS manually.

What It Scrapes

Four configurable section categories, all from www.cuc.edu.cn:

| Category | Content |
| --- | --- |
| admissions | 招生就业 (Admissions and Employment): enrollment notices, admission policies, exam requirements |
| faculty | Leadership rosters, special collections, departmental faculty pages |
| programs | Academic affairs notices, curriculum docs, departmental announcements |
| news | Main news feed, school of arts culture network, academic exchanges |

Articles follow a predictable URL pattern (/YYYY/MMDD/c<channel>a<id>/page.htm). Pagination uses numbered .psp pages. No JavaScript rendering required.
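
The article URL pattern above can be decoded with a regular expression. The following is a minimal sketch, not the actor's own code: it assumes the channel and article id segments are purely numeric, and the names `ARTICLE_PATH`, `ArticleRef`, and `parse_article_url` are hypothetical helpers introduced for illustration.

```python
import re
from datetime import date
from typing import NamedTuple, Optional
from urllib.parse import urlparse

# Assumption: <channel> and <id> are runs of digits, as is typical for WebPlus sites.
ARTICLE_PATH = re.compile(
    r"^/(?P<year>\d{4})/(?P<month>\d{2})(?P<day>\d{2})"
    r"/c(?P<channel>\d+)a(?P<article_id>\d+)/page\.htm$"
)


class ArticleRef(NamedTuple):
    published: date   # date encoded in the /YYYY/MMDD/ path segments
    channel: str      # digits following "c"
    article_id: str   # digits following "a"


def parse_article_url(url: str) -> Optional[ArticleRef]:
    """Return the parts encoded in an article URL, or None if it does not match."""
    match = ARTICLE_PATH.match(urlparse(url).path)
    if match is None:
        return None
    try:
        published = date(int(match["year"]), int(match["month"]), int(match["day"]))
    except ValueError:
        # Four digits that do not form a real month/day, e.g. /2024/1340/.
        return None
    return ArticleRef(published, match["channel"], match["article_id"])
```

Listing pages such as /zsjy/list.htm do not match the pattern and return None, which makes the helper usable as a quick article-versus-listing check.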

Output Fields

| Field | Type | Description |
| --- | --- | --- |
| page_url | String | Canonical URL of the scraped page |
| title | String | Article or page title (Chinese characters preserved) |
| title_zh | String | Chinese title; identical to title for CUC pages |
| category | String | Section: admissions, faculty, programs, or news |
| publish_date | String | Publication date as shown on the page (e.g. 2024-05-10) |
| body_html | String | Full article HTML including embedded content references |
| body_text | String | Plain-text article body |
| department | String | Channel code identifying the originating department |
| attachments | String | PDF/DOC attachment URLs, pipe-separated (admissions docs, curriculum PDFs) |
| source_url | String | Originating article URL |
| scrapedAt | String | ISO-8601 timestamp of the scrape |

On faculty name-card pages, body_html and body_text are empty by design because the page contains only a name and a date. The title and department fields are always populated.
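
When consuming the dataset, two of the fields above usually need light post-processing: attachments arrives as one pipe-separated string, and name-card pages have empty bodies. The sketch below shows one way to handle both. The helper names are hypothetical, and it assumes the separator is a bare `|` with optional surrounding whitespace.

```python
from typing import Any, Dict, List


def split_attachments(raw: str) -> List[str]:
    """Split the pipe-separated attachments string into a list of URLs."""
    # Assumption: "|" is the only separator; stray whitespace is trimmed.
    return [part.strip() for part in (raw or "").split("|") if part.strip()]


def is_name_card(record: Dict[str, Any]) -> bool:
    """True when both body fields are empty, as on faculty name-card pages."""
    html = (record.get("body_html") or "").strip()
    text = (record.get("body_text") or "").strip()
    return not html and not text


def normalize(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record with attachments as a list and a name-card flag."""
    out = dict(record)
    out["attachments"] = split_attachments(record.get("attachments", ""))
    out["is_name_card"] = is_name_card(record)
    return out
```

The original record is left untouched, so the raw string form of attachments remains available if it is needed for export.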

Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | Integer | 10 | Maximum number of article pages to scrape (0 = unlimited) |
| categories | Array | all four | Which sections to crawl: admissions, faculty, programs, news |

Run with maxItems: 0 and all four categories for a full archive crawl. The main news channel alone has hundreds of pages going back several years.
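
As an illustration, the run input for such a crawl can be assembled and validated before it is submitted. This is a sketch under stated assumptions: `build_run_input` is a hypothetical helper, and rejecting an empty category list is a choice made here, not documented actor behaviour. Only the parameter names and category values come from the table above.

```python
import json
from typing import Dict, Iterable, Optional

VALID_CATEGORIES = ("admissions", "faculty", "programs", "news")


def build_run_input(
    max_items: int = 10,
    categories: Optional[Iterable[str]] = None,
) -> Dict[str, object]:
    """Build the actor input dict; max_items=0 means unlimited."""
    if max_items < 0:
        raise ValueError("max_items must be 0 (unlimited) or a positive integer")
    # Default to all four sections; drop duplicates while keeping order.
    chosen = list(VALID_CATEGORIES) if categories is None else list(dict.fromkeys(categories))
    if not chosen:
        raise ValueError("select at least one category")
    unknown = [c for c in chosen if c not in VALID_CATEGORIES]
    if unknown:
        raise ValueError(f"unknown categories: {unknown}")
    return {"maxItems": max_items, "categories": chosen}


if __name__ == "__main__":
    # Full archive crawl: unlimited items, all four categories.
    print(json.dumps(build_run_input(max_items=0), indent=2))
```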

How It Works

The actor uses a hierarchical crawl. It seeds from the section entry points (/zsjy/list.htm, /9996/list.htm, etc.), follows pagination forward, and enqueues every article URL it finds. Article pages get a full extraction pass — title, date, body content, and any PDF attachments.
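
The link-sorting step of such a crawl can be sketched with the standard library alone. This is not the actor's implementation: `classify_links` is a hypothetical helper, and the pagination pattern (`list<N>.htm` or `list<N>.psp`) is an assumption about how WebPlus numbers its listing pages.

```python
import re
from html.parser import HTMLParser
from typing import List, Set, Tuple
from urllib.parse import urljoin, urlparse

ARTICLE_RE = re.compile(r"^/\d{4}/\d{4}/c\d+a\d+/page\.htm$")
# Assumption: listing and pagination pages end in list.htm, listN.htm, or listN.psp.
LISTING_RE = re.compile(r"/list\d*\.(?:htm|psp)$")


class _HrefCollector(HTMLParser):
    """Collect the href of every <a> tag in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(listing_url: str, html: str) -> Tuple[List[str], List[str]]:
    """Split same-host links on a listing page into (article_urls, listing_urls)."""
    collector = _HrefCollector()
    collector.feed(html)
    host = urlparse(listing_url).netloc
    seen: Set[str] = set()
    articles: List[str] = []
    listings: List[str] = []
    for href in collector.hrefs:
        absolute = urljoin(listing_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.netloc != host or absolute in seen:
            continue  # skip off-site links and duplicates
        seen.add(absolute)
        if ARTICLE_RE.match(parsed.path):
            articles.append(absolute)
        elif LISTING_RE.search(parsed.path):
            listings.append(absolute)
    return articles, listings
```

In a full crawler, the article URLs would go to the extraction queue and the listing URLs back to the pagination queue, with a visited set shared across pages.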

No proxy required. CUC's servers respond cleanly to datacenter IPs. No Cloudflare. No anti-bot. Concurrency is kept at 5 to stay polite with a university web server.
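
A concurrency cap of 5 can be expressed with a semaphore. The sketch below is one common way to do it with asyncio and is not taken from the actor's source; `fetch_all` is a hypothetical helper, and the `fetch` argument stands in for whatever HTTP client is actually used.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List

MAX_CONCURRENCY = 5  # matches the politeness limit described above


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[str]],
    limit: int = MAX_CONCURRENCY,
) -> List[str]:
    """Run fetch(url) for every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await fetch(url)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

A caller would pass its own coroutine, for example `asyncio.run(fetch_all(urls, my_fetch))`, where `my_fetch` is a placeholder for a real download function.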

Use Cases

  • Chinese-language NLP training corpora (performing arts domain)
  • Talent-pipeline research tracking which departments feed CCTV and Mango TV
  • Competitive analysis for Chinese broadcasting education programs
  • Admissions document archives for research on Chinese university policy

Notes

The site CMS (WebPlus) uses numeric channel codes for some departments. The department field preserves the raw channel code. Map to human-readable names using the CUC department directory as needed.

CUC's admissions section includes PDFs of curriculum plans and judging panel documents for performance programs. These appear as pipe-separated URLs in the attachments field.


Part of the OrbTop Chinese media education dataset — companion to the BDA Beijing Dance Academy Scraper.