Caixin Global Article Scraper
Pricing
from $20.00 / 1,000 results
Extract Caixin Global (caixinglobal.com) articles - title, body, authors and metadata. HTTP-only. Mode `latest` scrapes the homepage for the newest article URLs.
Extract article text, headlines, authors, and metadata from any caixinglobal.com URL. Caixin Global is one of China's most respected financial-news publishers and the English-language arm of Caixin Media (Beijing).
Why Use This Actor?
- China business intel: Caixin is editorially independent and often breaks Chinese SOE and regulatory stories before state media.
- Digest Hub: Caixin auto-generates a short AI summary of each piece (the "Digest Hub" section), which is useful for fast scanning.
- No anti-bot layer: public articles load via standard HTTPS, with no Cloudflare or DataDome layer to bypass.
How It Works
This actor uses only HTTP requests — no browser, no Selenium, no Playwright.
Input
{"url": "https://www.caixinglobal.com/2026-05-14/example-article-102444052.html","urls": ["https://www.caixinglobal.com/2026-05-13/article-one.html"],"mode": "article","limit": 10}
Output
{"url": "https://www.caixinglobal.com/2026-05-14/in-depth-beijings-clampdown-on-overseas-postings-hits-top-state-owned-insurer-102444052.html","source": "Caixin Global","title": "In Depth: Beijing's Clampdown on Overseas Postings Hits Top State-Owned Insurer","description": "China Taiping reshuffles overseas staff back to mainland as part of SOE financial-tightening drive.","content": "1. China Taiping Insurance Group Ltd. is reshuffling overseas employees back to mainland China as part of state-owned financial institutions' efforts to tighten overseas staff management...","image": "https://img.caixin.com/2026-05-14/177875269131978_560_373.jpg","language": "en","word_count": 452,"published_date": "","modified_date": "","authors": ["Ding Feng"],"categories": "","tags": ""}
Fetch Latest News
Set `mode` to `"latest"` to fetch the newest article URLs and titles from Caixin Global's homepage. Caixin Global doesn't expose a public RSS feed, so this mode scrapes the homepage and collects URLs matching the date-slug pattern.
Input:
{"mode": "latest","limit": 10}
Output — array of objects:
[{"url": "https://www.caixinglobal.com/2026-05-14/example-article-headline-102444052.html","title": "In Depth: Beijing's Clampdown on Overseas Postings Hits Top State-Owned Insurer","source": "Caixin Global"}]
Source: https://www.caixinglobal.com/ (homepage scraping — no RSS available)
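The homepage collection step described above can be sketched as follows. This is a minimal illustration, not the Actor's actual code: it assumes the homepage HTML has already been downloaded into a string, and the names `ArticleLinkCollector` and `collect_latest_urls` are hypothetical. It keeps only links on `www.caixinglobal.com` whose path matches the documented date-slug pattern, removes duplicates, and stops at `limit`.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

BASE_URL = "https://www.caixinglobal.com/"
# Path shape taken from the documented pattern: /<YYYY-MM-DD>/<slug>-<id>.html
ARTICLE_PATH = re.compile(r"^/\d{4}-\d{2}-\d{2}/[^/]+-\d+\.html$")


class ArticleLinkCollector(HTMLParser):
    """Collects unique article URLs from <a href> tags, in page order."""

    def __init__(self, limit: int) -> None:
        super().__init__()
        self.limit = limit
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or len(self.urls) >= self.limit:
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the homepage URL.
        parsed = urlparse(urljoin(BASE_URL, href))
        if parsed.netloc != "www.caixinglobal.com":
            return
        if not ARTICLE_PATH.match(parsed.path):
            return
        # Drop query strings and fragments so duplicates collapse.
        url = f"https://{parsed.netloc}{parsed.path}"
        if url not in self.urls:
            self.urls.append(url)


def collect_latest_urls(html: str, limit: int = 10) -> list[str]:
    """Return up to `limit` article URLs found in the given HTML (hypothetical helper)."""
    collector = ArticleLinkCollector(limit)
    collector.feed(html)
    return collector.urls
```

Using the standard-library `html.parser` keeps the sketch dependency-free; a real scraper would also need to fetch the page and handle HTTP errors.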
Cron Schedule: Auto-Fetch Newest Articles
Combine mode: "latest" and mode: "article" to keep a fresh feed running on autopilot:
- Schedule a recurring run of this Actor with `{"mode": "latest", "limit": 20}` via Apify Schedules (UI ▸ Schedules ▸ Create new). A cron expression like `*/30 * * * *` runs it every 30 minutes.
- Webhook the dataset of the latest run into another Actor run with `mode: "article"` and the new URLs as input. Apify integrations let you chain runs via the "Actor finished" webhook without any glue code.
- The article-mode run extracts the full body, image, authors, and metadata for each URL and appends to your master dataset.
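If you prefer to build the article-mode input yourself instead of relying on the webhook chain, the hand-off between the two modes might look like the sketch below. The `mode` and `urls` fields come from the input schema shown earlier; the function name `build_article_input` and the `seen_urls` set (URLs already present in your master dataset) are assumptions made for illustration.

```python
def build_article_input(latest_items: list[dict], seen_urls: set[str]) -> dict:
    """Turn latest-mode dataset items into an article-mode input.

    `seen_urls` is a hypothetical set of URLs that were already scraped,
    so each article is only requested once.
    """
    new_urls: list[str] = []
    for item in latest_items:
        url = item.get("url")
        # Skip items without a URL, already-scraped URLs, and in-batch repeats.
        if url and url not in seen_urls and url not in new_urls:
            new_urls.append(url)
    return {"mode": "article", "urls": new_urls}
```

Filtering against `seen_urls` matters for billing, since the Actor is priced per result and the homepage will repeat most links between consecutive runs.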
Common cron expressions:
| Frequency | Cron |
|---|---|
| Every 15 minutes | `*/15 * * * *` |
| Hourly | `0 * * * *` |
| Every 6 hours | `0 */6 * * *` |
| Daily at 06:00 UTC | `0 6 * * *` |
Notes
- Caixin Global ships a "Digest Hub" AI summary on most articles. This is what populates the `content` field for non-subscribers; the full article body is paywall-gated server-side.
- URL pattern: `/<YYYY-MM-DD>/<slug>-<id>.html`.
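Because the sample output above leaves `published_date` empty, the date segment of the URL can serve as a rough fallback. The sketch below splits a URL into its three documented parts. Treating the date segment as the publication date is an assumption, and `ArticleRef` and `parse_article_url` are hypothetical names.

```python
import re
from datetime import date
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class ArticleRef(NamedTuple):
    published: date   # date segment of the URL (assumed to be the publish date)
    slug: str         # human-readable headline slug
    article_id: str   # trailing numeric ID, kept as a string


# /<YYYY-MM-DD>/<slug>-<id>.html
_URL_PATTERN = re.compile(r"^/(\d{4})-(\d{2})-(\d{2})/([^/]+)-(\d+)\.html$")


def parse_article_url(url: str) -> Optional[ArticleRef]:
    """Return the URL's date, slug, and ID, or None if it does not match."""
    match = _URL_PATTERN.match(urlparse(url).path)
    if match is None:
        return None
    year, month, day, slug, article_id = match.groups()
    try:
        published = date(int(year), int(month), int(day))
    except ValueError:
        # Digits in the right shape but not a real calendar date.
        return None
    return ArticleRef(published, slug, article_id)
```

For example, `parse_article_url("https://www.caixinglobal.com/2026-05-14/example-article-102444052.html")` yields the date 2026-05-14, the slug `example-article`, and the ID `102444052`.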