Caixin Global Article Scraper

Pricing

from $20.00 / 1,000 results

Extract Caixin Global (caixinglobal.com) articles - title, body, authors and metadata. HTTP-only. Mode `latest` scrapes the homepage for the newest article URLs.

Developer: Xtractoo (maintained by Community)

Last modified: 2 days ago

Extract article text, headlines, authors, and metadata from any caixinglobal.com URL. Caixin Global is one of China's most respected financial-news publishers and the English-language arm of Caixin Media (Beijing).

Why Use This Actor?

  • China business intel: Caixin is editorially independent and often breaks stories on Chinese state-owned enterprises (SOEs) and regulators before state media does.
  • Digest Hub: Caixin auto-generates a short AI summary (the "Digest Hub" section) for most articles, which is useful for fast scanning.
  • No anti-bot layer: public articles load over standard HTTPS, with no Cloudflare or DataDome protection to bypass.

How It Works

This actor uses only HTTP requests — no browser, no Selenium, no Playwright.

Input

{
  "url": "https://www.caixinglobal.com/2026-05-14/example-article-102444052.html",
  "urls": [
    "https://www.caixinglobal.com/2026-05-13/article-one.html"
  ],
  "mode": "article",
  "limit": 10
}
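
As a rough illustration of how the input above could be submitted programmatically, the sketch below builds (without sending) a request to the Apify API's run-sync-get-dataset-items endpoint using only the standard library. The Actor ID and token are placeholders, not values taken from this page; look up the real Actor ID in the Apify Console before using it.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs an Actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Placeholder values (assumptions): substitute the real Actor ID and your own token.
request = build_run_request(
    "USERNAME~ACTOR-NAME",
    "APIFY_TOKEN_PLACEHOLDER",
    {
        "mode": "article",
        "urls": ["https://www.caixinglobal.com/2026-05-13/article-one.html"],
        "limit": 10,
    },
)
# To execute the run: items = json.load(urllib.request.urlopen(request))
```

Building the request separately from sending it keeps the input easy to inspect or log before any run is billed.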

Output

{
  "url": "https://www.caixinglobal.com/2026-05-14/in-depth-beijings-clampdown-on-overseas-postings-hits-top-state-owned-insurer-102444052.html",
  "source": "Caixin Global",
  "title": "In Depth: Beijing's Clampdown on Overseas Postings Hits Top State-Owned Insurer",
  "description": "China Taiping reshuffles overseas staff back to mainland as part of SOE financial-tightening drive.",
  "content": "1. China Taiping Insurance Group Ltd. is reshuffling overseas employees back to mainland China as part of state-owned financial institutions' efforts to tighten overseas staff management...",
  "image": "https://img.caixin.com/2026-05-14/177875269131978_560_373.jpg",
  "language": "en",
  "word_count": 452,
  "published_date": "",
  "modified_date": "",
  "authors": ["Ding Feng"],
  "categories": "",
  "tags": ""
}
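
Several fields in the sample output are empty strings (published_date, modified_date, categories, tags). A minimal sketch of how a consumer might load one record into a typed object and turn blank dates into None follows. The Article class and parse_article helper are hypothetical names for illustration, not part of the Actor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Article:
    url: str
    title: str
    content: str
    authors: List[str]
    word_count: int
    published_date: Optional[str]
    modified_date: Optional[str]


def parse_article(record: dict) -> Article:
    """Convert one output record into an Article, mapping blank dates to None."""
    return Article(
        url=record["url"],
        title=record.get("title", ""),
        content=record.get("content", ""),
        authors=list(record.get("authors") or []),
        word_count=int(record.get("word_count") or 0),
        published_date=record.get("published_date") or None,
        modified_date=record.get("modified_date") or None,
    )


article = parse_article({
    "url": "https://www.caixinglobal.com/2026-05-14/example-article-102444052.html",
    "title": "Example",
    "content": "Summary text",
    "word_count": 452,
    "published_date": "",
    "modified_date": "",
    "authors": ["Ding Feng"],
})
```

Mapping empty strings to None lets downstream code distinguish a missing date from a real one without string comparisons.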

Fetch Latest News

Set mode to "latest" to fetch the newest article URLs and titles from Caixin Global's homepage. Caixin Global doesn't expose a public RSS feed, so this mode scrapes the homepage and collects URLs matching the date-slug pattern.

Input:

{
  "mode": "latest",
  "limit": 10
}

Output — array of objects:

[
  {
    "url": "https://www.caixinglobal.com/2026-05-14/example-article-headline-102444052.html",
    "title": "In Depth: Beijing's Clampdown on Overseas Postings Hits Top State-Owned Insurer",
    "source": "Caixin Global"
  }
]

Source: https://www.caixinglobal.com/ (homepage scraping — no RSS available)
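
The homepage approach described above can be sketched roughly as follows. This is not the Actor's actual code: it is a stdlib-only illustration that assumes article links appear as ordinary anchor tags, and it uses a deliberately loose pattern (a date directory followed by an .html file). Title extraction is left out for brevity.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.caixinglobal.com/"
# Assumed pattern: /<YYYY-MM-DD>/<anything>.html on the main host.
ARTICLE_URL = re.compile(
    r"^https://www\.caixinglobal\.com/\d{4}-\d{2}-\d{2}/[^/?#]+\.html$"
)


class ArticleLinkCollector(HTMLParser):
    """Collect unique article URLs from anchor tags, in page order."""

    def __init__(self, limit: int):
        super().__init__()
        self.limit = limit
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or len(self.urls) >= self.limit:
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urljoin(BASE_URL, href)  # resolves relative and // links
        if ARTICLE_URL.match(absolute) and absolute not in self.urls:
            self.urls.append(absolute)


def latest_article_urls(homepage_html: str, limit: int = 10) -> list:
    collector = ArticleLinkCollector(limit)
    collector.feed(homepage_html)
    return collector.urls


sample_html = """
<a href="/2026-05-14/example-article-headline-102444052.html">A</a>
<a href="/about/">About</a>
<a href="https://www.caixinglobal.com/2026-05-14/example-article-headline-102444052.html">dup</a>
<a href="//www.caixinglobal.com/2026-05-13/article-one.html">B</a>
"""
urls = latest_article_urls(sample_html, limit=10)
```

Deduplicating while preserving page order matters here because homepages often link the same story from several modules.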

Cron Schedule: Auto-Fetch Newest Articles

Combine mode: "latest" and mode: "article" to keep a fresh feed running on autopilot:

  1. Schedule a recurring run of this Actor with {"mode": "latest", "limit": 20} via Apify Schedules (UI ▸ Schedules ▸ Create new). A cron expression like */30 * * * * runs it every 30 minutes.
  2. Feed the dataset of the latest run into another run of this Actor with mode: "article" and the new URLs as input. Apify integrations let you chain runs via the "Actor finished" webhook without any glue code.
  3. The article-mode run extracts the body text, image, authors, and metadata for each URL and appends them to your master dataset.
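
If you prefer to do the hand-off in step 2 yourself, for example to skip URLs you have already scraped, the conversion from latest-mode output to article-mode input might look like the sketch below. The build_article_input helper and the seen_urls set are hypothetical; only the "mode" and "urls" input fields come from this page.

```python
def build_article_input(latest_items, seen_urls):
    """Turn latest-mode dataset items into an article-mode input, skipping known URLs."""
    new_urls = []
    for item in latest_items:
        url = item.get("url")
        if url and url not in seen_urls and url not in new_urls:
            new_urls.append(url)
    return {"mode": "article", "urls": new_urls}


latest_items = [
    {"url": "https://www.caixinglobal.com/2026-05-14/a-1.html", "title": "A"},
    {"url": "https://www.caixinglobal.com/2026-05-14/b-2.html", "title": "B"},
    {"url": "https://www.caixinglobal.com/2026-05-14/a-1.html", "title": "A again"},
]
already_scraped = {"https://www.caixinglobal.com/2026-05-14/b-2.html"}
article_input = build_article_input(latest_items, already_scraped)
```

Filtering before the second run avoids paying per result for articles that are already in your master dataset.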

Common cron expressions:

Frequency            Cron
Every 15 minutes     */15 * * * *
Hourly               0 * * * *
Every 6 hours        0 */6 * * *
Daily at 06:00 UTC   0 6 * * *
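
To make the step syntax in these expressions concrete, here is a small sketch that expands a single cron field into the values it matches. It handles only the three forms used in the table ("*", "*/n", and a plain number) and is not a full cron parser. The expand_cron_field name is illustrative.

```python
def expand_cron_field(field: str, low: int, high: int) -> list:
    """Expand one cron field ('*', '*/n', or an integer) into matching values."""
    if field == "*":
        return list(range(low, high + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(low, high + 1, step))
    value = int(field)
    if not low <= value <= high:
        raise ValueError(f"{value} is outside {low}-{high}")
    return [value]


minutes = expand_cron_field("*/15", 0, 59)  # minute field of "*/15 * * * *"
hours = expand_cron_field("*/6", 0, 23)     # hour field of "0 */6 * * *"
```

Here minutes lists the minutes past each hour at which the 15-minute schedule fires, and hours lists the hours of the day for the 6-hour schedule.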

Notes

  • Caixin Global ships a "Digest Hub" AI summary on most articles. For non-subscribers, this summary is what populates the content field, because the full article body is paywall-gated server-side.
  • URL pattern: /<YYYY-MM-DD>/<slug>-<id>.html.
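
As an illustration of the URL pattern, the sketch below splits an article URL into its date, slug, and numeric ID. The regular expression is an assumption derived from the pattern stated above, and the ArticleRef and parse_article_url names are hypothetical. URLs without a trailing numeric ID return None.

```python
import re
from datetime import date
from typing import NamedTuple, Optional

ARTICLE_URL = re.compile(
    r"^https?://www\.caixinglobal\.com/"
    r"(?P<date>\d{4}-\d{2}-\d{2})/(?P<slug>[^/]+)-(?P<id>\d+)\.html$"
)


class ArticleRef(NamedTuple):
    url_date: date   # the date segment of the URL path
    slug: str
    article_id: str


def parse_article_url(url: str) -> Optional[ArticleRef]:
    """Return the URL's date, slug and ID, or None if it does not fit the pattern."""
    match = ARTICLE_URL.match(url)
    if match is None:
        return None
    try:
        url_date = date.fromisoformat(match["date"])
    except ValueError:  # e.g. month 13 passes the regex but is not a real date
        return None
    return ArticleRef(url_date, match["slug"], match["id"])


ref = parse_article_url(
    "https://www.caixinglobal.com/2026-05-14/"
    "in-depth-beijings-clampdown-on-overseas-postings-hits-top-state-owned-insurer-102444052.html"
)
```

Keeping the ID as a string avoids assumptions about its length or leading zeros. It can serve as a deduplication key across runs.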