Pricing

from $20.00 / 1,000 results

Caixin Global Article Scraper

Extract Caixin Global (caixinglobal.com) articles - title, body, authors and metadata. HTTP-only. Mode `latest` scrapes the homepage for the newest article URLs.

Developer

Farhan Febrian Nauval

Last modified: 20 days ago

Why Use This Actor?

China business intel: Caixin is editorially independent and often breaks Chinese SOE and regulatory stories before state media.
Digest Hub: Caixin auto-generates a short AI summary of each piece (the "Digest Hub" section), which is useful for fast scanning.
No anti-bot: public articles load via standard HTTPS, with no Cloudflare or DataDome layer to bypass.

How It Works

This actor uses only HTTP requests — no browser, no Selenium, no Playwright.

Input

{
  "url": "https://www.caixinglobal.com/2026-05-14/example-article-102444052.html",
  "urls": [
    "https://www.caixinglobal.com/2026-05-13/article-one.html"
  ],
  "mode": "article",
  "limit": 10
}

Output

{
  "url": "https://www.caixinglobal.com/2026-05-14/in-depth-beijings-clampdown-on-overseas-postings-hits-top-state-owned-insurer-102444052.html",
  "source": "Caixin Global",
  "title": "In Depth: Beijing's Clampdown on Overseas Postings Hits Top State-Owned Insurer",
  "description": "China Taiping reshuffles overseas staff back to mainland as part of SOE financial-tightening drive.",
  "content": "1. China Taiping Insurance Group Ltd. is reshuffling overseas employees back to mainland China as part of state-owned financial institutions' efforts to tighten overseas staff management...",
  "image": "https://img.caixin.com/2026-05-14/177875269131978_560_373.jpg",
  "language": "en",
  "word_count": 452,
  "published_date": "",
  "modified_date": "",
  "authors": ["Ding Feng"],
  "categories": "",
  "tags": ""
}
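
Several fields in the sample above come back as empty strings (published_date, modified_date, categories, tags). A minimal sketch of how a consumer might normalize one dataset item, assuming the field names shown in the sample; the Article class and parse_article helper are hypothetical names for illustration, not part of the Actor:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Article:
    url: str
    title: str
    content: str
    authors: list[str]
    word_count: int
    published_date: Optional[str]  # None when the Actor returned ""


def parse_article(item: dict[str, Any]) -> Article:
    """Map one dataset item to an Article, treating "" as a missing value."""

    def blank_to_none(value: Any) -> Optional[str]:
        return value if isinstance(value, str) and value.strip() else None

    return Article(
        url=item["url"],
        title=item.get("title", ""),
        content=item.get("content", ""),
        authors=list(item.get("authors") or []),
        word_count=int(item.get("word_count") or 0),
        published_date=blank_to_none(item.get("published_date")),
    )


# Trimmed copy of the sample output above.
sample_item = {
    "url": "https://www.caixinglobal.com/2026-05-14/in-depth-beijings-clampdown-on-overseas-postings-hits-top-state-owned-insurer-102444052.html",
    "title": "In Depth: Beijing's Clampdown on Overseas Postings Hits Top State-Owned Insurer",
    "content": "1. China Taiping Insurance Group Ltd. is reshuffling overseas employees...",
    "word_count": 452,
    "published_date": "",
    "authors": ["Ding Feng"],
}
article = parse_article(sample_item)
```

Mapping blank strings to None keeps downstream code from mistaking an empty date for a real value.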

Fetch Latest News

Set mode to "latest" to fetch the newest article URLs and titles from Caixin Global's homepage. Caixin Global doesn't expose a public RSS feed, so this mode scrapes the homepage and collects URLs matching the date-slug pattern.

Input:

{
  "mode": "latest",
  "limit": 10
}

Output — array of objects:

[
  {
    "url": "https://www.caixinglobal.com/2026-05-14/example-article-headline-102444052.html",
    "title": "In Depth: Beijing's Clampdown on Overseas Postings Hits Top State-Owned Insurer",
    "source": "Caixin Global"
  }
]

Source: https://www.caixinglobal.com/ (homepage scraping — no RSS available)
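
The homepage link collection described above could look roughly like the sketch below. It is an illustration under assumptions, not the Actor's actual code: the regex is derived from the documented URL pattern, the homepage markup is assumed to expose articles as plain anchor tags, the HTTP fetch and title extraction are left out, and ArticleLinkCollector and collect_latest are hypothetical names.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.caixinglobal.com/"
# Assumed path shape: /<YYYY-MM-DD>/<slug>-<id>.html
ARTICLE_PATH = re.compile(r"/\d{4}-\d{2}-\d{2}/[^/]+-\d+\.html$")


class ArticleLinkCollector(HTMLParser):
    """Collects unique article URLs from <a href> tags, up to a limit."""

    def __init__(self, limit: int) -> None:
        super().__init__()
        self.limit = limit
        self.urls: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or len(self.urls) >= self.limit:
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links, then drop any query string or fragment.
        url = urljoin(BASE_URL, href).split("?", 1)[0].split("#", 1)[0]
        if url.startswith(BASE_URL) and ARTICLE_PATH.search(url) and url not in self.urls:
            self.urls.append(url)


def collect_latest(html: str, limit: int = 10) -> list[str]:
    collector = ArticleLinkCollector(limit)
    collector.feed(html)
    return collector.urls


sample_html = """
<a href="/2026-05-14/example-article-headline-102444052.html">Headline</a>
<a href="/about/">About</a>
<a href="https://www.caixinglobal.com/2026-05-14/example-article-headline-102444052.html?x=1">Duplicate</a>
"""
latest_urls = collect_latest(sample_html)
```

Here the relative link and its absolute duplicate collapse into a single URL, and the non-article link is skipped.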

Cron Schedule: Auto-Fetch Newest Articles

Combine mode: "latest" and mode: "article" to keep a fresh feed running on autopilot:

1. Schedule a recurring run of this Actor with {"mode": "latest", "limit": 20} via Apify Schedules (UI ▸ Schedules ▸ Create new). A cron expression like */30 * * * * runs it every 30 minutes.
2. Pass the dataset of the latest run to another Actor run with mode: "article" and the new URLs as input. Apify integrations let you chain runs via the "Actor finished" webhook without any glue code.
3. The article-mode run extracts the full body, image, authors, and metadata for each URL and appends them to your master dataset.
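
If you do add a small processing step between the two runs, the hand-off from latest-mode output to article-mode input can be sketched as follows. This assumes article mode accepts the list through the urls field shown in the Input example; build_article_input and the seen_urls set are hypothetical, standing in for whatever store you use to remember processed URLs.

```python
def build_article_input(latest_items, seen_urls):
    """Return article-mode input for URLs not processed yet, or None if there are none."""
    new_urls = []
    for item in latest_items:
        url = item.get("url")
        # Skip items without a URL, already-processed URLs, and in-batch duplicates.
        if url and url not in seen_urls and url not in new_urls:
            new_urls.append(url)
    if not new_urls:
        return None
    return {"mode": "article", "urls": new_urls}


latest_items = [
    {"url": "https://www.caixinglobal.com/2026-05-14/example-article-headline-102444052.html",
     "title": "Example headline", "source": "Caixin Global"},
    {"url": "https://www.caixinglobal.com/2026-05-13/already-seen-102444000.html",
     "title": "Older headline", "source": "Caixin Global"},
]
seen_urls = {"https://www.caixinglobal.com/2026-05-13/already-seen-102444000.html"}
article_input = build_article_input(latest_items, seen_urls)
```

Filtering against previously seen URLs avoids paying for the same article on every scheduled run.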

Common cron expressions:

Frequency	Cron
Every 15 minutes	`*/15 * * * *`
Hourly	`0 * * * *`
Every 6 hours	`0 */6 * * *`
Daily at 06:00 UTC	`0 6 * * *`

Notes

Caixin Global publishes a "Digest Hub" AI summary on most articles. For non-subscribers, this summary is what populates the content field, because the full article body is paywall-gated server-side.
URL pattern: /<YYYY-MM-DD>/<slug>-<id>.html.
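
As a sketch of how the URL pattern above can be split into its parts, the snippet below uses a regular expression with named groups. The regex is an assumption based on the documented pattern and the example URLs in this README; ArticleRef and parse_article_url are hypothetical names, and the date in the path is not guaranteed to equal the publication date.

```python
import re
from datetime import date
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

# Assumed shape: /<YYYY-MM-DD>/<slug>-<id>.html, with a numeric id.
ARTICLE_PATH = re.compile(
    r"^/(?P<date>\d{4}-\d{2}-\d{2})/(?P<slug>[^/]+)-(?P<id>\d+)\.html$"
)


class ArticleRef(NamedTuple):
    path_date: date
    slug: str
    article_id: str


def parse_article_url(url: str) -> Optional[ArticleRef]:
    """Return the date, slug, and id from an article URL, or None if it does not match."""
    match = ARTICLE_PATH.match(urlsplit(url).path)
    if match is None:
        return None
    try:
        path_date = date.fromisoformat(match["date"])
    except ValueError:  # e.g. /2026-13-40/ is date-shaped but not a real date
        return None
    return ArticleRef(path_date, match["slug"], match["id"])


ref = parse_article_url(
    "https://www.caixinglobal.com/2026-05-14/example-article-102444052.html"
)
```

The numeric id is a convenient deduplication key, since a slug may change if a headline is edited (an assumption, not something the Actor documents).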
