GDELT Worldwide News Scraper

Pricing

$2.00 / 1,000 articles returned

Search worldwide news with the public GDELT 2.0 DOC API, with no API key and no login. Filter by timespan, source country, and language; sort by newest, oldest, or relevance. Returns clean articles with title, URL, domain, country, language, publish date, and image.

Developer

Dami's Studio

Maintained by Community

a day ago

Last modified

Search worldwide news through the GDELT 2.0 DOC API — a public, no-key, no-login JSON endpoint that indexes online news from across the planet in 65+ languages. No proxy or anti-bot handling needed.

What it does

Given a search query, the actor calls the GDELT DOC API in ArtList mode and returns clean, structured articles. You can filter by recency, source country, and source language, and sort by newest, oldest, or relevance.

Input

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| query | string | (required) | Keywords. Quote phrases: "climate change". Very short/common single words may be rejected by GDELT. |
| maxItems | integer | 100 | Capped at 250; GDELT returns at most 250 articles per request. |
| sort | string | DateDesc | DateDesc (newest), DateAsc (oldest), HybridRel (relevance). |
| timespan | string | | Recency window, e.g. 1d, 3d, 1w, 1m, 3m. GDELT covers roughly the last 3 months. |
| sourceCountry | string | | Appended as sourcecountry:{code} (e.g. US, UK, FR). |
| sourceLang | string | | Appended as sourcelang:{code} (e.g. english, french). |
| proxyConfiguration | object | | Optional; not needed (public API). |
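To make the mapping from input fields to a GDELT request concrete, here is a minimal Python sketch of how such a URL could be assembled. It is not the actor's own code: the function name `build_gdelt_url` and its keyword arguments are hypothetical, and the endpoint and parameter names (`query`, `mode`, `format`, `maxrecords`, `sort`, `timespan`) are taken from the public GDELT DOC 2.0 API as I understand it, so check them against the GDELT documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed public endpoint of the GDELT DOC 2.0 API.
GDELT_DOC_ENDPOINT = "https://api.gdeltproject.org/api/v2/doc/doc"
MAX_RECORDS_CAP = 250  # GDELT returns at most 250 articles per request


def build_gdelt_url(query, max_items=100, sort="DateDesc",
                    timespan=None, source_country=None, source_lang=None):
    """Build an ArtList request URL from actor-style input (hypothetical helper)."""
    # Country and language filters are appended to the query string itself.
    terms = [query.strip()]
    if source_country:
        terms.append(f"sourcecountry:{source_country}")
    if source_lang:
        terms.append(f"sourcelang:{source_lang}")

    params = {
        "query": " ".join(terms),
        "mode": "ArtList",
        "format": "json",
        # Clamp to the 1..250 range the API accepts.
        "maxrecords": max(1, min(max_items, MAX_RECORDS_CAP)),
        "sort": sort,
    }
    if timespan:
        params["timespan"] = timespan

    # urlencode handles URL-encoding of quotes, spaces, and colons.
    return f"{GDELT_DOC_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_gdelt_url('"climate change"', max_items=50,
                          timespan="1w", source_country="US"))
```

Folding the filters into `query` rather than sending them as separate parameters mirrors the "Appended as" wording in the Notes column.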

Output

Each successful row:

{
  "ok": true,
  "title": "…",
  "url": "https://…",
  "domain": "example.com",
  "sourceCountry": "United States",
  "language": "English",
  "publishedAt": "2026-06-11T12:00:00.000Z",
  "socialImage": "https://…"
}

Results are de-duplicated by URL. Each ok:true article is billed one article charge unit. Diagnostic rows (ok:false) and empty/blocked runs are never charged.

Nullable fields: GDELT does not always populate every field. Any of title, url, domain, sourceCountry, language, publishedAt, and socialImage can be null for a given article (e.g. socialImage is often missing, and publishedAt is null when GDELT's seendate is unparseable). Rows with neither a url nor a title are dropped before charging.
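The de-duplication, null handling, and drop rule described above could look roughly like the following Python sketch. It is an illustration, not the actor's implementation: the helper names are hypothetical, and the raw GDELT field names (`url`, `title`, `domain`, `sourcecountry`, `language`, `seendate`, `socialimage`) and the `YYYYMMDDTHHMMSSZ` seendate format are assumptions about the ArtList JSON response.

```python
from datetime import datetime, timezone


def parse_seendate(value):
    """Convert an assumed GDELT seendate (e.g. 20260611T120000Z) to ISO 8601, else None."""
    try:
        parsed = datetime.strptime(value, "%Y%m%dT%H%M%SZ")
    except (TypeError, ValueError):
        return None  # missing or unparseable seendate becomes a null publishedAt
    stamp = parsed.replace(tzinfo=timezone.utc).isoformat(timespec="milliseconds")
    return stamp.replace("+00:00", "Z")


def normalize_articles(raw_articles):
    """Map raw GDELT articles to output rows, de-duplicated by URL."""
    seen_urls = set()
    rows = []
    for raw in raw_articles:
        url = raw.get("url") or None
        title = raw.get("title") or None
        # Rows with neither a url nor a title are dropped (and so never charged).
        if url is None and title is None:
            continue
        if url is not None:
            if url in seen_urls:
                continue
            seen_urls.add(url)
        rows.append({
            "ok": True,
            "title": title,
            "url": url,
            "domain": raw.get("domain") or None,
            "sourceCountry": raw.get("sourcecountry") or None,
            "language": raw.get("language") or None,
            "publishedAt": parse_seendate(raw.get("seendate")),
            "socialImage": raw.get("socialimage") or None,
        })
    return rows
```

Empty strings are treated the same as missing values here, which is one reasonable reading of "GDELT does not always populate every field".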

Diagnostics

The actor never fails silently. Instead it writes a single diagnostic row (ok:false) with an errorCode and never charges for it:

  • BAD_INPUT — GDELT rejected the query (e.g. "query too short"). Quote phrases and avoid overly short/common terms.
  • NO_RESULTS — the query was valid but matched nothing. Broaden it or widen the timespan.
  • RATE_LIMITED / SERVER_ERROR / NETWORK — transient issues; the actor retried with backoff first.
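As a rough illustration of the retry-then-diagnose behavior for transient errors, the Python sketch below retries a request with exponential backoff and falls back to a single `ok:false` row. Everything here is assumed for the example: the helper names, the attempt count, the delay schedule, and the mapping of HTTP 429 to RATE_LIMITED and 5xx to SERVER_ERROR are plausible choices, not details confirmed by the actor.

```python
import time


def classify_status(status):
    """Map an HTTP status to a transient error code, or None if not transient (assumed mapping)."""
    if status == 429:
        return "RATE_LIMITED"
    if status >= 500:
        return "SERVER_ERROR"
    return None


def run_with_backoff(attempt_request, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call attempt_request() until it succeeds or attempts run out.

    attempt_request must return (status, body) or raise OSError on a network failure.
    Returns (body, None) on success or (None, diagnostic_row) on failure.
    """
    error_code = "NETWORK"
    for attempt in range(max_attempts):
        try:
            status, body = attempt_request()
        except OSError:
            error_code = "NETWORK"
        else:
            error_code = classify_status(status)
            if error_code is None:
                # Not a transient failure: hand the body on for parsing.
                return body, None
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    return None, {"ok": False, "errorCode": error_code}
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a real run would use the default `time.sleep`.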

Notes / quirks

  • GDELT requires the query to be URL-encoded and phrases to be quoted — the actor handles both.
  • On a malformed query GDELT may return a text/plain error string (sometimes with HTTP 200) or an empty body instead of JSON. The actor guards JSON.parse and surfaces a clear BAD_INPUT diagnostic.
  • GDELT's index covers roughly the last 3 months of news.
  • The actor rotates a real browser User-Agent per request attempt for retry resilience, and supports an optional proxy (proxyConfiguration). Neither is required — GDELT is a public no-key API with no anti-bot — so leave the proxy unset unless you hit IP-level rate limits.
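The guarded-parse quirk noted in this list can be sketched in Python as follows, with `json.loads` standing in for `JSON.parse`. This is only an illustration under assumptions: the function name and the `message` field are hypothetical, and the top-level `articles` key is my understanding of the ArtList JSON shape rather than something the text above states.

```python
import json


def parse_gdelt_body(body):
    """Return (articles, None) on success or (None, diagnostic_row) otherwise."""
    text = (body or "").strip()
    if not text:
        # Empty body: treated as a rejected query.
        return None, {"ok": False, "errorCode": "BAD_INPUT",
                      "message": "empty response body"}
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        # Plain-text error string (possibly served with HTTP 200).
        return None, {"ok": False, "errorCode": "BAD_INPUT",
                      "message": text[:200]}
    # Assumed shape: {"articles": [...]}; anything else counts as no matches.
    articles = payload.get("articles") if isinstance(payload, dict) else None
    if not articles:
        return None, {"ok": False, "errorCode": "NO_RESULTS",
                      "message": "query matched no articles"}
    return articles, None


if __name__ == "__main__":
    print(parse_gdelt_body("Your query was too short."))
    print(parse_gdelt_body('{"articles": [{"url": "https://example.com/a"}]}'))
```

Checking the body rather than the HTTP status is the point of the sketch, since an error string can arrive with a 200 response.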