GDELT News Data Enrichment Pipeline
Under maintenance

Pricing: Pay per event

Developed by Seitz AI & Automation

Maintained by Community

This actor is the central intelligence hub for a multi-pipeline news aggregation system. Its primary role is to fetch, unify, cleanse, and analyze raw news data from multiple Apify news pipeline actors, preparing a structured dataset of topical trends for downstream AI services.

Last modified: 3 days ago

📰 GDELT Data Enrichment Pipeline

This actor is a specialized data analysis tool designed to enrich raw article metadata retrieved from the GDELT 2.0 Global Knowledge Graph API.

It takes broad GDELT queries and, for each resulting article link, performs a secondary DuckDuckGo News search and an LLM (OpenAI) analysis of the article's topic, turning raw links into structured, actionable intelligence.
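The per-article flow described above (search, then summarize, then analyze) can be sketched as follows. This is a minimal illustration under stated assumptions, not the actor's source: `enrich`, `EnrichedArticle`, and the three stage callables (`search_news`, `summarize`, `analyze`) are hypothetical names, the article title is assumed to serve as the search topic, and the stages are replaced by trivial stand-ins in the usage example.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class EnrichedArticle:
    title: str
    url: str
    summary: str
    analysis: dict = field(default_factory=dict)


def enrich(
    articles: list[dict],
    search_news: Callable[[str], list[str]],  # hypothetical: topic -> DuckDuckGo snippets
    summarize: Callable[[list[str]], str],    # hypothetical: LLM call 1 (English summary)
    analyze: Callable[[str], dict],           # hypothetical: LLM call 2 (category, risk, entities)
) -> list[EnrichedArticle]:
    """Run the per-article enrichment stages in order."""
    results = []
    for article in articles:
        snippets = search_news(article["title"])  # one news search per GDELT article
        summary = summarize(snippets)             # first LLM call
        analysis = analyze(summary)               # second LLM call, on the summary
        results.append(EnrichedArticle(article["title"], article["url"], summary, analysis))
    return results


demo = enrich(
    [{"title": "Ransomware hits port", "url": "https://example.com/a"}],
    search_news=lambda topic: [f"snippet about {topic}"],
    summarize=lambda snippets: " ".join(snippets),
    analyze=lambda text: {"category": "Malware/Ransomware", "sentiment": "High Risk"},
)
print(demo[0].summary)  # snippet about Ransomware hits port
```

Injecting the stages as callables keeps the control flow testable without network access, which mirrors the purpose of the actor's test mode.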

This makes the pipeline well suited to research, geopolitical monitoring, and threat intelligence gathering: GDELT provides vast, global coverage, and the actor refines that coverage and analyzes it for risk.


How to Use

The primary function of this actor is to transform raw metadata from the GDELT API into a refined dataset suitable for immediate analysis or consumption by downstream systems.

  1. As a Standalone Tool: Run this actor to generate high-quality, structured news intelligence. Download the resulting dataset for your own geopolitical or security analysis, reports, or to feed into custom dashboards.
  2. As a Data Source for Downstream Systems: The resulting structured data can be easily consumed by other tools (e.g., our Content Blueprint AI actor) to generate reports, threat briefings, and risk summaries based on the enriched GDELT output.
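As an example of downstream consumption, the sketch below counts High Risk items per category, a typical first step for a threat briefing. It assumes the dataset has already been exported as a list of dictionaries carrying the `sentiment` and `category` fields described in the Output section; the function name `high_risk_briefing` and the sample records are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def high_risk_briefing(items: list[dict]) -> dict[str, int]:
    """Count High Risk items per category, most frequent first."""
    counts = Counter(
        item.get("category", "Uncategorized")
        for item in items
        if item.get("sentiment") == "High Risk"
    )
    return dict(counts.most_common())


sample = [
    {"title": "A", "sentiment": "High Risk", "category": "Malware/Ransomware"},
    {"title": "B", "sentiment": "Medium Risk", "category": "Geopolitical"},
    {"title": "C", "sentiment": "High Risk", "category": "Malware/Ransomware"},
    {"title": "D", "sentiment": "High Risk", "category": "Vulnerability/CVE"},
]
print(high_risk_briefing(sample))  # {'Malware/Ransomware': 2, 'Vulnerability/CVE': 1}
```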

Features

  • GDELT Data Retrieval: Fetches a defined maximum number of articles from GDELT based on a user-defined query, time range, and language filter.
  • DuckDuckGo Grounding (Priority): For every GDELT article, the pipeline runs a targeted DuckDuckGo News search to retrieve fresh, corroborating snippets. The analysis is therefore based on current context from multiple sources rather than on the raw, unverified GDELT record alone.
  • LLM Synthesis and Translation: An LLM (OpenAI) synthesizes the DuckDuckGo snippets into a concise, coherent summary and writes it in English, even when the source snippets are in another language.
  • Advanced Risk Analysis: A second LLM call analyzes the English summary to categorize the topic (e.g., Malware/Ransomware), assign a risk level (High Risk, Medium Risk), and extract key entities.
  • Date Enrichment: Uses the most consistent date found in the DuckDuckGo snippets as a robust publication date fallback if the original GDELT date is missing or inconsistent.
  • Cost-Saving Test Mode: Includes a test mode to run the full workflow with dummy data, allowing for development and testing without incurring API costs.
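The Date Enrichment fallback can be sketched as a majority vote over snippet dates. This is an illustration under assumptions, not the actor's implementation: "most consistent" is interpreted here as "most frequent calendar day", snippet dates are assumed to arrive as ISO 8601 strings, only a missing or unparseable GDELT date triggers the fallback (detecting an "inconsistent" date is not modeled), and `resolve_published` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter
from datetime import date, datetime
from typing import Optional


def _parse_day(value: Optional[str]) -> Optional[date]:
    """Parse an ISO 8601 string to a calendar day; None if absent or invalid."""
    if not value:
        return None
    try:
        return datetime.fromisoformat(value).date()
    except ValueError:
        return None


def resolve_published(gdelt_date: Optional[str], snippet_dates: list[str]) -> Optional[str]:
    """Prefer the GDELT date; otherwise fall back to the most frequent snippet day."""
    primary = _parse_day(gdelt_date)
    if primary is not None:
        return primary.isoformat()
    days = [d for d in map(_parse_day, snippet_dates) if d is not None]
    if not days:
        return None
    most_common_day, _count = Counter(days).most_common(1)[0]
    return most_common_day.isoformat()


print(resolve_published(None, ["2024-05-01T08:00:00", "2024-05-01", "2024-04-30"]))  # 2024-05-01
print(resolve_published("2024-05-02T09:30:00", []))  # 2024-05-02
```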

Setup and Configuration

Before running the actor, you only need to provide an API key for the LLM service.

  1. OpenAI API Key:
    • You will need an API key from your OpenAI account.

Add Keys to Apify Secrets

For security, add this key as a secret environment variable in your Apify Actor settings:

  • OPENAI_API_KEY: Your OpenAI API Key.
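Apify exposes secret environment variables to the running actor as ordinary environment variables. The sketch below shows one way to read the key at startup and fail early when it is missing; the function name `get_openai_key` and its error message are hypothetical, and skipping the key in test mode is an assumption based on test mode bypassing all external API calls.

```python
import os
from typing import Optional


def get_openai_key(run_test_mode: bool = False) -> Optional[str]:
    """Return the OpenAI key from the environment, or None in test mode."""
    if run_test_mode:
        return None  # test mode makes no external API calls, so no key is needed
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set. Add it as a secret environment variable "
            "in the Actor settings, or enable runTestMode."
        )
    return key


os.environ["OPENAI_API_KEY"] = "sk-example"  # placeholder value for illustration only
print(get_openai_key())  # sk-example
```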

Cost of Usage 💸

Important Note: The costs listed below are for this actor only.

Costs for This Actor

  1. Apify Platform Usage: Standard platform costs for running the actor.
  2. GDELT API: The GDELT API is generally free. The actor makes one call to GDELT to retrieve the list of articles.
  3. DuckDuckGo Search: This service is free. The actor performs one search query for every article it processes.
  4. OpenAI API: This is the primary cost. The actor makes two LLM calls for every article: one to summarize/translate the DuckDuckGo Search snippets and another to perform the final structured analysis (sentiment, category, etc.). The total consumption is tracked via the llm-analysis-tokens-used event.
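The call counts above reduce to simple arithmetic: for N processed articles, a run makes 1 GDELT call, N DuckDuckGo searches, and 2N OpenAI calls. The sketch below encodes that; `CallPlan`, `plan_calls`, and `estimate_tokens` are hypothetical helpers, and the average tokens per call is a guess you supply, not a figure from this actor. Note that N can be lower than `max_records_limit` if GDELT returns fewer articles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallPlan:
    gdelt_calls: int          # one article-list request per run
    duckduckgo_searches: int  # one search per processed article
    openai_calls: int         # two LLM calls per processed article


def plan_calls(articles_processed: int, run_test_mode: bool = False) -> CallPlan:
    """Count external calls for a run that processes `articles_processed` articles."""
    if run_test_mode:
        return CallPlan(0, 0, 0)  # test mode bypasses all external APIs
    n = max(0, articles_processed)
    return CallPlan(gdelt_calls=1, duckduckgo_searches=n, openai_calls=2 * n)


def estimate_tokens(plan: CallPlan, avg_tokens_per_call: int) -> int:
    """Rough OpenAI token estimate; avg_tokens_per_call is a user-supplied guess."""
    return plan.openai_calls * avg_tokens_per_call


plan = plan_calls(100)
print(plan)  # CallPlan(gdelt_calls=1, duckduckgo_searches=100, openai_calls=200)
print(estimate_tokens(plan, 800))  # 160000
```

The actual consumption should be read from the llm-analysis-tokens-used event; an estimate like this is only useful for budgeting before a run.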

Input

| Field | Type | Default | Description |
|---|---|---|---|
| query | String | "Global Economy" OR theme:ECON_BUSINESS | The mandatory GDELT search query (e.g., '"stock market" OR theme:CYBER_THREAT'). |
| max_records_limit | Integer | 100 | The maximum number of GDELT articles to retrieve (max. 250). |
| timespan_offset | String | 3 days | A relative time span (e.g., '1week') or an absolute date range. |
| sort_by | String | HybridRel | The sort order for GDELT results (e.g., DateDesc, ToneAsc). |
| source_lang | String | null | Restricts GDELT results to articles originally published in a specific language (e.g., 'english' or 'EN'). |
| region | String | wt-wt | Region for DuckDuckGo search results (e.g., 'us-en' for US, 'wt-wt' for worldwide). |
| timeLimit | String | w | Time limit for DuckDuckGo search results ('d' for day, 'w' for week). |
| runTestMode | Boolean | false | Bypasses all external API calls for zero-cost testing. Do not enable in production. |
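To show how the GDELT-related input fields might map onto a request, here is a sketch that builds a URL for the GDELT DOC 2.0 API in article-list mode. Several details are assumptions rather than confirmed behavior of this actor: that it uses this endpoint, that `source_lang` is applied through GDELT's `sourcelang:` query operator, that OR'd terms must be wrapped in parentheses, and that whitespace is stripped from `timespan_offset` ("3 days" becomes "3days"). Absolute date ranges are not handled, and `build_gdelt_url` is a hypothetical name.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: articles are fetched from the GDELT DOC 2.0 API.
GDELT_DOC_API = "https://api.gdeltproject.org/api/v2/doc/doc"


def build_gdelt_url(
    query: str,
    max_records_limit: int = 100,
    timespan_offset: str = "3 days",
    sort_by: str = "HybridRel",
    source_lang: Optional[str] = None,
) -> str:
    """Translate the actor's input fields into a GDELT request URL (illustrative)."""
    q = query.strip()
    if not q:
        raise ValueError("query is mandatory")
    if " OR " in q and not (q.startswith("(") and q.endswith(")")):
        q = f"({q})"  # assumption: GDELT expects OR'd terms inside parentheses
    if source_lang:
        q += f" sourcelang:{source_lang}"  # assumption: language filter as a query operator
    params = {
        "query": q,
        "mode": "ArtList",
        "format": "json",
        "maxrecords": min(max(int(max_records_limit), 1), 250),  # max 250, per the Input table
        "timespan": "".join(timespan_offset.split()),  # "3 days" -> "3days"
        "sort": sort_by,
    }
    return f"{GDELT_DOC_API}?{urlencode(params)}"


url = build_gdelt_url('"stock market" OR theme:CYBER_THREAT', max_records_limit=500, source_lang="english")
print(url)
```

Clamping `maxrecords` locally avoids sending an out-of-range value when a user enters a limit above 250.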

Output

The actor saves its structured results in the dataset. Each item is a structured JSON object designed for easy consumption:

| Field | Type | Description |
|---|---|---|
| source | String | The name of the news source (derived from the URL). |
| title | String | The original title of the GDELT article. |
| url | String | The URL of the original article. |
| published | String | The publication date/time in ISO 8601 format (from GDELT or DuckDuckGo). |
| summary | String | The English, AI-generated summary of the article content. |
| sentiment | String | The AI-analyzed risk/impact (e.g., High Risk, Low Risk/Informational). |
| category | String | The AI-assigned category (e.g., Vulnerability/CVE, Geopolitical). |
| key_entities | Array of Strings | A list of key entities (companies, groups, vulnerabilities, countries). |
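A dataset item with the fields above could be assembled as in the sketch below, which merges article metadata with the JSON reply of the second LLM call. The reply's shape (keys `sentiment`, `category`, `key_entities`), the fallback labels "Unknown" and "Uncategorized", and the rule of deriving `source` from the URL's hostname without a leading "www." are all assumptions; `EnrichedItem` and `build_item` are hypothetical names.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional
from urllib.parse import urlparse


@dataclass
class EnrichedItem:
    source: str
    title: str
    url: str
    published: Optional[str]
    summary: str
    sentiment: str
    category: str
    key_entities: list[str] = field(default_factory=list)


def build_item(title: str, url: str, published: Optional[str],
               summary: str, analysis_json: str) -> EnrichedItem:
    """Merge article metadata with the second LLM call's JSON reply (assumed shape)."""
    analysis = json.loads(analysis_json)
    host = urlparse(url).hostname or ""
    source = host[4:] if host.startswith("www.") else host  # "www.example.com" -> "example.com"
    entities = analysis.get("key_entities") or []
    return EnrichedItem(
        source=source,
        title=title,
        url=url,
        published=published,
        summary=summary,
        sentiment=str(analysis.get("sentiment", "Unknown")),        # hypothetical fallback
        category=str(analysis.get("category", "Uncategorized")),   # hypothetical fallback
        key_entities=[str(e).strip() for e in entities if str(e).strip()],  # drop blanks
    )


item = build_item(
    "Port operator reports ransomware attack",
    "https://www.example.com/news/1",
    "2024-05-01",
    "A ransomware attack disrupted operations at a port operator.",
    '{"sentiment": "High Risk", "category": "Malware/Ransomware", "key_entities": ["Example Port Authority", " "]}',
)
print(json.dumps(asdict(item), indent=2))
```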