WHO Health Intelligence Scraper

Pricing: from $0.01 / 1,000 results

Aggregates health data from multiple APIs: WHO Publications (18K+ documents with PDF text extraction), WHO GHO Statistics, and ClinicalTrials.gov. Features NLP location detection and alert keyword matching.

Rating: 5.0 (11)

Developer: The_Rook (Maintained by Community)

Actor stats: 10 bookmarked, 12 total users, 8 monthly active users, last modified 7 days ago

πŸ₯ WHO Health Intelligence Gateway

All-in-one intelligence tool for Global Health Monitoring

This Actor unifies the three pillars of health data into a single, automated feed. In minutes, it does work that a data analyst would otherwise do by hand:

  1. 📊 The Past: Fetches official statistics from the WHO Global Health Observatory (GHO) API.
  2. 📚 The Present: Aggregates WHO publications and reports via the WHO Publications API (with optional PDF text extraction).
  3. 🔬 The Future: Tracks ongoing clinical trials (R&D pipelines) via ClinicalTrials.gov.

🎯 Perfect For

  • 🏛️ NGOs & Humanitarian Aid – Map active outbreaks to deploy resources efficiently
  • 💊 Pharma & Biotech – Track disease trends and competitor R&D pipelines
  • 📰 Data Journalists – Scan 100+ PDFs for keywords in a single run
  • 🔬 Health Researchers – Access structured WHO data without juggling multiple APIs
  • 🤖 AI Developers – RAG-ready health data for LLM systems
  • 📊 Risk Analysts – Monitor health and stability risks in specific regions for corporate intelligence

⚙️ Key Features

📚 WHO Publications API + PDF Text Extraction

Access 18,000+ WHO publications via the official WHO API, with optional PDF download and full-text extraction.

  • Fetches publications by keyword and date range
  • Downloads and parses PDF documents
  • Extracts text from complex PDF layouts
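To make "by keyword and date range" concrete, here is a minimal Python sketch that applies both filters locally to publication metadata that has already been fetched. It assumes the MM/DD/YYYY date format from the output example and matches the keyword against titles only. `parse_pub_date` and `filter_publications` are hypothetical names, and the Actor may instead pass these filters to the WHO API itself.

```python
from datetime import date, datetime


def parse_pub_date(raw: str) -> date:
    """Parse a date in the MM/DD/YYYY form used in the output example."""
    return datetime.strptime(raw, "%m/%d/%Y").date()


def filter_publications(items, keyword, start, end):
    """Keep items whose title contains `keyword` (case-insensitive) and whose
    date falls within [start, end], inclusive."""
    needle = keyword.casefold()
    kept = []
    for item in items:
        if needle not in item.get("title", "").casefold():
            continue
        try:
            published = parse_pub_date(item["date"])
        except (KeyError, ValueError):
            continue  # skip items with a missing or unparseable date
        if start <= published <= end:
            kept.append(item)
    return kept


if __name__ == "__main__":
    sample = [
        {"title": "Multi-country outbreak of cholera, Report #32", "date": "11/26/2024"},
        {"title": "Mpox situation report", "date": "01/15/2024"},
        {"title": "Cholera annual summary", "date": "03/02/2022"},
    ]
    hits = filter_publications(sample, "cholera", date(2024, 1, 1), date(2024, 12, 31))
    print([h["title"] for h in hits])  # ['Multi-country outbreak of cholera, Report #32']
```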

🧠 Free NLP Location Detection (No API Key Required)

Uses an embedded natural language processing library (compromise.js) to analyze the extracted PDF text, so location detection adds no extra cost.

  • Auto-Location Detection: Finds cities and countries (e.g., "Yemen", "Sudan", "Kenya") inside extracted PDF text automatically
  • Smart Cleaning: Removes duplicates, filters generic regions, and strips quotes/formatting noise
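The Actor does the tagging with compromise.js. The Python sketch that follows illustrates only the "Smart Cleaning" step, applied to a list of candidate place names that such a tagger might return. The `GENERIC_REGIONS` list and the `clean_locations` name are hypothetical, because the Actor's actual filter list is not documented.

```python
import re

# Hypothetical set of broad region names to discard (case-insensitive).
GENERIC_REGIONS = {"africa", "asia", "europe", "middle east", "region", "world"}

# Spaces, quotes and stray punctuation that PDF extraction often leaves around names.
_STRIP_CHARS = " \"'\u201c\u201d\u2018\u2019()[]{},.;:"


def clean_locations(raw_places):
    """Normalize candidate place names: collapse whitespace, strip quotes and
    punctuation from both ends, drop generic regions, and de-duplicate
    case-insensitively while keeping the first-seen spelling and order."""
    seen = set()
    cleaned = []
    for place in raw_places:
        name = re.sub(r"\s+", " ", place).strip(_STRIP_CHARS)
        key = name.casefold()
        if not name or key in GENERIC_REGIONS or key in seen:
            continue
        seen.add(key)
        cleaned.append(name)
    return cleaned
```

Keeping the first-seen spelling preserves the capitalization that appears in the source document, while the case-folded key catches repeats such as "Kenya" and "kenya".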

🔔 Watchlist Alerts

Define keywords (e.g., Cholera, Level 3, Outbreak). The Actor scans report titles and, when PDF extraction is enabled, the full PDF text.

  • If a match is found, the ⚠️ ALERTS column in your CSV is flagged
  • Optional: Send instant notifications to Slack or Discord
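Here is a minimal sketch of the watchlist logic, assuming whole-word, case-insensitive matching over the title plus any extracted text. It is not the Actor's code. The return values mirror the `alertTriggered` and `matchedKeywords` fields from the output example, and the function names are hypothetical. The payload shapes are the minimal bodies accepted by Slack incoming webhooks (`text`) and Discord webhooks (`content`).

```python
import json
import re
import urllib.request


def match_alerts(title, full_text, keywords):
    """Return (alertTriggered, matchedKeywords) for one document.
    A keyword matches only as a whole word/phrase, ignoring case."""
    haystack = f"{title}\n{full_text or ''}"
    matched = []
    for kw in keywords:
        # Lookarounds instead of \b so keywords that start or end with
        # punctuation (e.g. "COVID-19") still work.
        pattern = r"(?<!\w)" + re.escape(kw) + r"(?!\w)"
        if re.search(pattern, haystack, flags=re.IGNORECASE):
            matched.append(kw)
    return bool(matched), matched


def webhook_payload(service, message):
    """Build the JSON body for a Slack or Discord webhook."""
    if service == "slack":
        return {"text": message}
    if service == "discord":
        return {"content": message}
    raise ValueError(f"unsupported service: {service}")


def send_alert(webhook_url, service, message):
    """POST the payload to the webhook URL and return the HTTP status."""
    body = json.dumps(webhook_payload(service, message)).encode("utf-8")
    req = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Whole-word matching avoids false alerts such as "cholera" firing on "Vibrio cholerae".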

📊 Organized Multi-Dataset Output (Limited-permissions friendly)

This Actor keeps permissions limited by producing:

  • Combined dataset (default): All normalized items are pushed to the run’s default dataset for immediate viewing in the Output tab (linked via {{links.apiDefaultDatasetUrl}}).
  • 3 separate per-run datasets: GHO / Publications / Trials each go into their own dataset created uniquely per run (so old runs don’t mix with new ones).
  • Stable discovery of those per-run datasets: The Actor writes a DATASET_LINKS JSON record to the run’s default key-value store (itself created per run), and the output schema links to it.[1]
  • HTML dashboard: Stored as a KV record and linked from the output schema.[1]
  • CSV export (optional): Stored as a KV record when outputFormat=csv.[1]

🎛️ Input Configuration

| Section | Setting | Description |
|---|---|---|
| 📚 Publications | Keyword Filter | Search term (e.g., "cholera", "mpox") |
| 📚 Publications | Extract PDF Text | Enable this to download PDFs and extract text (includes NLP location detection) |
| 📊 GHO Stats | Indicators | Codes like WHOSIS_000001 (Life Expectancy), MORT_100 (Mortality) |
| 📊 GHO Stats | Years/Date Range | Filter by specific years or a year range (startYear/endYear) |
| 🔔 Alerts | Alert Keywords | Keywords to flag documents (e.g., "outbreak", "emergency") |
| 🔬 Trials | Target Disease | Disease to search for (e.g., "Malaria", "Cancer") |
| 🔬 Trials | Status/Phase | Filter by recruitment status and trial phase |
| 📤 Output | Output format | json (default) or csv |
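The sketch below shows how a few of these settings could map onto upstream queries. Only `startYear`, `endYear` and `outputFormat` come from the table above. It also assumes that the Actor queries the public GHO OData endpoint (`https://ghoapi.azureedge.net/api/<IndicatorCode>`, filtering on `TimeDim`) and the ClinicalTrials.gov v2 `studies` endpoint (`query.cond`, `filter.overallStatus`, `pageSize`). Those endpoints are documented publicly, but whether the Actor uses them this way is an assumption. Phase filtering is left out because the Actor's mapping is not documented. All function names are hypothetical.

```python
from urllib.parse import quote, urlencode

GHO_BASE = "https://ghoapi.azureedge.net/api"             # assumed GHO OData endpoint
TRIALS_BASE = "https://clinicaltrials.gov/api/v2/studies"  # ClinicalTrials.gov API v2


def validate_input(cfg):
    """Check the input fields named in the table and apply the json default."""
    fmt = cfg.get("outputFormat", "json")
    if fmt not in ("json", "csv"):
        raise ValueError("outputFormat must be 'json' or 'csv'")
    start, end = cfg.get("startYear"), cfg.get("endYear")
    if start is not None and end is not None and start > end:
        raise ValueError("startYear must not be after endYear")
    return {**cfg, "outputFormat": fmt}


def gho_url(indicator, start_year=None, end_year=None):
    """Build a GHO OData query for one indicator, optionally limited to a year range."""
    url = f"{GHO_BASE}/{quote(indicator)}"
    clauses = []
    if start_year is not None:
        clauses.append(f"TimeDim ge {start_year}")
    if end_year is not None:
        clauses.append(f"TimeDim le {end_year}")
    if clauses:
        # Keep "$" literal and encode spaces as %20 for the OData $filter option.
        url += "?" + urlencode({"$filter": " and ".join(clauses)}, quote_via=quote, safe="$")
    return url


def trials_url(disease, status=None, page_size=50):
    """Build a ClinicalTrials.gov v2 search for a condition and optional overall status."""
    params = {"query.cond": disease, "pageSize": page_size}
    if status:
        params["filter.overallStatus"] = status
    return f"{TRIALS_BASE}?{urlencode(params, quote_via=quote)}"
```

For example, `gho_url("WHOSIS_000001", 2015, 2020)` limits the life-expectancy series to 2015 through 2020, and `trials_url("Malaria", "RECRUITING")` searches for recruiting malaria studies.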

📦 Outputs (Where to find what)

1) Combined results (default dataset)

  • In Apify Console → Run → Output, open Combined dataset (default).
  • This points to {{links.apiDefaultDatasetUrl}}/items as defined by Apify output schema variables.

2) Separate per-run datasets (GHO / Publications / Trials)

Because these datasets are created per run with run-scoped names, their IDs/URLs are published in:

  • Key-value store record: DATASET_LINKS (JSON) in the run’s default KV store.[1]
  • In Apify Console → Run → Output, open Separate datasets (links) (it links to {{links.apiDefaultKeyValueStoreUrl}}/records/DATASET_LINKS).

Expected DATASET_LINKS format:

{
"ghoItems": "https://api.apify.com/v2/datasets/<id>/items",
"publicationsItems": "https://api.apify.com/v2/datasets/<id>/items",
"trialsItems": "https://api.apify.com/v2/datasets/<id>/items"
}
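A client script can locate the per-run datasets by reading DATASET_LINKS first and then downloading each dataset's items. The following is a minimal sketch using only the standard library and the Apify API v2 URL forms for key-value store records and dataset items, authenticated with a bearer token. It assumes that you already know the run's default key-value store ID. The helper names are hypothetical, and the parser is tolerant of query strings on the item URLs.

```python
import json
import re
import urllib.request

API = "https://api.apify.com/v2"
EXPECTED_KEYS = ("ghoItems", "publicationsItems", "trialsItems")
_ITEMS_URL = re.compile(r"^https://api\.apify\.com/v2/datasets/([^/?]+)/items(?:\?.*)?$")


def parse_dataset_links(record_text):
    """Parse a DATASET_LINKS record and return {key: dataset_id}.
    Raises ValueError if a key is missing or a URL has an unexpected shape."""
    links = json.loads(record_text)
    ids = {}
    for key in EXPECTED_KEYS:
        url = links.get(key)
        match = _ITEMS_URL.match(url or "")
        if not match:
            raise ValueError(f"{key}: unexpected or missing URL {url!r}")
        ids[key] = match.group(1)
    return ids


def fetch_text(url, token):
    """GET a resource from the Apify API and return the decoded body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def load_per_run_datasets(store_id, token):
    """Read DATASET_LINKS from the run's default KV store, then download each dataset's items."""
    record = fetch_text(f"{API}/key-value-stores/{store_id}/records/DATASET_LINKS", token)
    ids = parse_dataset_links(record)
    return {
        key: json.loads(fetch_text(f"{API}/datasets/{dataset_id}/items", token))
        for key, dataset_id in ids.items()
    }
```

Validating the record before making any further requests means that a missing or renamed key fails with a clear error.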

3) Raw source JSON (KV records)

  • DATA_GHO (raw GHO items)
  • DATA_PUBLICATIONS (raw publications)
  • DATA_TRIALS (raw trials)

These are stored in the run’s default key-value store (one per run).[1]

4) HTML dashboard + CSV

  • Dashboard: OUTPUT_DASHBOARD (HTML record linked from output schema using the /records/<key> form).
  • CSV: OUTPUT_CSV (generated only if outputFormat=csv).[1]
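Every record listed in this section is reachable through the same `/records/<key>` form. Here is a small sketch that builds those URLs for one run's default key-value store, using the Apify API v2 record URL pattern and the record keys documented above. The store ID is a placeholder you supply, and the function names are hypothetical.

```python
from urllib.parse import quote

API = "https://api.apify.com/v2"

# Record keys documented in this README; OUTPUT_CSV exists only when outputFormat=csv.
RAW_KEYS = ("DATA_GHO", "DATA_PUBLICATIONS", "DATA_TRIALS")


def record_url(store_id, key):
    """URL of one record in an Apify key-value store."""
    return f"{API}/key-value-stores/{quote(store_id)}/records/{quote(key)}"


def run_record_urls(store_id, output_format="json"):
    """Map each documented record key to its URL for a run's default store."""
    keys = ["DATASET_LINKS", *RAW_KEYS, "OUTPUT_DASHBOARD"]
    if output_format == "csv":
        keys.append("OUTPUT_CSV")
    return {key: record_url(store_id, key) for key in keys}
```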

📤 Output Example

Here is a simplified preview of the combined dataset:

[
{
"type": "Publication",
"title": "Multi-country outbreak of cholera, Report #32",
"date": "11/26/2024",
"url": "https://www.who.int/publications/m/item/...",
"extractedLocations": ["Zimbabwe", "Mozambique", "Ethiopia", "Kenya"],
"alertTriggered": true,
"matchedKeywords": ["cholera", "outbreak"]
},
{
"type": "GHO Indicator",
"title": "WHOSIS_000001 - IND",
"date": "2020",
"value": 69.17521966
},
{
"type": "Clinical Trial",
"title": "Phase 3 Malaria Vaccine Study in Children",
"nctId": "NCT05790889",
"status": "RECRUITING",
"phase": "PHASE2",
"conditions": "Malaria",
"url": "https://clinicaltrials.gov/study/NCT05790889"
}
]
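Because all three item kinds share the combined dataset, a consumer usually starts by splitting the items by `type`. The minimal sketch below uses only the field names visible in the preview above (`type`, `title`, `alertTriggered`, `matchedKeywords`, `extractedLocations`). The full schema may contain more fields, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_type(items):
    """Split combined-dataset items into lists keyed by their "type" field."""
    groups = defaultdict(list)
    for item in items:
        groups[item.get("type", "Unknown")].append(item)
    return dict(groups)


def alerted_publications(items):
    """Return (title, matchedKeywords) for publications that triggered an alert."""
    return [
        (item["title"], item.get("matchedKeywords", []))
        for item in items
        if item.get("type") == "Publication" and item.get("alertTriggered")
    ]


def location_counts(items):
    """Count how many publications mention each extracted location."""
    counts = Counter()
    for item in items:
        if item.get("type") == "Publication":
            counts.update(item.get("extractedLocations", []))
    return counts
```

`location_counts(items).most_common(5)` then gives a quick ranking of the places that appear in the most reports, which helps with the outbreak-mapping use case described earlier.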

This Actor respects WHO and ClinicalTrials.gov terms of service.


Built with ❤️ and </> by The_Rook - Try it now 🚀

1