WHO Health Intelligence Scraper
Pricing
from $0.01 / 1,000 results
Aggregates WHO health data from multiple APIs: Publications (18K+ documents with PDF text extraction), GHO Statistics, and ClinicalTrials.gov. Features NLP location detection and alert keyword matching.
Developer: The_Rook
WHO Health Intelligence Gateway
All-in-one intelligence tool for Global Health Monitoring
This Actor unifies the three pillars of health data into a single, automated feed. It does the work of a data analyst in minutes:
- The Past: Fetches official statistics (GHO API).
- The Present: Aggregates WHO Publications & Reports via API (with optional PDF text extraction).
- The Future: Tracks ongoing Clinical Trials for cures (R&D).
Perfect For
- NGOs & Humanitarian Aid: Map active outbreaks to deploy resources efficiently
- Pharma & Biotech: Track disease trends + competitor R&D pipelines
- Data Journalists: Scan 100+ PDFs for keywords instantly
- Health Researchers: Access structured WHO data without API complexity
- AI Developers: RAG-ready health data for LLM systems
- Risk Analysis: Monitor stability and health risks in specific regions for corporate intelligence
Key Features
WHO Publications API + PDF Text Extraction
Access 18,000+ WHO publications via official API.
Optionally downloads PDFs and extracts full text content.
- Fetches publications by keyword and date range
- Downloads and parses PDF documents
- Extracts text from complex PDF layouts
Free NLP Location Detection (No API Key Required)
Uses embedded Natural Language Processing (compromise.js) to analyze PDF text at zero extra cost.
- Auto-Location Detection: Finds cities and countries (e.g., "Yemen", "Sudan", "Kenya") inside extracted PDF text automatically
- Smart Cleaning: Removes duplicates, filters generic regions, and strips quotes/formatting noise
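The cleaning step described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the function name `clean_locations` and the `GENERIC_REGIONS` list are hypothetical, and the real filter list is not documented here.

```python
import re

# Hypothetical list of generic region names to drop; the Actor's real list is not documented.
GENERIC_REGIONS = {"africa", "asia", "europe", "region", "world", "global"}


def clean_locations(raw_names):
    """Strip quote/format noise, drop generic regions, and de-duplicate in order."""
    seen = set()
    cleaned = []
    for name in raw_names:
        # Collapse internal whitespace, then trim quotes, brackets, and punctuation.
        text = re.sub(r"\s+", " ", name).strip(" \t\"'“”‘’.,;:()[]")
        key = text.casefold()
        if not text or key in GENERIC_REGIONS or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned
```

De-duplication here is case-insensitive and keeps the first spelling encountered, so the output order follows the order in which places appear in the text.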
Watchlist Alerts
Define keywords (e.g., Cholera, Level 3, Outbreak). The Actor scans report titles and full PDF text.
- If a match is found, the ⚠️ ALERTS column in your CSV is flagged
- Optional: Send instant notifications to Slack or Discord
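As a rough sketch of how watchlist matching of this kind can work, the snippet below checks a title and body for each keyword and builds a notification body. The function names are hypothetical and the matching rules (case-insensitive, whole-term) are assumptions; the Actor's own logic may differ. The `text` field is the minimal payload a Slack incoming webhook accepts; Discord webhooks expect a `content` field instead.

```python
import json
import re


def match_keywords(title, body_text, keywords):
    """Return the watchlist keywords found in the title or body (case-insensitive)."""
    haystack = f"{title}\n{body_text}"
    matched = []
    for keyword in keywords:
        # Lookarounds require a non-word boundary, so "Level 3" does not match "Level 30".
        pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
        if re.search(pattern, haystack, flags=re.IGNORECASE):
            matched.append(keyword)
    return matched


def build_slack_payload(title, matched):
    """Build a JSON body for a Slack incoming webhook (assumed message wording)."""
    message = f"WHO alert: {title} (matched: {', '.join(matched)})"
    return json.dumps({"text": message})
```

The payload would then be sent with an HTTP POST to your own webhook URL; that call is left out so the sketch stays free of network access.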
Organized Multi-Dataset Output (Limited-permissions friendly)
This Actor keeps permissions limited by producing:
- Combined dataset (default): All normalized items are pushed to the run's default dataset for immediate viewing in the Output tab (linked via `{{links.apiDefaultDatasetUrl}}`).
- 3 separate per-run datasets: GHO / Publications / Trials each go into their own dataset created uniquely per run (so old runs don't mix with new ones).
- Stable discovery of those per-run datasets: The Actor writes a `DATASET_LINKS` JSON record to the run's default key-value store, and the output schema links to it (key-value store is created per run).[1]
- HTML dashboard: Stored as a KV record and linked from output schema.[1]
- CSV export (optional): Stored as a KV record when `outputFormat=csv`.[1]
Input Configuration
| Section | Setting | Description |
|---|---|---|
| Publications | Keyword Filter | Search term (e.g., "cholera", "mpox") |
| Publications | Extract PDF Text | Enable this to download PDFs and extract text (includes NLP location detection) |
| GHO Stats | Indicators | Codes like WHOSIS_000001 (Life Expectancy), MORT_100 (Mortality) |
| GHO Stats | Years/Date Range | Filter by specific years or year range (startYear/endYear) |
| Alerts | Alert Keywords | Keywords to flag documents (e.g., "outbreak", "emergency") |
| Trials | Target Disease | Disease to search for (e.g., "Malaria", "Cancer") |
| Trials | Status/Phase | Filter by recruitment status and trial phase |
| Output | Output format | json (default) or csv |
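To give a sense of what the GHO indicator and year-range settings correspond to, here is a minimal sketch that builds a GHO OData request URL. It assumes the inputs map onto an OData `$filter` over the `TimeDim` and `SpatialDim` fields returned by the GHO API; the function name `build_gho_url` is hypothetical, and the Actor may construct its requests differently.

```python
from urllib.parse import quote

GHO_BASE = "https://ghoapi.azureedge.net/api"


def build_gho_url(indicator, start_year=None, end_year=None, country=None):
    """Build a GHO OData URL for one indicator with optional year and country filters."""
    clauses = []
    if country:
        clauses.append(f"SpatialDim eq '{country}'")
    if start_year is not None:
        clauses.append(f"TimeDim ge {int(start_year)}")
    if end_year is not None:
        clauses.append(f"TimeDim le {int(end_year)}")
    url = f"{GHO_BASE}/{indicator}"
    if clauses:
        # Percent-encode spaces and quotes in the filter expression.
        url += "?$filter=" + quote(" and ".join(clauses))
    return url
```

For example, `build_gho_url("WHOSIS_000001", 2015, 2020, country="IND")` would request life expectancy values for India between 2015 and 2020.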
Outputs (Where to find what)
1) Combined results (default dataset)
- In Apify Console → Run → Output, open Combined dataset (default).
- This points to `{{links.apiDefaultDatasetUrl}}/items` as defined by Apify output schema variables.
2) Separate per-run datasets (GHO / Publications / Trials)
Because these datasets are created per run with run-scoped names, their IDs/URLs are published in:
- Key-value store record: `DATASET_LINKS` (JSON) in the run's default KV store.[1]
- In Apify Console → Run → Output, open Separate datasets (links) (it links to `{{links.apiDefaultKeyValueStoreUrl}}/records/DATASET_LINKS`).
Expected DATASET_LINKS format:
{
  "ghoItems": "https://api.apify.com/v2/datasets/<id>/items",
  "publicationsItems": "https://api.apify.com/v2/datasets/<id>/items",
  "trialsItems": "https://api.apify.com/v2/datasets/<id>/items"
}
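A client that consumes this record might look something like the sketch below. The helper names are hypothetical; the only things taken from this README are the three keys and the URL shape. The `format` and `limit` query parameters are standard options of the Apify dataset items endpoint.

```python
import json
from urllib.parse import urlencode

EXPECTED_KEYS = ("ghoItems", "publicationsItems", "trialsItems")


def parse_dataset_links(record_text):
    """Parse the DATASET_LINKS record and verify the three expected keys exist."""
    links = json.loads(record_text)
    missing = [key for key in EXPECTED_KEYS if key not in links]
    if missing:
        raise KeyError(f"DATASET_LINKS is missing: {', '.join(missing)}")
    return {key: links[key] for key in EXPECTED_KEYS}


def with_query(items_url, **params):
    """Append query parameters such as format=csv or limit=100 to an items URL."""
    separator = "&" if "?" in items_url else "?"
    return items_url + separator + urlencode(params)
```

After downloading the record text from the run's key-value store, `with_query(links["trialsItems"], format="csv")` gives a URL that returns only the trials as CSV. Non-public storages additionally require an Apify API token.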
3) Raw source JSON (KV records)
- `DATA_GHO` (raw GHO items)
- `DATA_PUBLICATIONS` (raw publications)
- `DATA_TRIALS` (raw trials)
These are stored in the run's default key-value store (one per run).[1]
4) HTML dashboard + CSV
- Dashboard: `OUTPUT_DASHBOARD` (HTML record linked from output schema using the `/records/<key>` form).
- CSV: `OUTPUT_CSV` (generated only if `outputFormat=csv`).[1]
Output Example
The Actor produces a comprehensive dataset. Here is a simplified preview:
[
  {
    "type": "Publication",
    "title": "Multi-country outbreak of cholera, Report #32",
    "date": "11/26/2024",
    "url": "https://www.who.int/publications/m/item/...",
    "extractedLocations": ["Zimbabwe", "Mozambique", "Ethiopia", "Kenya"],
    "alertTriggered": true,
    "matchedKeywords": ["cholera", "outbreak"]
  },
  {
    "type": "GHO Indicator",
    "title": "WHOSIS_000001 - IND",
    "date": "2020",
    "value": 69.17521966
  },
  {
    "type": "Clinical Trial",
    "title": "Phase 3 Malaria Vaccine Study in Children",
    "nctId": "NCT05790889",
    "status": "RECRUITING",
    "phase": "PHASE2",
    "conditions": "Malaria",
    "url": "https://clinicaltrials.gov/study/NCT05790889"
  }
]
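Since the combined dataset mixes three record types, downstream code will usually want to split it. The sketch below shows one possible approach, assuming the field names from the preview above (`type`, `title`, `alertTriggered`, `matchedKeywords`); the function names are hypothetical.

```python
from collections import defaultdict


def group_by_type(items):
    """Group combined-dataset items by their "type" field."""
    by_type = defaultdict(list)
    for item in items:
        by_type[item.get("type", "Unknown")].append(item)
    return dict(by_type)


def triggered_alerts(items):
    """Return (title, matched keywords) pairs for items flagged by the watchlist."""
    return [
        (item["title"], item.get("matchedKeywords", []))
        for item in items
        if item.get("alertTriggered")
    ]
```

Items without an `alertTriggered` field, such as the GHO and trial records in the preview, are treated as not flagged.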
Legal & Compliance
This Actor respects WHO and ClinicalTrials.gov terms of service:
- Uses official WHO Publications API (https://www.who.int/api/hubs/meetingreports)
- Uses official WHO GHO OData API (https://ghoapi.azureedge.net/api)
- Uses official ClinicalTrials.gov API v2 (https://clinicaltrials.gov/api/v2/studies)
- No web scraping - all data accessed via public APIs
- PDF downloads respect standard HTTP protocols
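Because the endpoints listed above are public, the same kind of data can be queried directly. As an illustration, the sketch below builds a ClinicalTrials.gov API v2 request for a target disease and recruitment status. The function name is hypothetical, and the mapping from the Actor's Trials inputs to the `query.cond`, `filter.overallStatus`, and `pageSize` parameters is an assumption.

```python
from urllib.parse import urlencode

CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"


def build_trials_url(condition, statuses=None, page_size=20):
    """Build a ClinicalTrials.gov v2 studies URL for a condition and optional statuses."""
    params = {"query.cond": condition, "pageSize": page_size}
    if statuses:
        # Multiple statuses are passed as one comma-separated value.
        params["filter.overallStatus"] = ",".join(statuses)
    return f"{CTGOV_STUDIES}?{urlencode(params)}"
```

For instance, `build_trials_url("Malaria", ["RECRUITING"])` requests the first 20 recruiting malaria studies.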
Built with ❤️ and </> by The_Rook - Try it now