PEP Screening — OpenSanctions + Wikidata + National Registries
Pricing
Pay per event
PEP screening for AML/KYC. Streams 1.87M politically exposed persons from OpenSanctions (daily refresh), Wikidata, EU MEPs, US Congress, and UK Companies House PSC. FATF categories, family/RCA graph, three modes: ingest, fuzzy-match screening, new-PEPs diff.
Developer
BowTiedRaccoon
Maintained by Community
3 days ago
Last modified
PEP Screening Scraper — OpenSanctions + Wikidata + National Registries
Streams politically exposed persons (PEPs) from OpenSanctions, Wikidata, EU Parliament, US Congress, and UK Companies House. Returns structured FATF-categorized records for AML/KYC compliance workflows — up to 1.87M entities across five sources.
PEP Screening Scraper Features
- Streams 1.87M PEP records from OpenSanctions' daily-refreshed FTM NDJSON dataset, no API key required
- Assigns FATF categories to every record: `head_of_state`, `minister`, `legislator`, `judiciary`, `military`, `diplomat`, `soe_executive`, `central_bank`, `family_member`, `close_associate`
- Includes Relatives and Close Associates (RCA) per FATF Recommendation 12: family members and business associates of PEPs, toggleable
- Fuzzy name matching in `screen_queries` mode: Levenshtein-based 0-100 score against all name variants, aliases, and transliterations
- Three operating modes: bulk ingest, fuzzy-match screening, and incremental diff since last run
- Multi-source coverage: OpenSanctions, Wikidata SPARQL, EU MEPs XML feed, US Congress API (BYO key), UK Companies House PSC (BYO key)
- Country and category filters: narrow to specific jurisdictions (ISO-3166-1 alpha-2) or FATF categories without streaming the full dataset
- `new_peps_diff` mode: reads the last-run timestamp from the KV store and outputs only PEPs newly added since that run, useful for daily monitoring pipelines
What Can You Do With PEP Screening Data?
- Compliance teams: run KYC due-diligence checks against the full FATF PEP universe before onboarding clients
- Risk screening platforms: integrate the bulk ingest into a downstream PEP database, refreshed daily from OpenSanctions
- Fintech and neobanks: automate screening at account-open time by running `screen_queries` mode against applicant names
- Investigative journalists: identify officials' family members and close associates in the RCA graph for cross-referencing against business registries
- Sanctions screening vendors: use `sanctions_overlap_id` to cross-reference PEPs who also appear on sanctions lists, already resolved against the sanctions-screening-scraper entity schema
- AML model teams: use `new_peps_diff` mode to track newly-exposed politicians, feeding alerts into monitoring workflows without re-ingesting 1.87M records each run
How PEP Screening Scraper Works
- Select your source and mode. Choose one or more sources (`opensanctions`, `wikidata`, `eu_meps`, `us_congress`, `uk_psc`) and a mode. For bulk compliance data, use `ingest_lists`. For name-based screening, use `screen_queries`. For daily monitoring, use `new_peps_diff`.
- Apply filters. Country codes, FATF categories, and the `currentOnly` toggle narrow the stream before any data hits the output. This keeps runs fast and datasets manageable.
- The scraper streams source data efficiently. OpenSanctions is 921MB of NDJSON, processed line-by-line without loading the whole file into memory. EU MEPs XML is ~200KB and returns in seconds. US Congress and UK Companies House require free API keys and are skipped gracefully without them.
- Records are FATF-normalized on output. Every entity gets `pep_category`, `pep_class`, and `is_current` fields derived from source topic tags, consistent across all five sources so you're not stitching together incompatible schemas.
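The line-by-line streaming and early filtering described above can be illustrated with a minimal sketch. It is not the actor's code: `stream_filtered` is a hypothetical helper, and the flat `position_country` key on each input line is an assumption made for brevity (the raw OpenSanctions FTM records are structured differently).

```python
import io
import json
from typing import IO, Iterator, Optional, Set


def stream_filtered(
    lines: IO[str],
    countries: Optional[Set[str]] = None,
    limit: int = 0,
) -> Iterator[dict]:
    """Yield matching records one at a time from an NDJSON stream.

    Only the current line is held in memory, so the input size does not
    matter. limit=0 means unlimited, mirroring maxItems in the input table.
    """
    emitted = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Assumed flat key; filter before anything reaches the output.
        if countries and record.get("position_country") not in countries:
            continue
        yield record
        emitted += 1
        if limit and emitted >= limit:
            break


sample = io.StringIO(
    '{"primary_name": "A", "position_country": "FR"}\n'
    "\n"
    '{"primary_name": "B", "position_country": "US"}\n'
    '{"primary_name": "C", "position_country": "FR"}\n'
)
names = [r["primary_name"] for r in stream_filtered(sample, countries={"FR"})]
print(names)  # ['A', 'C']
```

The same generator works unchanged on an open file handle or a streamed HTTP response body, since it only iterates over lines.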
Input
{"mode": "ingest_lists","sources": ["opensanctions"],"countries": ["US", "GB", "DE"],"pepCategories": ["head_of_state", "minister"],"currentOnly": false,"includeFamily": true,"maxItems": 1000,"sp_intended_usage": "AML onboarding screening","sp_improvement_suggestions": "none"}
| Field | Type | Default | Description |
|---|---|---|---|
| `mode` | string | `ingest_lists` | Operating mode: `ingest_lists`, `screen_queries`, or `new_peps_diff` |
| `sources` | array | `["opensanctions"]` | Sources to pull from: `opensanctions`, `wikidata`, `eu_meps`, `us_congress`, `uk_psc`. Leave empty for all. |
| `pepCategories` | array | - | FATF category filter. Empty = all categories. |
| `countries` | array | - | ISO-3166-1 alpha-2 country codes. Filters by position country. |
| `currentOnly` | boolean | `false` | Exclude PEPs whose position has ended. Former PEPs can still carry risk: a 12–18 month look-back is a common rule of thumb, and FATF guidance asks for a risk-based assessment rather than a fixed period. |
| `includeFamily` | boolean | `true` | Include Relatives and Close Associates (FATF Recommendation 12). |
| `queries` | array | - | Names to fuzzy-match. Required for `screen_queries` mode. |
| `minMatchScore` | integer | `75` | Minimum Levenshtein match score (0–100). 75 is a commonly used AML starting threshold. |
| `usCongressApiKey` | string | - | Free key from api.congress.gov. Without it, falls back to OpenSanctions' us_congress dataset. |
| `ukCompaniesHouseApiKey` | string | - | Free key from developer.company-information.service.gov.uk. Required for live PSC data. |
| `maxItems` | integer | `10` | Maximum records to return. Set to 0 for unlimited (full ingest). |
Screen queries example:
{"mode": "screen_queries","sources": ["opensanctions", "eu_meps"],"queries": ["Emmanuel Macron", "Angela Merkel", "Viktor Orbán"],"minMatchScore": 70,"maxItems": 50,"sp_intended_usage": "KYC name screening","sp_improvement_suggestions": "none"}
Incremental diff example:
{"mode": "new_peps_diff","sources": ["opensanctions"],"maxItems": 0,"sp_intended_usage": "Daily new-PEP monitoring","sp_improvement_suggestions": "none"}
PEP Screening Scraper Output Fields
ingest_lists and new_peps_diff Output
{"entity_id": "NK-A7Bq3Rx9mVwXt2ZLp4nYs","source": "opensanctions","source_id": "NK-A7Bq3Rx9mVwXt2ZLp4nYs","source_url": "https://www.opensanctions.org/entities/NK-A7Bq3Rx9mVwXt2ZLp4nYs/","primary_name": "Emmanuel Macron","alias_names": "Macron, Emmanuel Jean-Michel Frédéric Macron","nationalities": "FR","date_of_birth": "1977-12-21","place_of_birth": "Amiens, France","gender": "male","pep_category": "head_of_state","pep_class": "PEP","position_title": "President of the Republic","position_country": "FR","position_organization": "French Republic","position_start_date": "2017-05-14","position_end_date": null,"is_current": true,"sources_count": 3,"related_persons": "spouse:NK-Bq4Xt9ZLp2|parent:NK-Cx8mVwXt","related_organizations": "La République En Marche|Élysée Palace","declared_assets": null,"sanctions_overlap_id": null,"last_modified_date": "2025-01-15T00:00:00Z","query_term": null,"match_score": null,"match_fields": null,"match_reason": null}
| Field | Type | Description |
|---|---|---|
| `entity_id` | string | Canonical PEP identifier (OpenSanctions NK-ID or generated) |
| `source` | string | Data source: `opensanctions`, `wikidata`, `eu_meps`, `us_congress`, `uk_psc` |
| `source_id` | string | Source-native entity ID |
| `source_url` | string | Canonical URL for this entity at the source |
| `primary_name` | string | Primary full name |
| `alias_names` | string | Pipe-separated aliases and alternative names |
| `nationalities` | string | Comma-separated ISO-3166-1 alpha-2 country codes |
| `date_of_birth` | string | Date of birth (YYYY-MM-DD or partial) |
| `place_of_birth` | string | Place of birth as free text |
| `gender` | string | `male`, `female`, or `other` |
| `pep_category` | string | FATF category: `head_of_state`, `minister`, `legislator`, `judiciary`, `military`, `diplomat`, `soe_executive`, `central_bank`, `family_member`, `close_associate` |
| `pep_class` | string | FATF class: `PEP`, `RCA`, `Family`, or `Associate` |
| `position_title` | string | Official position or job title |
| `position_country` | string | ISO-3166-1 alpha-2 country of the position |
| `position_organization` | string | Organization or institution |
| `position_start_date` | string | Start date of position (YYYY-MM-DD) |
| `position_end_date` | string | End date of position, null if currently held |
| `is_current` | boolean | True if position is currently held |
| `sources_count` | integer | Number of independent sources confirming this entity |
| `related_persons` | string | Pipe-separated related persons with relationship type |
| `related_organizations` | string | Pipe-separated related organizations |
| `declared_assets` | string | Pipe-separated declared assets (US OGE, EU MEP disclosures) |
| `sanctions_overlap_id` | string | Cross-reference to sanctions-screening-scraper entity if this PEP also appears on sanctions lists |
| `last_modified_date` | string | Date source record was last modified (ISO 8601) |
| `query_term` | string | Search query that matched this record (`screen_queries` mode only) |
| `match_score` | number | Fuzzy match score 0–100 (`screen_queries` mode only) |
| `match_fields` | string | Pipe-separated fields that contributed to the match |
| `match_reason` | string | Human-readable match explanation |
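Several fields arrive as pipe-separated strings, so downstream code usually needs to split them. A minimal sketch follows; `split_pipe` and `parse_related_persons` are hypothetical helper names, and the `relationship:entity_id` layout of `related_persons` is inferred from the sample record rather than from a documented contract.

```python
from typing import List, Optional, Tuple


def split_pipe(value: Optional[str]) -> List[str]:
    """Split a pipe-separated field; null or empty input gives an empty list."""
    if not value:
        return []
    return [part.strip() for part in value.split("|") if part.strip()]


def parse_related_persons(value: Optional[str]) -> List[Tuple[Optional[str], str]]:
    """Turn 'spouse:ID|parent:ID' into (relationship, entity_id) pairs.

    Items without a colon are kept with a relationship of None.
    """
    pairs = []
    for item in split_pipe(value):
        relation, sep, entity_id = item.partition(":")
        if sep:
            pairs.append((relation, entity_id))
        else:
            pairs.append((None, item))
    return pairs


related = parse_related_persons("spouse:NK-Bq4Xt9ZLp2|parent:NK-Cx8mVwXt")
print(related)  # [('spouse', 'NK-Bq4Xt9ZLp2'), ('parent', 'NK-Cx8mVwXt')]
print(split_pipe("La République En Marche|Élysée Palace"))
```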
screen_queries Output
Same schema. query_term, match_score, match_fields, and match_reason are populated. Every returned record exceeded the configured minMatchScore threshold.
{"entity_id": "NK-A7Bq3Rx9mVwXt2ZLp4nYs","source": "opensanctions","primary_name": "Emmanuel Macron","alias_names": "Macron, E. Macron","pep_category": "head_of_state","pep_class": "PEP","position_title": "President of the Republic","position_country": "FR","is_current": true,"query_term": "Emmanuel Macron","match_score": 100,"match_fields": "primary_name","match_reason": "Exact match on primary name"}
🔍 FAQ
How do I screen names against the PEP database?
PEP Screening Scraper handles this in screen_queries mode. Provide a queries list and a minMatchScore (75 is standard for AML). The scraper runs Levenshtein fuzzy matching against all name variants in the source data, including aliases, transliterations, and maiden names.
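To give a feel for how a Levenshtein-based 0–100 score behaves, here is a small self-contained sketch. The normalization used (one minus edit distance divided by the longer string's length, after case-folding) is an assumption for illustration; the actor's exact scoring formula is not documented here, and `match_score` and `best_score` are hypothetical function names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance using two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


def match_score(query: str, candidate: str) -> float:
    """Map edit distance to a 0-100 similarity (assumed normalization)."""
    q = query.casefold().strip()
    c = candidate.casefold().strip()
    longest = max(len(q), len(c))
    if longest == 0:
        return 100.0
    return round(100 * (1 - levenshtein(q, c) / longest), 1)


def best_score(query: str, names) -> float:
    """Score a query against every name variant and keep the best."""
    return max((match_score(query, n) for n in names), default=0.0)


print(match_score("Emmanuel Macron", "emmanuel macron"))  # 100.0
print(best_score("Emanuel Macron", ["Macron", "Emmanuel Macron"]))
```

Scoring against every alias and keeping the maximum is what lets a transliterated or maiden-name variant clear the threshold even when the primary name does not.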
Does this require API keys?
OpenSanctions, Wikidata, and EU MEPs work with no API key. US Congress and UK Companies House are optional — provide your own free key from api.congress.gov or developer.company-information.service.gov.uk, or skip them and rely on OpenSanctions' daily-refreshed snapshots of the same data.
How current is the PEP data?
OpenSanctions refreshes daily. EU MEP data updates from the official XML feed. Wikidata is community-maintained and typically current to within weeks. Run new_peps_diff mode on a daily schedule to receive only newly-added PEPs since your last run — the scraper tracks its own timestamp in Apify's key-value store.
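The timestamp comparison behind an incremental diff can be sketched as follows. This is an illustration under assumptions: it compares each record's `last_modified_date` with the previous run's timestamp, which may not be how the actor itself decides that a PEP is newly added, and `new_since` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, treating a trailing 'Z' as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assume UTC when the source omits an offset.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def new_since(records, last_run: str):
    """Keep records modified strictly after the previous run's timestamp."""
    cutoff = parse_ts(last_run)
    return [
        r
        for r in records
        if r.get("last_modified_date")
        and parse_ts(r["last_modified_date"]) > cutoff
    ]


records = [
    {"primary_name": "Old Entry", "last_modified_date": "2025-01-14T00:00:00Z"},
    {"primary_name": "New Entry", "last_modified_date": "2025-01-16T08:30:00Z"},
    {"primary_name": "No Date", "last_modified_date": None},
]
fresh = new_since(records, "2025-01-15T00:00:00Z")
print([r["primary_name"] for r in fresh])  # ['New Entry']
```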
What FATF categories are covered?
PEP Screening Scraper assigns each record one of ten categories aligned with FATF Recommendation 12: head_of_state, minister, legislator, judiciary, military, diplomat, soe_executive, central_bank, family_member, and close_associate. Every record also gets a pep_class field (PEP, RCA, Family, or Associate) for downstream filtering.
Can I limit to current office-holders only?
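Set currentOnly: true. Note that former PEPs can still carry risk: FATF guidance on Recommendation 12 asks for a risk-based assessment rather than a fixed time limit, and a 12–18 month look-back after leaving office is a common rule of thumb. Leaving it at the default (false) is the more conservative AML posture.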
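A middle ground between the two settings is to ingest with `currentOnly` left at false and apply a look-back window downstream. The sketch below assumes the `is_current` and `position_end_date` fields from the output schema; `within_lookback` is a hypothetical helper, and the 548-day default (roughly 18 months) is an illustrative choice, not a regulatory figure.

```python
from datetime import date, timedelta


def within_lookback(record: dict, today: date, lookback_days: int = 548) -> bool:
    """Keep current PEPs plus former PEPs who left office recently."""
    if record.get("is_current"):
        return True
    ended = record.get("position_end_date")
    if not ended:
        return True  # unknown end date: keep, the conservative choice
    try:
        end_date = date.fromisoformat(ended)
    except ValueError:
        return True  # partial or malformed date: keep for manual review
    return today - end_date <= timedelta(days=lookback_days)


today = date(2025, 6, 1)
recent = {"is_current": False, "position_end_date": "2024-06-01"}
stale = {"is_current": False, "position_end_date": "2020-01-01"}
print(within_lookback(recent, today), within_lookback(stale, today))  # True False
```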
Need More Features?
Need additional sources, custom match scoring, or integration with your sanctions database? File an issue or get in touch.
Why Use PEP Screening Scraper?
- No vendor lock-in: OpenSanctions is open-licensed FATF-quality data, updated daily, no subscription required. Most commercial PEP databases charge four-figure annual fees for the same underlying sources.
- FATF-normalized output: every record gets consistent `pep_category` and `pep_class` fields regardless of source, so your downstream pipeline doesn't need per-source parsing logic.
- Scales from screening to bulk: point it at five names or stream 1.87M records. Same actor, same schema, same output format.