PEP Screening — OpenSanctions + Wikidata + National Registries

PEP screening for AML/KYC. Streams 1.87M politically exposed persons from OpenSanctions (daily refresh), Wikidata, EU MEPs, US Congress, and UK Companies House PSC. FATF categories, family/RCA graph, three modes: ingest, fuzzy-match screening, new-PEPs diff.

Pricing

Pay per event

Developer

BowTiedRaccoon

Maintained by Community

PEP Screening Scraper — OpenSanctions + Wikidata + National Registries

Streams politically exposed persons (PEPs) from OpenSanctions, Wikidata, EU Parliament, US Congress, and UK Companies House. Returns structured FATF-categorized records for AML/KYC compliance workflows — up to 1.87M entities across five sources.


PEP Screening Scraper Features

  • Streams 1.87M PEP records from OpenSanctions' daily-refreshed FTM NDJSON dataset — no API key required
  • Assigns FATF categories to every record: head_of_state, minister, legislator, judiciary, military, diplomat, soe_executive, central_bank, family_member, close_associate
  • Includes Relatives and Close Associates (RCA) per FATF Recommendation 12: family members and business associates of PEPs, toggled with includeFamily
  • Fuzzy name matching in screen_queries mode — Levenshtein-based 0-100 score against all name variants, aliases, and transliterations
  • Three operating modes: bulk ingest, fuzzy-match screening, and incremental diff since last run
  • Multi-source coverage: OpenSanctions, Wikidata SPARQL, EU MEPs XML feed, US Congress API (BYO key), UK Companies House PSC (BYO key)
  • Country and category filters: narrow to specific jurisdictions (ISO-3166-1 alpha-2) or FATF categories without writing the full dataset to output
  • new_peps_diff mode — reads last-run timestamp from KV store, outputs only newly-added PEPs since that run, useful for daily monitoring pipelines

What Can You Do With PEP Screening Data?

  • Compliance teams — run KYC due-diligence checks against the full FATF PEP universe before onboarding clients
  • Risk screening platforms — integrate the bulk ingest into a downstream PEP database, refreshed daily from OpenSanctions
  • Fintech and neobanks — automate screening at account-open time by querying the screen_queries mode against applicant names
  • Investigative journalists — identify officials' family members and close associates in the RCA graph for cross-referencing against business registries
  • Sanctions screening vendors — use sanctions_overlap_id to cross-reference PEPs who also appear on sanctions lists, already resolved against the sanctions-screening-scraper entity schema
  • AML model teams — use new_peps_diff mode to track newly-exposed politicians, feeding alerts into monitoring workflows without re-ingesting 1.87M records each run

How PEP Screening Scraper Works

  1. Select your source and mode. Choose one or more sources (opensanctions, wikidata, eu_meps, us_congress, uk_psc) and a mode. For bulk compliance data, use ingest_lists. For name-based screening, use screen_queries. For daily monitoring, use new_peps_diff.
  2. Apply filters. Country codes, FATF categories, and the currentOnly toggle narrow the stream before any data hits the output. This keeps runs fast and datasets manageable.
  3. The scraper streams source data. OpenSanctions is 921 MB of NDJSON, processed line by line without loading the whole file into memory. The EU MEPs XML feed is ~200 KB and returns in seconds. The live US Congress and UK Companies House sources require free API keys; when no key is supplied, that source is skipped rather than failing the run.
  4. Records are FATF-normalized on output. Every entity gets pep_category, pep_class, and is_current fields derived from source topic tags — consistent across all five sources so you're not stitching together incompatible schemas.
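
The streaming and filtering steps above can be sketched as follows. This is an illustration only, not the actor's code: `iter_peps` is a hypothetical helper, and the record shape (`schema`, `properties.country`, `properties.topics`, a `role.pep` topic tag) is a simplified assumption about the FollowTheMoney layout.

```python
import json
from typing import Iterable, Iterator


def iter_peps(lines: Iterable[str], countries: set[str], max_items: int = 0) -> Iterator[dict]:
    """Yield PEP-tagged Person entities from NDJSON lines, one at a time."""
    wanted = {c.lower() for c in countries}
    emitted = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entity = json.loads(line)  # only the current line is held in memory
        props = entity.get("properties", {})
        if entity.get("schema") != "Person":
            continue
        if "role.pep" not in props.get("topics", []):  # assumed topic tag
            continue
        entity_countries = {c.lower() for c in props.get("country", [])}
        if wanted and not (wanted & entity_countries):
            continue
        yield entity
        emitted += 1
        if max_items and emitted >= max_items:  # 0 means unlimited
            return


# Tiny in-memory stand-in for the real file; any iterable of lines works,
# including an open file handle.
sample = [
    json.dumps({"id": "p1", "schema": "Person",
                "properties": {"country": ["fr"], "topics": ["role.pep"]}}),
    json.dumps({"id": "p2", "schema": "Person",
                "properties": {"country": ["de"], "topics": ["role.pep"]}}),
    json.dumps({"id": "c1", "schema": "Company",
                "properties": {"country": ["fr"], "topics": []}}),
]

for entity in iter_peps(sample, {"FR"}):
    print(entity["id"])
```

Because the function is a generator over lines, filters run before anything is collected, which is the property step 2 relies on.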

Input

{
  "mode": "ingest_lists",
  "sources": ["opensanctions"],
  "countries": ["US", "GB", "DE"],
  "pepCategories": ["head_of_state", "minister"],
  "currentOnly": false,
  "includeFamily": true,
  "maxItems": 1000,
  "sp_intended_usage": "AML onboarding screening",
  "sp_improvement_suggestions": "none"
}

| Field | Type | Default | Description |
|---|---|---|---|
| mode | string | ingest_lists | Operating mode: ingest_lists, screen_queries, or new_peps_diff |
| sources | array | ["opensanctions"] | Sources to pull from: opensanctions, wikidata, eu_meps, us_congress, uk_psc. Leave empty for all. |
| pepCategories | array | | FATF category filter. Empty = all categories. |
| countries | array | | ISO-3166-1 alpha-2 country codes. Filters by position country. |
| currentOnly | boolean | false | Exclude PEPs whose position has ended. Former PEPs are commonly screened for 12–18 months after leaving office. |
| includeFamily | boolean | true | Include Relatives and Close Associates (FATF Recommendation 12). |
| queries | array | | Names to fuzzy-match. Required for screen_queries mode. |
| minMatchScore | integer | 75 | Minimum Levenshtein match score (0–100). 75 is a common starting threshold for AML screening. |
| usCongressApiKey | string | | Free key from api.congress.gov. Without it, falls back to OpenSanctions' us_congress dataset. |
| ukCompaniesHouseApiKey | string | | Free key from developer.company-information.service.gov.uk. Required for live PSC data. |
| maxItems | integer | 10 | Maximum records to return. Set to 0 for unlimited (full ingest). |
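
The constraints in the table can be checked on the client side before a run is started. The sketch below is a hypothetical helper (`build_input` is not part of the actor) and encodes only rules stated in the table; any other field is passed through untouched.

```python
VALID_MODES = {"ingest_lists", "screen_queries", "new_peps_diff"}
VALID_SOURCES = {"opensanctions", "wikidata", "eu_meps", "us_congress", "uk_psc"}


def build_input(mode="ingest_lists", sources=("opensanctions",), queries=(),
                min_match_score=75, max_items=10, **extra):
    """Return an input payload, raising ValueError on obviously invalid settings."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    unknown = set(sources) - VALID_SOURCES
    if unknown:
        raise ValueError(f"unknown sources: {sorted(unknown)}")
    if mode == "screen_queries" and not queries:
        raise ValueError("screen_queries mode requires at least one query")
    if not 0 <= min_match_score <= 100:
        raise ValueError("minMatchScore must be between 0 and 100")
    if max_items < 0:
        raise ValueError("maxItems must be 0 (unlimited) or positive")

    payload = {
        "mode": mode,
        "sources": list(sources),
        "minMatchScore": min_match_score,
        "maxItems": max_items,
    }
    if queries:
        payload["queries"] = list(queries)
    payload.update(extra)  # e.g. countries, pepCategories, currentOnly
    return payload


payload = build_input(mode="screen_queries", queries=["Angela Merkel"], countries=["DE"])
print(payload)
```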

Screen queries example:

{
  "mode": "screen_queries",
  "sources": ["opensanctions", "eu_meps"],
  "queries": ["Emmanuel Macron", "Angela Merkel", "Viktor Orbán"],
  "minMatchScore": 70,
  "maxItems": 50,
  "sp_intended_usage": "KYC name screening",
  "sp_improvement_suggestions": "none"
}

Incremental diff example:

{
  "mode": "new_peps_diff",
  "sources": ["opensanctions"],
  "maxItems": 0,
  "sp_intended_usage": "Daily new-PEP monitoring",
  "sp_improvement_suggestions": "none"
}
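
The diff logic can be pictured as a timestamp comparison. In this sketch a plain dict stands in for the key-value store, and the key name `last_run_at` and the use of `last_modified_date` as the comparison field are assumptions. The documentation only states that the actor stores a last-run timestamp and outputs PEPs added since then.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2025-01-15T00:00:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def new_since_last_run(records, store, now=None):
    """Return records modified after the stored timestamp, then advance it."""
    now = now or datetime.now(timezone.utc)
    last = store.get("last_run_at")  # hypothetical key name
    cutoff = parse_ts(last) if last else None
    fresh = [
        r for r in records
        if cutoff is None or parse_ts(r["last_modified_date"]) > cutoff
    ]
    store["last_run_at"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return fresh


store = {"last_run_at": "2025-01-10T00:00:00Z"}
records = [
    {"entity_id": "a", "last_modified_date": "2025-01-15T00:00:00Z"},
    {"entity_id": "b", "last_modified_date": "2025-01-05T00:00:00Z"},
]
print([r["entity_id"] for r in new_since_last_run(records, store)])
```

On a first run with an empty store, every record counts as new, so the first scheduled run behaves like a full ingest.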

PEP Screening Scraper Output Fields

ingest_lists and new_peps_diff Output

{
  "entity_id": "NK-A7Bq3Rx9mVwXt2ZLp4nYs",
  "source": "opensanctions",
  "source_id": "NK-A7Bq3Rx9mVwXt2ZLp4nYs",
  "source_url": "https://www.opensanctions.org/entities/NK-A7Bq3Rx9mVwXt2ZLp4nYs/",
  "primary_name": "Emmanuel Macron",
  "alias_names": "Macron|Emmanuel Jean-Michel Frédéric Macron",
  "nationalities": "FR",
  "date_of_birth": "1977-12-21",
  "place_of_birth": "Amiens, France",
  "gender": "male",
  "pep_category": "head_of_state",
  "pep_class": "PEP",
  "position_title": "President of the Republic",
  "position_country": "FR",
  "position_organization": "French Republic",
  "position_start_date": "2017-05-14",
  "position_end_date": null,
  "is_current": true,
  "sources_count": 3,
  "related_persons": "spouse:NK-Bq4Xt9ZLp2|parent:NK-Cx8mVwXt",
  "related_organizations": "La République En Marche|Élysée Palace",
  "declared_assets": null,
  "sanctions_overlap_id": null,
  "last_modified_date": "2025-01-15T00:00:00Z",
  "query_term": null,
  "match_score": null,
  "match_fields": null,
  "match_reason": null
}

| Field | Type | Description |
|---|---|---|
| entity_id | string | Canonical PEP identifier (OpenSanctions NK-ID or generated) |
| source | string | Data source: opensanctions, wikidata, eu_meps, us_congress, uk_psc |
| source_id | string | Source-native entity ID |
| source_url | string | Canonical URL for this entity at the source |
| primary_name | string | Primary full name |
| alias_names | string | Pipe-separated aliases and alternative names |
| nationalities | string | Comma-separated ISO-3166-1 alpha-2 country codes |
| date_of_birth | string | Date of birth (YYYY-MM-DD or partial) |
| place_of_birth | string | Place of birth as free text |
| gender | string | male, female, or other |
| pep_category | string | FATF category: head_of_state, minister, legislator, judiciary, military, diplomat, soe_executive, central_bank, family_member, close_associate |
| pep_class | string | FATF class: PEP, RCA, Family, or Associate |
| position_title | string | Official position or job title |
| position_country | string | ISO-3166-1 alpha-2 country of the position |
| position_organization | string | Organization or institution |
| position_start_date | string | Start date of position (YYYY-MM-DD) |
| position_end_date | string | End date of position, null if currently held |
| is_current | boolean | True if position is currently held |
| sources_count | integer | Number of independent sources confirming this entity |
| related_persons | string | Pipe-separated related persons with relationship type |
| related_organizations | string | Pipe-separated related organizations |
| declared_assets | string | Pipe-separated declared assets (US OGE, EU MEP disclosures) |
| sanctions_overlap_id | string | Cross-reference to sanctions-screening-scraper entity if this PEP also appears on sanctions lists |
| last_modified_date | string | Date source record was last modified (ISO 8601) |
| query_term | string | Search query that matched this record (screen_queries mode only) |
| match_score | number | Fuzzy match score 0–100 (screen_queries mode only) |
| match_fields | string | Pipe-separated fields that contributed to the match |
| match_reason | string | Human-readable match explanation |
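
Several of these fields pack multiple values into one string, so a consumer will usually want to unpack them. The helpers below are a hypothetical sketch; the `relation:entity_id` layout of `related_persons` is inferred from the sample record above rather than from a documented format.

```python
def split_multi(value, sep="|"):
    """Split a packed string field into a clean list; None or '' gives []."""
    if not value:
        return []
    return [part.strip() for part in value.split(sep) if part.strip()]


def parse_related_persons(value):
    """Turn 'spouse:NK-1|parent:NK-2' into a list of relation/ID dicts."""
    pairs = []
    for item in split_multi(value):
        # If an item has no colon, entity_id is left as an empty string.
        relation, _, entity_id = item.partition(":")
        pairs.append({"relation": relation, "entity_id": entity_id})
    return pairs


record = {
    "nationalities": "FR",
    "related_persons": "spouse:NK-Bq4Xt9ZLp2|parent:NK-Cx8mVwXt",
    "declared_assets": None,
}
print(parse_related_persons(record["related_persons"]))
print(split_multi(record["nationalities"], sep=","))
print(split_multi(record["declared_assets"]))
```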

screen_queries Output

Same schema as above, with query_term, match_score, match_fields, and match_reason populated. Every returned record scores at or above the configured minMatchScore.

{
  "entity_id": "NK-A7Bq3Rx9mVwXt2ZLp4nYs",
  "source": "opensanctions",
  "primary_name": "Emmanuel Macron",
  "alias_names": "Macron|E. Macron",
  "pep_category": "head_of_state",
  "pep_class": "PEP",
  "position_title": "President of the Republic",
  "position_country": "FR",
  "is_current": true,
  "query_term": "Emmanuel Macron",
  "match_score": 100,
  "match_fields": "primary_name",
  "match_reason": "Exact match on primary name"
}
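
A downstream consumer may want to group hits per query and rank them before review. The following is a small sketch assuming records shaped like the example above; `best_hits` and its `top_n` parameter are hypothetical.

```python
from collections import defaultdict


def best_hits(records, top_n=3):
    """Group screening hits by query_term and keep the highest-scoring ones."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["query_term"]].append(record)
    return {
        query: sorted(hits, key=lambda r: r["match_score"], reverse=True)[:top_n]
        for query, hits in grouped.items()
    }


hits = [
    {"query_term": "Emmanuel Macron", "primary_name": "Emmanuel Macron", "match_score": 100},
    {"query_term": "Emmanuel Macron", "primary_name": "Emmanuelle Macon", "match_score": 81},
    {"query_term": "Angela Merkel", "primary_name": "Angela Merkel", "match_score": 100},
]
for query, ranked in best_hits(hits, top_n=1).items():
    print(query, "->", ranked[0]["primary_name"], ranked[0]["match_score"])
```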

🔍 FAQ

How do I screen names against the PEP database?

PEP Screening Scraper handles this in screen_queries mode. Provide a queries list and a minMatchScore (the default, 75, is a common starting point for AML screening). The scraper runs Levenshtein fuzzy matching against all name variants in the source data, including aliases, transliterations, and maiden names, and returns the records that reach the threshold.
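
The exact scoring formula is not documented here, so the following is only one plausible way to turn Levenshtein distance into a 0–100 score: normalize the edit distance by the longer string's length and keep the best score across all name variants. The case-folding step and both function names are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance using two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def match_score(query: str, names: list[str]) -> int:
    """Best 0-100 similarity between the query and any name variant."""
    q = query.casefold().strip()
    best = 0
    for name in names:
        n = name.casefold().strip()
        longest = max(len(q), len(n))
        if longest == 0:
            continue
        score = round(100 * (1 - levenshtein(q, n) / longest))
        best = max(best, score)
    return best


print(match_score("Emanuel Macron", ["Macron", "Emmanuel Macron"]))  # 93
```

With this normalization a single missing letter in a 15-character name still clears a threshold of 75, while a bare surname scored against a full name does not.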

Does this require API keys?

OpenSanctions, Wikidata, and EU MEPs work with no API key. US Congress and UK Companies House are optional — provide your own free key from api.congress.gov or developer.company-information.service.gov.uk, or skip them and rely on OpenSanctions' daily-refreshed snapshots of the same data.

How current is the PEP data?

OpenSanctions refreshes daily. EU MEP data updates from the official XML feed. Wikidata is community-maintained and typically current to within weeks. Run new_peps_diff mode on a daily schedule to receive only newly-added PEPs since your last run — the scraper tracks its own timestamp in Apify's key-value store.

What FATF categories are covered?

PEP Screening Scraper assigns one of ten categories aligned with FATF Recommendation 12: head_of_state, minister, legislator, judiciary, military, diplomat, soe_executive, central_bank, family_member, and close_associate. Every record also gets a pep_class field (PEP, RCA, Family, or Associate) for downstream filtering.

Can I limit to current office-holders only?

Set currentOnly: true. Note that FATF guidance treats former PEPs on a risk basis rather than dropping them the moment they leave office, and many compliance programs keep screening them for 12–18 months afterwards. Leaving the setting at its default (false) is the more conservative AML posture.
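
For a middle ground between currentOnly: true and keeping every former office-holder, a post-filter on position_end_date can apply a cooling-off window. The sketch below assumes full YYYY-MM-DD end dates and approximates a month as 30 days; `within_window` and its 18-month default are illustrative choices, not actor settings.

```python
from datetime import date, timedelta


def within_window(record: dict, today: date, months: int = 18) -> bool:
    """Keep current PEPs and former PEPs whose position ended recently."""
    end = record.get("position_end_date")
    if record.get("is_current") or not end:
        # Unknown end dates are kept, which errs on the side of screening.
        return True
    ended = date.fromisoformat(end)
    return today - ended <= timedelta(days=30 * months)


today = date(2025, 6, 1)
records = [
    {"primary_name": "A", "is_current": True, "position_end_date": None},
    {"primary_name": "B", "is_current": False, "position_end_date": "2024-06-01"},
    {"primary_name": "C", "is_current": False, "position_end_date": "2022-01-01"},
]
print([r["primary_name"] for r in records if within_window(r, today)])
```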


Need More Features?

Need additional sources, custom match scoring, or integration with your sanctions database? File an issue or get in touch.

Why Use PEP Screening Scraper?

  • No vendor lock-in — OpenSanctions is open-licensed FATF-quality data, updated daily, no subscription required. Most commercial PEP databases charge four-figure annual fees for the same underlying sources.
  • FATF-normalized output — every record gets consistent pep_category and pep_class fields regardless of source, so your downstream pipeline doesn't need per-source parsing logic.
  • Scales from screening to bulk — point it at five names or stream 1.87M records. Same actor, same schema, same output format.