# PubMed KOL Profile Builder — Medical Affairs Edition (`azureblue/pubmed-kol-profile-builder`) Actor

Builds ranked Key Opinion Leader profiles per indication from PubMed: authors with affiliation, recent publication volume, h-index estimate, co-author network breadth, geography. For Pharma Medical Affairs, MSL planning, and KOL identification.

- **URL**: https://apify.com/azureblue/pubmed-kol-profile-builder.md
- **Developed by:** [azureblue](https://apify.com/azureblue) (community)
- **Categories:** Developer tools, Business
- **Stats:** 2 total users, 1 monthly user, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per event

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## PubMed KOL Profile Builder — Medical Affairs Edition

Given an indication ("non-small cell lung cancer", "GLP-1 obesity", "CAR-T lymphoma"), return a **ranked list of Key Opinion Leaders** with affiliation, recent publication volume, h-index estimate, co-author network breadth, and geography — built from PubMed, no commercial KOL database licence required.

For **Pharma Medical Affairs, MSL planning, and KOL identification** workflows.

---

### What this Actor does

Data source: [NCBI E-utilities](https://www.ncbi.nlm.nih.gov/books/NBK25500/) (esearch + efetch on PubMed) — the official NLM API. No key required.

On each run:

1. Searches PubMed for your indication over the last N years (default 5).
2. Fetches up to `maxPubmedRecords` (default 500) — author lists with affiliations.
3. Scores every author with the **composite KOL signal** (recency-weighted publication volume + log-scaled citation count + co-author network breadth + journal diversity).
4. Returns the **top-N** authors as ranked dataset items, each with the `azureblue/medical-core` envelope so you can join with our PubMed-Abstract, ClinicalTrials, Cochrane, or Conference Actors on `sourceUrl` / `dataHash`.
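
Step 1 can be pictured as a single `esearch` request. The sketch below is only an assumption about how such a query might be assembled, not the Actor's actual code: the helper name `build_esearch_url` is hypothetical, while `db`, `term`, `retmax`, `datetype`, `mindate`, `maxdate`, and `retmode` are standard E-utilities parameters.

```python
from datetime import date
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(indication: str, years_back: int = 5,
                      max_records: int = 500,
                      today: Optional[date] = None) -> str:
    """Build a PubMed esearch URL limited to the last `years_back` years.

    Hypothetical helper: the Actor's real query construction is not documented.
    """
    today = today or date.today()
    # Naive window start: same month/day, `years_back` years earlier.
    # Feb 29 is clamped to Feb 28 so the start date stays valid.
    try:
        start = today.replace(year=today.year - years_back)
    except ValueError:
        start = today.replace(year=today.year - years_back, day=28)
    params = {
        "db": "pubmed",
        "term": indication,
        "retmax": max_records,
        "datetype": "pdat",  # filter on publication date
        "mindate": start.strftime("%Y/%m/%d"),
        "maxdate": today.strftime("%Y/%m/%d"),
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(build_esearch_url("non-small cell lung cancer", today=date(2025, 5, 12)))
```

The returned PMIDs would then go to `efetch` to retrieve author lists and affiliations (step 2).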

---

### Use Cases

#### 1. Pre-launch KOL identification for a Medical Affairs team
A pharma Medical Affairs lead preparing a Phase 3 launch in NSCLC needs a defensible shortlist of 50 high-impact KOLs in Europe to seed advisory boards and MSL outreach. One run with `indication: "non-small cell lung cancer"`, `country: "DE"` returns the top 50 authors with German affiliations, ranked by 5-year publication impact in the indication; repeat per country to cover the rest of Europe. **Replaces ~$25k of purchased KOL-list reports + 2 weeks of MSL desk research.**

#### 2. MSL territory mapping
A pharma MSL manager wants to assign 6 MSLs to optimal territories for a rare-disease portfolio. Running this Actor per country with `topN: 30` produces objective KOL distributions per country in under 5 minutes. **Removes the bias of legacy MSL relationships from territory planning** — the data tells you where the science actually is.

#### 3. Investigator-initiated study (IIS) site sourcing
A clinical-development lead scouts academic sites for an IIS in CAR-T. Run with `indication: "CAR-T lymphoma"`, then keep the profiles where `hIndexInQuery > 10` AND `recentPubCount >= 5`: that's the publication signal you want before sending a Letter of Interest. **Compresses 4-6 weeks of site-feasibility prep into 1 hour of dataset review.**
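
The thresholds in use case 3 are applied to the returned dataset items, not to the Actor input. A minimal sketch of that post-filter, assuming items shaped like the sample output further down; the function name `shortlist_sites` and the second row of demo data are made up for illustration:

```python
from typing import Any, Dict, Iterable, List


def shortlist_sites(items: Iterable[Dict[str, Any]],
                    min_h_index: int = 10,
                    min_recent_pubs: int = 5) -> List[Dict[str, Any]]:
    """Keep profiles with hIndexInQuery > min_h_index and recentPubCount >= min_recent_pubs."""
    kept = [
        item for item in items
        if item.get("hIndexInQuery", 0) > min_h_index
        and item.get("recentPubCount", 0) >= min_recent_pubs
    ]
    # Preserve the Actor's ranking order (rank 1 first).
    return sorted(kept, key=lambda item: item.get("rank", float("inf")))


profiles = [
    {"name": "Reck Martin", "rank": 1, "hIndexInQuery": 14, "recentPubCount": 24},
    # Made-up row: fails the strict "> 10" h-index test.
    {"name": "Example Author", "rank": 2, "hIndexInQuery": 10, "recentPubCount": 9},
]
print([p["name"] for p in shortlist_sites(profiles)])
```

In practice `items` would come from the run's default dataset, for example via `client.dataset(...).iterate_items()` as in the Python example in the API section.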

---

### Input

```json
{
  "indication": "non-small cell lung cancer",
  "country": "DE",
  "yearsBack": 5,
  "topN": 50
}
````

`mode: "trial"` returns 5 profiles free — sanity-check the ranking before subscribing.

***

### Output (sample item)

```json
{
  "indication": "non-small cell lung cancer",
  "rank": 1,
  "name": "Reck Martin",
  "affiliation": "LungenClinic Grosshansdorf, Member of the German Center for Lung Research (DZL), Grosshansdorf, Germany.",
  "country": "DE",
  "kolScore": 4.2731,
  "pubCount": 38,
  "recentPubCount": 24,
  "hIndexInQuery": 14,
  "totalCitations": 0,
  "coAuthorCount": 217,
  "firstPubDate": "2020-02-14",
  "lastPubDate": "2025-09-08",
  "journals": ["Journal of Clinical Oncology", "Lancet Oncology", "New England Journal of Medicine", "Lung Cancer", "Annals of Oncology"],
  "scrapedAt": "2026-05-12T08:30:00.000Z",
  "sourceUrl": "https://pubmed.ncbi.nlm.nih.gov/?term=Reck%20Martin%20AND%20non-small%20cell%20lung%20cancer",
  "sourceDomain": "pubmed.ncbi.nlm.nih.gov",
  "actorVersion": "1.0.0",
  "dataHash": "8a2f...d0"
}
```

`totalCitations` is `0` in this sample because openFDA/Europe PMC citation enrichment is queued for v1.1. Ranking still works: until then the score rests on recency-weighted volume, co-author count, and journal breadth.
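
The exact `kolScore` formula and its weights are not documented here. The sketch below only illustrates two of the named ingredients: the recency weighting with the 3-year half-life mentioned in the `topN` input description, and a log-scaled citation term. All function names are hypothetical, and how the Actor combines the terms is unknown.

```python
import math
from datetime import date
from typing import Iterable

HALF_LIFE_YEARS = 3.0  # from the `topN` description: "recency (3-year half-life)"


def recency_weight(age_years: float, half_life_years: float = HALF_LIFE_YEARS) -> float:
    """Exponential decay: a paper one half-life old counts half as much as a new one."""
    return 0.5 ** (max(age_years, 0.0) / half_life_years)


def recency_weighted_volume(pub_dates: Iterable[date], as_of: date) -> float:
    """Sum of recency weights over an author's publication dates."""
    return sum(recency_weight((as_of - d).days / 365.25) for d in pub_dates)


def log_scaled_citations(total_citations: int) -> float:
    """log(1 + n): zero citations contribute zero, large counts are damped."""
    return math.log1p(max(total_citations, 0))


print(recency_weight(3.0))      # a 3-year-old paper
print(log_scaled_citations(0))  # the v1.0 situation, where totalCitations is 0
```

Under this reading, a `totalCitations` of `0` simply zeroes the citation term and leaves the other components to drive the ranking.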

***

### Pricing

| Event | Price | When it fires |
|---|---|---|
| **Monthly subscription** | **$149** | Once per calendar month per user, on first `delta`-mode run — covers unlimited indication queries within the month |
| **Per KOL profile** | **$0.05** | Per ranked profile pushed to the dataset (`snapshot` or `delta` mode beyond the monthly cap) |

**Trial mode** (`mode: "trial"`): 5 profiles free, no charges, no state writes. Use to evaluate ranking quality on YOUR indication before subscribing.

**For Medical Affairs teams**: 1 subscribed user × 8 indications × 50 profiles/indication = ~400 profiles/month at flat $149. Compare to commercial KOL lists ($15k–$50k/year for static data).

***

### Compliance

- **Public-data only** — sourced from NCBI E-utilities, the NLM's official public API. No login walls, no PHI, no paywalled content.
- **Author affiliations are public** — the data ships exactly as PubMed displays it; no email scraping, no contact-detail enrichment.
- **GDPR Art. 6(1)(f) / equivalent** — the buyer is responsible for downstream use of KOL data, with legitimate interest as the legal basis for Pharma Medical Affairs research. Outreach to identified KOLs must follow the buyer's standard professional-engagement procedures.
- **No anti-bot scraping** — NCBI E-utilities is the *intended* programmatic interface.

***

### Sister Actors — complementary coverage from `azureblue`

- [`pubmed-abstract-scraper`](https://apify.com/azureblue/pubmed-abstract-scraper) — the underlying paper-level abstracts behind every KOL ranking
- [`clinical-trials-scraper`](https://apify.com/azureblue/clinical-trials-scraper) — cross-reference identified KOLs against their PI activity on ClinicalTrials.gov
- [`medical-conference-scraper`](https://apify.com/azureblue/medical-conference-scraper) — match KOLs to their conference late-breaker activity for current scientific signal
- [`medical-university-rankings`](https://apify.com/azureblue/medical-university-rankings) — score the KOL's institution alongside the KOL personally
- [`ema-drug-approval-watch`](https://apify.com/azureblue/ema-drug-approval-watch) — when a new EMA approval lands in your indication, this builder gives you the KOLs to brief

***

### Roadmap (v1.x)

- **v1.1**: citation enrichment via Europe PMC `citationsBySource` — populates `totalCitations` and improves the `kolScore` accuracy.
- **v1.2**: integration with [`pubmed-author-network-mapper`](https://apify.com/azureblue/pubmed-author-network-mapper) — feed a KOL into the network mapper for co-author graph + collaboration intensity.
- **v1.3**: conference-activity enrichment via `medical-conference-scraper` — distinguish "high publication volume" KOLs from "high stage presence" KOLs.

***

### Changelog

See `CHANGELOG.md` in this Actor.

# Actor input Schema

## `indication` (type: `string`):

PubMed search term identifying the therapy area or indication. Free-text, supports PubMed query syntax (Boolean, MeSH). Example: 'non-small cell lung cancer' or 'GLP-1 receptor agonist obesity' or 'CAR-T lymphoma'.

## `country` (type: `string`):

Restrict KOLs to those whose most recent PubMed affiliation maps to this country. Best-effort affiliation parsing — leave empty for global ranking. ISO 3166-1 alpha-2 codes: US, GB, DE, AT, CH, FR, IT, ES, NL, BE, SE, NO, DK, FI, CA, AU, NZ, JP, CN, KR, IN, BR, MX, IL.

## `yearsBack` (type: `integer`):

Only consider publications within this many years from today. Default 5 — captures current KOLs without diluting the score with retired authors.

## `maxPubmedRecords` (type: `integer`):

Maximum number of PubMed records to ingest (default 500). Higher values catch a longer tail of KOLs at the cost of run time. 500 is enough to rank the top ~50 in most indications; 2000+ for very large fields (oncology, cardiology).

## `topN` (type: `integer`):

Number of profiles returned (default 50), sorted by composite KOL score descending. The score weights recency (3-year half-life), citation count, co-author network breadth, and journal diversity.

## `mode` (type: `string`):

Delta is the rental tier — first run in a calendar month fires the $149 monthly subscription charge. Snapshot bypasses subscription tracking. Trial bypasses all charges — perfect for evaluating before subscribing.

## `webhookUrl` (type: `string`):

POST endpoint that receives one JSON payload per ranked KOL profile. Test with https://webhook.site. Failed deliveries persist in the KV store under `webhook.dlq.*` for replay.

## `webhookSecret` (type: `string`):

If set, every webhook request includes an `X-Azureblue-Signature: sha256=<hex>` header so your endpoint can verify the payload.
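
A receiving endpoint might check the header along these lines. This is a sketch under a stated assumption: that the hex value is an HMAC-SHA256 of the raw request body keyed with `webhookSecret`. The schema only says "HMAC secret" and `sha256=<hex>`, so confirm what is actually signed before relying on it. `verify_signature` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, header_value: str, secret: str) -> bool:
    """Check an `X-Azureblue-Signature: sha256=<hex>` header value.

    Assumption: the digest is HMAC-SHA256 over the raw request body.
    """
    prefix = "sha256="
    if not header_value.startswith(prefix):
        return False
    expected = hmac.new(secret.encode("utf-8"), raw_body, hashlib.sha256).hexdigest()
    received = header_value[len(prefix):]
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```

Verify against the bytes exactly as received; re-serializing the parsed JSON can change whitespace or key order and break the comparison.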

## Actor input object example

```json
{
  "indication": "non-small cell lung cancer",
  "country": "DE",
  "yearsBack": 5,
  "maxPubmedRecords": 500,
  "topN": 50,
  "mode": "delta"
}
```
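
Before starting a run, it can help to check an input object locally against the limits in the OpenAPI `inputSchema` below. The following is a minimal sketch, not an official validator: `validate_input` is a hypothetical helper, and rejecting an empty `indication` is stricter than the schema, which only requires a string.

```python
from typing import Any, Dict, List

# Bounds copied from the OpenAPI `inputSchema` in this document.
INT_BOUNDS = {
    "yearsBack": (1, 20),
    "maxPubmedRecords": (50, 5000),
    "topN": (5, 500),
}
MODES = ("delta", "snapshot", "trial")


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    indication = run_input.get("indication")
    if not isinstance(indication, str) or not indication.strip():
        problems.append("indication is required and must be a non-empty string")
    for field, (low, high) in INT_BOUNDS.items():
        if field in run_input:
            value = run_input[field]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, int) or not low <= value <= high:
                problems.append(f"{field} must be an integer between {low} and {high}")
    if "mode" in run_input and run_input["mode"] not in MODES:
        problems.append("mode must be one of: " + ", ".join(MODES))
    return problems


print(validate_input({"indication": "CAR-T lymphoma", "topN": 1, "mode": "trial"}))
```

Since the schema's default for `mode` is `delta`, which triggers the monthly subscription charge on the first run of a calendar month, setting `mode` explicitly on evaluation runs is a cautious habit.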

# Actor output Schema

## `dataset` (type: `string`):

Default dataset — each row is one author scored as a KOL for the input indication, sorted by composite KOL score descending.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "indication": "non-small cell lung cancer"
};

// Run the Actor and wait for it to finish
const run = await client.actor("azureblue/pubmed-kol-profile-builder").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "indication": "non-small cell lung cancer" }

# Run the Actor and wait for it to finish
run = client.actor("azureblue/pubmed-kol-profile-builder").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "indication": "non-small cell lung cancer"
}' |
apify call azureblue/pubmed-kol-profile-builder --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=azureblue/pubmed-kol-profile-builder",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "PubMed KOL Profile Builder — Medical Affairs Edition",
        "description": "Builds ranked Key Opinion Leader profiles per indication from PubMed: authors with affiliation, recent publication volume, h-index estimate, co-author network breadth, geography. For Pharma Medical Affairs, MSL planning, and KOL identification.",
        "version": "1.0",
        "x-build-id": "WYbwRxJFjkHLt3Ep9"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/azureblue~pubmed-kol-profile-builder/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-azureblue-pubmed-kol-profile-builder",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/azureblue~pubmed-kol-profile-builder/runs": {
            "post": {
                "operationId": "runs-sync-azureblue-pubmed-kol-profile-builder",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/azureblue~pubmed-kol-profile-builder/run-sync": {
            "post": {
                "operationId": "run-sync-azureblue-pubmed-kol-profile-builder",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "indication"
                ],
                "properties": {
                    "indication": {
                        "title": "Indication / therapy area",
                        "type": "string",
                        "description": "PubMed search term identifying the therapy area or indication. Free-text, supports PubMed query syntax (Boolean, MeSH). Example: 'non-small cell lung cancer' or 'GLP-1 receptor agonist obesity' or 'CAR-T lymphoma'."
                    },
                    "country": {
                        "title": "Country filter (ISO 2-letter)",
                        "type": "string",
                        "description": "Restrict KOLs to those whose most recent PubMed affiliation maps to this country. Best-effort affiliation parsing — leave empty for global ranking. ISO 3166-1 alpha-2 codes: US, GB, DE, AT, CH, FR, IT, ES, NL, BE, SE, NO, DK, FI, CA, AU, NZ, JP, CN, KR, IN, BR, MX, IL."
                    },
                    "yearsBack": {
                        "title": "Time window (years)",
                        "minimum": 1,
                        "maximum": 20,
                        "type": "integer",
                        "description": "Only consider publications within this many years from today. Default 5 — captures current KOLs without diluting the score with retired authors.",
                        "default": 5
                    },
                    "maxPubmedRecords": {
                        "title": "Max PubMed records to ingest",
                        "minimum": 50,
                        "maximum": 5000,
                        "type": "integer",
                        "description": "Higher values catch a longer tail of KOLs at the cost of run time. 500 is enough to rank the top ~50 in most indications; 2000+ for very large fields (oncology, cardiology).",
                        "default": 500
                    },
                    "topN": {
                        "title": "Top-N KOL profiles to return",
                        "minimum": 5,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Profiles returned, sorted by composite KOL score descending. The score weights recency (3-year half-life), citation count, co-author network breadth, and journal diversity.",
                        "default": 50
                    },
                    "mode": {
                        "title": "Run mode",
                        "enum": [
                            "delta",
                            "snapshot",
                            "trial"
                        ],
                        "type": "string",
                        "description": "Delta is the rental tier — first run in a calendar month fires the $149 monthly subscription charge. Snapshot bypasses subscription tracking. Trial bypasses all charges — perfect for evaluating before subscribing.",
                        "default": "delta"
                    },
                    "webhookUrl": {
                        "title": "Webhook URL (optional)",
                        "type": "string",
                        "description": "POST endpoint that receives one JSON payload per ranked KOL profile. Test with https://webhook.site. Failed deliveries persist in the KV store under `webhook.dlq.*` for replay."
                    },
                    "webhookSecret": {
                        "title": "Webhook HMAC secret (optional)",
                        "type": "string",
                        "description": "If set, every webhook request includes `X-Azureblue-Signature: sha256=<hex>` so your endpoint can verify."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
