# Pharma Research & Clinical Trial Monitor (`scrapemint/pubmed-clinical-trials-intelligence`) Actor

Pull PubMed papers and ClinicalTrials.gov studies at scale. PMIDs, DOIs, abstracts, MeSH terms, NCT IDs, phases, sponsors, enrollment, primary outcomes, results. One row per record. Pay per row.

- **URL**: https://apify.com/scrapemint/pubmed-clinical-trials-intelligence.md
- **Developed by:** [Ken M](https://apify.com/scrapemint) (community)
- **Categories:** Other
- **Stats:** 1 total user, 0 monthly users, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use, and you pay only for the Apify platform usage, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Pharma Research & Clinical Trial Monitor: PubMed + ClinicalTrials.gov

Pull biomedical literature and clinical trial records at scale. Mixes PubMed papers and ClinicalTrials.gov studies in one run. PMIDs, DOIs, full abstracts, MeSH terms, author affiliations, ORCIDs, journal metadata, NCT IDs, trial phases, sponsors, enrollment, primary outcomes, posted results, and live citation counts via NCBI iCite. One row per record. Pay per row.

**Built for** pharma competitive intelligence teams, biotech analysts watching pipeline shifts, regulatory affairs staff tracking submissions, medical writers building systematic reviews, KOL mappers profiling investigators, CRO BD teams scouting active sites, science journalists tracing claims, AI teams training biomedical LLMs, and grant writers building reference packs.

**Keywords this actor ranks for:** pubmed api, pubmed scraper, clinicaltrials.gov api, biomedical literature search, drug pipeline monitor, clinical trial scraper, MeSH term extractor, NCT ID lookup, KOL mapping, pharma competitive intelligence, FDA pipeline tracker, oncology trial monitor, biomedical citation api, pharma BI feed.

---

### Why this actor

| Other tools | **This actor** |
|---|---|
| PubMed E-utilities raw: free but XML parsing, rate limits, no trial data | Both data sources in one normalized JSON row |
| ClinicalTrials.gov UI export: 1000 row cap, manual click | Unbounded, programmatic, paginates for you |
| TrialTrove / Citeline: $20K plus per seat per year | Pay per row, no minimum |
| Cortellis: enterprise contract only | Pay per row, no contract |
| BiopharmaCatalyst: free but no historical depth, US only | Global, full history, posted results included |
| Roll your own scraper: maintain 3 parsers, handle rate limits | Maintained selectors plus iCite enrichment built in |

---

### How it works

```mermaid
flowchart LR
    A[PubMed queries<br/>or PMIDs<br/>or CT.gov queries<br/>or NCT IDs] --> B[Source router]
    B --> C[NCBI esearch<br/>term + filters]
    B --> D[CT.gov v2 search<br/>query.term + filters]
    B --> E[Direct PMID list]
    B --> F[Direct NCT ID list]
    C --> G[NCBI efetch<br/>XML batches of 100]
    G --> H[Parse PubmedArticle]
    D --> I[Parse studies]
    E --> G
    F --> J[CT.gov single study]
    H --> K{Enrichment toggles?}
    K -->|fetchAbstracts| L[Full abstract text]
    K -->|fetchMeshTerms| M[MeSH headings + qualifiers]
    K -->|fetchReferences| N[ELink refs + citedin]
    K -->|always on| O[iCite citation counts +<br/>relative citation ratio]
    H --> P[(One row per paper)]
    I --> Q[(One row per trial)]
    J --> Q
    O --> P
````

PubMed records flow through E-utilities (esearch returns PMIDs, efetch returns XML). ClinicalTrials.gov records come from the v2 REST API (JSON, paginated by token). Both sources are public and free at the API level. iCite citation counts are pulled from the iCite API run by the NIH Office of Portfolio Analysis (OPA) and joined to PubMed rows automatically.
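
As a rough illustration of the batching step, the sketch below splits a PMID list into groups of 100 and builds one efetch URL per group. It only constructs URLs and sends no requests. The helper names `chunk` and `efetch_urls` are invented for this example, and the actor's own code may well differ.

```python
from urllib.parse import urlencode

# Public E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
BATCH_SIZE = 100  # batch size named in the flowchart above


def chunk(ids, size=BATCH_SIZE):
    """Split a list of IDs into consecutive batches of at most `size`."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def efetch_urls(pmids, api_key=None):
    """Build one efetch URL per batch of PMIDs. No request is sent."""
    urls = []
    for batch in chunk(pmids):
        params = {"db": "pubmed", "id": ",".join(batch), "retmode": "xml"}
        if api_key:
            params["api_key"] = api_key  # optional NCBI key
        urls.append(f"{EFETCH_URL}?{urlencode(params)}")
    return urls


# 250 placeholder IDs, not real papers.
placeholder_pmids = [str(n) for n in range(1, 251)]
print(len(efetch_urls(placeholder_pmids)), "efetch calls needed")
```

With 250 IDs the list splits into batches of 100, 100, and 50, so three calls cover the run.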

***

### What you get per row

```mermaid
flowchart LR
    P[Paper row] --> P1[Identity<br/>pmid doi pmcid]
    P --> P2[Title + abstract]
    P --> P3[Authors<br/>names, affiliations, ORCIDs]
    P --> P4[Journal<br/>title ISO ISSN volume issue pages]
    P --> P5[Dates<br/>publicationDate publicationYear]
    P --> P6[Topics<br/>meshTerms keywords]
    P --> P7[Funding<br/>grants by agency]
    P --> P8[Citations<br/>citationCount RCR via iCite]
    T[Trial row] --> T1[Identity<br/>nctId url]
    T --> T2[Status + dates]
    T --> T3[Sponsors<br/>lead + collaborators + class]
    T --> T4[Design<br/>phase studyType allocation masking]
    T --> T5[Cohort<br/>enrollment eligibility sex age]
    T --> T6[Conditions + interventions]
    T --> T7[Outcomes<br/>primary + secondary + timeFrames]
    T --> T8[Locations<br/>facility city country status]
    T --> T9[Results section<br/>when posted]
```

PMIDs and NCT IDs are stable identifiers. The actor dedupes across runs by both, so a daily cron pulls only new records.

***

### Quick start

**Track new oncology trials this week**

```json
{
  "clinicalTrialsQueries": ["non small cell lung cancer"],
  "studyStatus": ["RECRUITING", "NOT_YET_RECRUITING"],
  "phases": ["PHASE2", "PHASE3"],
  "dateFrom": "2026-04-29",
  "maxRecords": 200
}
```

**Daily PubMed feed for a therapeutic area**

```json
{
  "pubmedQueries": ["GLP-1 receptor agonist obesity"],
  "publicationTypes": ["Clinical Trial", "Randomized Controlled Trial", "Meta-Analysis"],
  "dateFrom": "2026-04-01",
  "fetchAbstracts": true,
  "fetchMeshTerms": true,
  "maxRecords": 100
}
```

**KOL mapping by topic, with citation impact**

```json
{
  "pubmedQueries": ["CAR-T cell therapy"],
  "publicationTypes": ["Review", "Clinical Trial"],
  "dateFrom": "2024-01-01",
  "fetchAbstracts": true,
  "fetchMeshTerms": true,
  "fetchReferences": false,
  "maxRecords": 500
}
```

**Direct NCT ID enrichment for a watchlist**

```json
{
  "nctIds": ["NCT05123456", "NCT04999111", "NCT05432109"],
  "fetchTrialResults": true
}
```

**Build a reference pack from a list of PMIDs**

```json
{
  "pmids": ["38523054", "39122189", "37956789"],
  "fetchAbstracts": true,
  "fetchMeshTerms": true,
  "fetchReferences": true
}
```

**Cross domain pull: papers + trials in one run**

```json
{
  "pubmedQueries": ["lecanemab alzheimer"],
  "clinicalTrialsQueries": ["lecanemab"],
  "fetchAbstracts": true,
  "fetchTrialResults": true,
  "maxRecords": 250
}
```

***

### Sample output

PubMed paper row:

```json
{
  "type": "pubmed",
  "pmid": "38523054",
  "doi": "10.1056/NEJMoa2304146",
  "pmcid": "PMC10923512",
  "title": "Lecanemab in Early Alzheimer's Disease",
  "abstract": "BACKGROUND: The accumulation of soluble and insoluble aggregated amyloid-beta...",
  "authors": [
    {
      "name": "Christopher H van Dyck",
      "lastName": "van Dyck",
      "foreName": "Christopher H",
      "affiliations": ["Yale School of Medicine, New Haven, CT"],
      "orcid": "0000-0002-1234-5678"
    }
  ],
  "journal": "The New England Journal of Medicine",
  "journalIso": "N Engl J Med",
  "issn": "1533-4406",
  "volume": "388",
  "issue": "1",
  "pages": "9-21",
  "publicationYear": 2023,
  "publicationDate": "2023-Jan-5",
  "publicationTypes": ["Journal Article", "Randomized Controlled Trial"],
  "meshTerms": [
    { "term": "Alzheimer Disease", "ui": "D000544", "major": true, "qualifiers": ["drug therapy"] },
    { "term": "Amyloid beta-Peptides", "ui": "D016229", "major": false, "qualifiers": [] }
  ],
  "keywords": ["amyloid", "monoclonal antibody"],
  "grants": [
    { "grantId": "U01 AG006781", "agency": "NIA NIH HHS", "country": "United States" }
  ],
  "language": "eng",
  "url": "https://pubmed.ncbi.nlm.nih.gov/38523054/",
  "citationCount": 1842,
  "relativeCitationRatio": 24.3,
  "fieldCitationRate": 12.1,
  "scrapedAt": "2026-05-06T10:30:00.000Z"
}
```

Clinical trial row:

```json
{
  "type": "clinical_trial",
  "nctId": "NCT03887455",
  "title": "A Study to Confirm Safety and Efficacy of Lecanemab in Participants With Early Alzheimer's Disease",
  "url": "https://clinicaltrials.gov/study/NCT03887455",
  "status": "ACTIVE_NOT_RECRUITING",
  "startDate": "2019-03-22",
  "primaryCompletionDate": "2022-09-29",
  "completionDate": "2027-10-15",
  "studyType": "INTERVENTIONAL",
  "phases": ["PHASE3"],
  "enrollment": 1795,
  "enrollmentType": "ACTUAL",
  "primaryPurpose": "TREATMENT",
  "leadSponsor": "Eisai Inc.",
  "leadSponsorClass": "INDUSTRY",
  "collaborators": ["Biogen"],
  "conditions": ["Alzheimer Disease", "Early Alzheimer's Disease"],
  "interventions": [
    { "type": "DRUG", "name": "Lecanemab", "description": "10 mg/kg biweekly IV", "otherNames": ["BAN2401"] }
  ],
  "primaryOutcomes": [
    { "measure": "Change from Baseline in CDR-SB at 18 Months", "timeFrame": "Baseline to 18 months" }
  ],
  "locations": [
    { "facility": "Yale School of Medicine", "city": "New Haven", "state": "Connecticut", "country": "United States", "status": "ACTIVE_NOT_RECRUITING" }
  ],
  "locationCount": 234,
  "hasResults": true,
  "scrapedAt": "2026-05-06T10:30:00.000Z"
}
```
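
Both row shapes land in the same dataset, so downstream code usually branches on the `type` field first. A minimal sketch, assuming rows look like the two samples above; `split_by_type` and `row_id` are illustrative helpers, not part of the actor.

```python
from collections import defaultdict


def split_by_type(rows):
    """Group dataset rows by their `type` field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get("type", "unknown")].append(row)
    return dict(groups)


def row_id(row):
    """Return the stable identifier: pmid for papers, nctId for trials."""
    return row["pmid"] if row["type"] == "pubmed" else row["nctId"]


# Trimmed copies of the sample rows above.
rows = [
    {"type": "pubmed", "pmid": "38523054", "citationCount": 1842},
    {"type": "clinical_trial", "nctId": "NCT03887455", "phases": ["PHASE3"]},
]
for kind, items in split_by_type(rows).items():
    print(kind, [row_id(r) for r in items])
```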

***

### Who uses this

| Role | Use case |
|---|---|
| Pharma CI team | Daily feed of new trials in a therapeutic area, with sponsor and phase, mapped against your portfolio |
| Biotech analyst | Track when a competitor's trial moves from Phase 2 to Phase 3, or posts results |
| Regulatory affairs | Pull every paper citing a specific MeSH term in the last quarter for an FDA submission |
| Medical writer | Build a systematic review reference pack from a query, export with full abstracts and DOIs |
| KOL mapper | Find the top 50 authors by citation impact in a niche, cross referenced to their trial sites |
| CRO BD | Identify active investigators by location and condition for site recruitment |
| Science journalist | Verify a viral health claim against the primary trial result and citing literature |
| AI / LLM team | Build biomedical training corpora with structured MeSH terms, abstracts, and outcome data |
| Grant writer | Pull recent funded papers in your topic, complete with NIH grant IDs and agency names |
| Patent attorney | Prior art sweep across PubMed papers and trial registrations on a drug candidate |

***

### Input reference

| Field | Type | What it does |
|---|---|---|
| `pubmedQueries` | string\[] | PubMed Entrez queries. Supports MeSH and field tags: `"breast cancer"[MeSH]`, `pembrolizumab[Title]`. |
| `clinicalTrialsQueries` | string\[] | Free text queries against ClinicalTrials.gov. Matches title, conditions, interventions, sponsor. |
| `pmids` | string\[] | Direct PubMed IDs to fetch. Skips search. |
| `nctIds` | string\[] | Direct ClinicalTrials.gov NCT numbers to fetch. |
| `dateFrom` / `dateTo` | string | ISO date window. PubMed: publication date. CT.gov: lastUpdatePostDate. |
| `publicationTypes` | string\[] | PubMed publication type filter. Common: Clinical Trial, Meta-Analysis, Review. |
| `studyStatus` | enum\[] | Trial recruitment status filter. |
| `phases` | enum\[] | Trial phase filter. |
| `studyTypes` | enum\[] | Interventional, observational, expanded access. |
| `fetchAbstracts` | boolean | Include full abstract text in PubMed rows. On by default. |
| `fetchMeshTerms` | boolean | Parse MeSH headings with UIs and qualifiers. On by default. |
| `fetchReferences` | boolean | Per paper, fetch reference list and citing PMID list via ELink. Off by default. |
| `fetchTrialResults` | boolean | Include posted results section for completed trials. On by default. |
| `maxRecords` | integer | Hard cap on rows per run. 0 means unlimited. |
| `maxPerQuery` | integer | Cap per individual query before moving to the next. |
| `ncbiApiKey` | string | NCBI API key for 10 req/s instead of 3 req/s. Recommended for runs over 500 records. |
| `email` | string | Identifying email for the User-Agent header. NCBI requests this. |
| `dedupe` | boolean | Skip PMIDs and NCT IDs already pushed in previous runs. |
| `navigationDelayMs` | integer | Pause between API calls. Default 350 ms keeps you under the 3 req/s limit. |

***

### API call

```bash
curl -X POST \
  "https://api.apify.com/v2/acts/scrapemint~pubmed-clinical-trials-intelligence/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "pubmedQueries": ["semaglutide cardiovascular"],
    "clinicalTrialsQueries": ["semaglutide"],
    "studyStatus": ["RECRUITING", "ACTIVE_NOT_RECRUITING"],
    "phases": ["PHASE3", "PHASE4"],
    "dateFrom": "2026-01-01",
    "fetchAbstracts": true,
    "maxRecords": 100
  }'
```

***

### Pricing

The first 20 rows per run are free so you can validate the schema before paying. After that, $0.005 per row pushed. PubMed papers and clinical trial rows are charged at the same rate. iCite citation counts, MeSH terms, references, and posted trial results are included at no extra per row charge.
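
To budget a run, the per row arithmetic above can be sketched as follows. The two constants are taken from this section. `estimate_cost` is an illustrative helper that covers the per row charge only and ignores any other platform charges.

```python
FREE_ROWS_PER_RUN = 20      # free rows per run, per this section
PRICE_PER_ROW_USD = 0.005   # charge per row after the free rows


def estimate_cost(rows_pushed):
    """Estimate the per row charge in USD for a single run."""
    billable = max(0, rows_pushed - FREE_ROWS_PER_RUN)
    return round(billable * PRICE_PER_ROW_USD, 3)


for rows in (20, 200, 1020):
    print(rows, "rows ->", estimate_cost(rows), "USD")
```

A 200 row run bills 180 rows, which comes to 0.90 USD. Because the free rows apply per run, many small runs cost less per row than one large run.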

***

### FAQ

#### Do I need an NCBI API key?

Optional but recommended for runs over 500 records. Without a key, NCBI throttles at 3 requests per second. With a free key from your NCBI account, you get 10 per second. The actor handles backoff either way.

#### Will this hit rate limits?

The default `navigationDelayMs` of 350 ms paces requests under NCBI's no key limit. ClinicalTrials.gov v2 has no published rate limit and accepts 100 records per page. If you see 429 errors, raise `navigationDelayMs` to 700 ms or add an API key.
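
The relationship between the delay and the request rate is simple arithmetic. The sketch below assumes requests are sent one at a time and ignores the time each request takes, so real throughput will be somewhat lower. Both function names are illustrative.

```python
import math


def requests_per_second(delay_ms):
    """Upper bound on the request rate for a fixed pause between calls."""
    return 1000.0 / delay_ms


def min_delay_ms(limit_per_second):
    """Smallest whole-millisecond pause that stays at or under the limit."""
    return math.ceil(1000 / limit_per_second)


print(round(requests_per_second(350), 2))  # default delay
print(round(requests_per_second(700), 2))  # suggested fallback after 429 errors
print(min_delay_ms(3), min_delay_ms(10))   # floor without a key, with a key
```

The default of 350 ms works out to roughly 2.86 requests per second, just under the 3 per second limit. With an API key, the 10 per second limit allows a pause as short as 100 ms.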

#### Why not use BioPython or Entrez Direct?

Both are excellent for one off pulls on your laptop. This actor adds three things: ClinicalTrials.gov in the same row schema, iCite citation counts joined automatically, and dedupe across daily runs. Run it on a cron and you get an incremental feed instead of a one shot dump.

#### How current is the data?

PubMed indexes new papers within hours of journal publication. ClinicalTrials.gov updates as sponsors post changes (sometimes daily, sometimes monthly per study). Both APIs return the live record at request time.

#### Can I track when a trial changes phase or status?

Yes. Schedule the actor on a daily cron with the same query and `dedupe: false`. Each row carries `scrapedAt`, `lastUpdatePostedDate`, and `status`. Diff between snapshots to catch phase transitions, status flips, and enrollment changes.
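
One way to do the diff, sketched under the assumption that each snapshot is a list of trial rows shaped like the sample output. `diff_snapshots` is a hypothetical helper, and the values in the example are invented.

```python
WATCHED_FIELDS = ("status", "phases", "enrollment")


def diff_snapshots(previous, current, fields=WATCHED_FIELDS):
    """Compare two lists of trial rows keyed by nctId and list what changed."""
    before = {row["nctId"]: row for row in previous}
    changes = []
    for row in current:
        old = before.get(row["nctId"])
        if old is None:
            changes.append({"nctId": row["nctId"], "change": "new"})
            continue
        for field in fields:
            if old.get(field) != row.get(field):
                changes.append({
                    "nctId": row["nctId"],
                    "field": field,
                    "from": old.get(field),
                    "to": row.get(field),
                })
    return changes


# Invented values for illustration only.
yesterday = [{"nctId": "NCT05123456", "status": "RECRUITING",
              "phases": ["PHASE2"], "enrollment": 100}]
today = [{"nctId": "NCT05123456", "status": "ACTIVE_NOT_RECRUITING",
          "phases": ["PHASE2"], "enrollment": 120}]
for change in diff_snapshots(yesterday, today):
    print(change)
```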

#### What is iCite RCR?

Relative Citation Ratio. NIH's field normalized citation impact metric. An RCR of 1.0 is average for the paper's field and year. An RCR of 5.0 means the paper is cited about five times as often as that average. Better than raw citation count for cross field comparisons.

#### Can I get the full text of a paper?

The actor returns metadata and the structured abstract. Full text lives behind the publisher or in PubMed Central. For PMC papers, the row includes a `pmcid`. Pipe `pmcid` into Apify's Website Content Crawler against `https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/` for the full body.
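
A small sketch of that hand-off: collect the rows that carry a `pmcid` and turn them into crawler start URLs using the pattern above. The list of `{"url": ...}` objects is an assumption about the crawler's `startUrls` input, so check that Actor's input schema before relying on it. `pmc_start_urls` is an illustrative name.

```python
PMC_URL_TEMPLATE = "https://www.ncbi.nlm.nih.gov/pmc/articles/{pmcid}/"


def pmc_start_urls(rows):
    """Build one start URL entry per row that has a non-empty pmcid."""
    return [
        {"url": PMC_URL_TEMPLATE.format(pmcid=row["pmcid"])}
        for row in rows
        if row.get("pmcid")
    ]


rows = [
    {"type": "pubmed", "pmid": "38523054", "pmcid": "PMC10923512"},
    {"type": "pubmed", "pmid": "39122189", "pmcid": None},  # no PMC copy
]
print(pmc_start_urls(rows))
```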

#### Does fetchReferences work for every paper?

Only papers indexed with a structured reference list in PubMed have references via ELink. Coverage is strongest in PMC open access journals and weaker in older or non English titles. Empty `references` array means PubMed does not have the reference list, not that the paper has no references.

#### How does this dedupe?

Two key value store keys: `seen-pmids` and `seen-nct-ids`. Every successful push adds the ID. Next run skips IDs already in the set. Turn `dedupe` off to refresh stale rows or rebuild the dataset from scratch.
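
The skip logic can be pictured with a plain Python set standing in for one of those stored ID lists. This is a simplified sketch, not the actor's code. `push_new` and its `push` callback are hypothetical.

```python
def push_new(ids, seen, push):
    """Push only IDs missing from `seen`, recording each one after its push."""
    pushed = []
    for record_id in ids:
        if record_id in seen:
            continue  # already delivered in an earlier run, or earlier in this one
        push(record_id)      # stands in for writing a row to the dataset
        seen.add(record_id)  # recorded only after the push succeeds
        pushed.append(record_id)
    return pushed


seen_pmids = {"38523054"}  # state carried over from a previous run
dataset = []
print(push_new(["38523054", "39122189", "39122189"], seen_pmids, dataset.append))
```

Only the ID missing from the set is pushed, and the repeat later in the same list is skipped as well.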

#### Will this scrape PubMed Central full text?

No. PMC full text is XML behind a separate API and the licensing varies per article. Use Website Content Crawler against the `pmcid` URL when full text is needed.

***

### Related actors

- **Google Scholar Scraper**. Broader academic coverage including humanities, social sciences, and working papers. Pair when your topic spans biomedical and adjacent fields.
- **Google Patents Scraper**. Same temporal and prior art shape applied to patent literature. Pairs naturally for IP teams covering pharma assets.
- **SEC 8-K Event Tracker**. Catch material events from public biotech sponsors. Pair with this actor to align trial readouts to investor disclosures.
- **SEC Form 4 Insider Tracker**. Insider trading signal around clinical milestones.
- **Website Content Crawler**. Pipe `pmcid` URLs or trial NCT URLs into the crawler for full text and supplementary documents.
- **HN Lead Monitor**. Catch new mentions of a trial sponsor or drug name on Hacker News.
- **Reddit Lead Monitor**. Same applied to patient and clinician subreddits, useful for KOL discovery and patient sentiment.

# Actor input Schema

## `pubmedQueries` (type: `array`):

Free text or PubMed Entrez queries. Supports MeSH and field tags: "breast cancer"\[MeSH], pembrolizumab\[Title], 2024\[PDAT]. Example: \["GLP-1 receptor agonist obesity", "CRISPR sickle cell"].

## `clinicalTrialsQueries` (type: `array`):

Free text queries against ClinicalTrials.gov. Matches title, conditions, interventions, and sponsor fields. Example: \["alzheimer monoclonal antibody", "pediatric ADHD"].

## `pmids` (type: `array`):

PMIDs to fetch directly. Skips search, enriches each one with full abstract, authors, MeSH terms, and references. Example: \["38523054", "39122189"].

## `nctIds` (type: `array`):

ClinicalTrials.gov NCT numbers to fetch directly. Example: \["NCT05123456", "NCT04999111"].

## `dateFrom` (type: `string`):

ISO date (YYYY-MM-DD). For PubMed: publication date lower bound. For ClinicalTrials.gov: lastUpdatePostDate lower bound. Empty means no bound.

## `dateTo` (type: `string`):

ISO date (YYYY-MM-DD). Upper bound. Empty means no bound.

## `publicationTypes` (type: `array`):

Restrict PubMed to specific publication types. Common: Clinical Trial, Randomized Controlled Trial, Meta-Analysis, Systematic Review, Review.

## `studyStatus` (type: `array`):

Restrict trials to specific recruitment statuses. Empty means all statuses.

## `phases` (type: `array`):

Restrict trials to specific phases. Empty means all phases.

## `studyTypes` (type: `array`):

Restrict trials to specific study types. Empty means all study types.

## `fetchAbstracts` (type: `boolean`):

Include the full abstract text in each PubMed row. Adds one extra efetch call per query batch. On by default since abstracts are the most valuable field.

## `fetchMeshTerms` (type: `boolean`):

Parse Medical Subject Heading terms from the PubMed XML record. Useful for topic clustering and indexing.

## `fetchReferences` (type: `boolean`):

For each PubMed paper, pull the reference list and citing PMIDs from NCBI ELink. Adds two extra requests per paper. Best with maxRecords <= 500.

## `fetchTrialResults` (type: `boolean`):

Include posted results (primary outcomes, adverse events) for completed trials. Only ~30% of trials post results. Adds the resultsSection field per row.

## `maxRecords` (type: `integer`):

Hard cap on total rows pushed (PubMed papers + clinical trials combined). 0 means unlimited.

## `maxPerQuery` (type: `integer`):

Cap per individual query before moving to the next. Stops one broad query from eating the entire budget.

## `ncbiApiKey` (type: `string`):

Free NCBI API key from https://www.ncbi.nlm.nih.gov/account. Without a key: 3 PubMed requests per second. With a key: 10 per second. Recommended for runs over 500 records.

## `email` (type: `string`):

NCBI requests an identifying email for E-utilities calls. Used in the User-Agent. No marketing.

## `dedupe` (type: `boolean`):

Skip PMIDs and NCT IDs already pushed in previous runs. Use for daily incremental scrapes. Turn off to refresh stale rows.

## `navigationDelayMs` (type: `integer`):

Pause between API calls. NCBI rate limits at 3 per second without a key. Default of 350 ms works out to about 2.9 requests per second, just under the limit.

## Actor input object example

```json
{
  "pubmedQueries": [],
  "clinicalTrialsQueries": [],
  "pmids": [],
  "nctIds": [],
  "dateFrom": "",
  "dateTo": "",
  "publicationTypes": [],
  "studyStatus": [],
  "phases": [],
  "studyTypes": [],
  "fetchAbstracts": true,
  "fetchMeshTerms": true,
  "fetchReferences": false,
  "fetchTrialResults": true,
  "maxRecords": 200,
  "maxPerQuery": 100,
  "email": "",
  "dedupe": true,
  "navigationDelayMs": 350
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {};

// Run the Actor and wait for it to finish
const run = await client.actor("scrapemint/pubmed-clinical-trials-intelligence").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {}

# Run the Actor and wait for it to finish
run = client.actor("scrapemint/pubmed-clinical-trials-intelligence").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{}' |
apify call scrapemint/pubmed-clinical-trials-intelligence --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scrapemint/pubmed-clinical-trials-intelligence",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Pharma Research & Clinical Trial Monitor",
        "description": "Pull PubMed papers and ClinicalTrials.gov studies at scale. PMIDs, DOIs, abstracts, MeSH terms, NCT IDs, phases, sponsors, enrollment, primary outcomes, results. One row per record. Pay per row.",
        "version": "0.1",
        "x-build-id": "kROvPzyZXgbUtpve5"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scrapemint~pubmed-clinical-trials-intelligence/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scrapemint-pubmed-clinical-trials-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scrapemint~pubmed-clinical-trials-intelligence/runs": {
            "post": {
                "operationId": "runs-sync-scrapemint-pubmed-clinical-trials-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scrapemint~pubmed-clinical-trials-intelligence/run-sync": {
            "post": {
                "operationId": "run-sync-scrapemint-pubmed-clinical-trials-intelligence",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "pubmedQueries": {
                        "title": "PubMed search queries",
                        "type": "array",
                        "description": "Free text or PubMed Entrez queries. Supports MeSH and field tags: \"breast cancer\"[MeSH], pembrolizumab[Title], 2024[PDAT]. Example: [\"GLP-1 receptor agonist obesity\", \"CRISPR sickle cell\"].",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "clinicalTrialsQueries": {
                        "title": "ClinicalTrials.gov search queries",
                        "type": "array",
                        "description": "Free text queries against ClinicalTrials.gov. Matches title, conditions, interventions, and sponsor fields. Example: [\"alzheimer monoclonal antibody\", \"pediatric ADHD\"].",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "pmids": {
                        "title": "Direct PubMed IDs",
                        "type": "array",
                        "description": "PMIDs to fetch directly. Skips search, enriches each one with full abstract, authors, MeSH terms, and references. Example: [\"38523054\", \"39122189\"].",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "nctIds": {
                        "title": "Direct NCT IDs",
                        "type": "array",
                        "description": "ClinicalTrials.gov NCT numbers to fetch directly. Example: [\"NCT05123456\", \"NCT04999111\"].",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "dateFrom": {
                        "title": "Published or updated from",
                        "type": "string",
                        "description": "ISO date (YYYY-MM-DD). For PubMed: publication date lower bound. For ClinicalTrials.gov: lastUpdatePostDate lower bound. Empty means no bound.",
                        "default": ""
                    },
                    "dateTo": {
                        "title": "Published or updated until",
                        "type": "string",
                        "description": "ISO date (YYYY-MM-DD). Upper bound. Empty means no bound.",
                        "default": ""
                    },
                    "publicationTypes": {
                        "title": "PubMed publication types",
                        "type": "array",
                        "description": "Restrict PubMed to specific publication types. Common: Clinical Trial, Randomized Controlled Trial, Meta-Analysis, Systematic Review, Review.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "studyStatus": {
                        "title": "ClinicalTrials.gov status filter",
                        "type": "array",
                        "description": "Restrict trials to specific recruitment statuses. Empty means all statuses.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "RECRUITING",
                                "NOT_YET_RECRUITING",
                                "ACTIVE_NOT_RECRUITING",
                                "COMPLETED",
                                "TERMINATED",
                                "WITHDRAWN",
                                "SUSPENDED",
                                "ENROLLING_BY_INVITATION"
                            ],
                            "enumTitles": [
                                "Recruiting",
                                "Not yet recruiting",
                                "Active, not recruiting",
                                "Completed",
                                "Terminated",
                                "Withdrawn",
                                "Suspended",
                                "Enrolling by invitation"
                            ]
                        },
                        "default": []
                    },
                    "phases": {
                        "title": "Trial phases",
                        "type": "array",
                        "description": "Restrict trials to specific phases. Empty means all phases.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "EARLY_PHASE1",
                                "PHASE1",
                                "PHASE2",
                                "PHASE3",
                                "PHASE4",
                                "NA"
                            ],
                            "enumTitles": [
                                "Early Phase 1",
                                "Phase 1",
                                "Phase 2",
                                "Phase 3",
                                "Phase 4",
                                "Not applicable"
                            ]
                        },
                        "default": []
                    },
                    "studyTypes": {
                        "title": "Trial study types",
                        "type": "array",
                        "description": "Restrict trials to specific study types. Empty means all study types.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "INTERVENTIONAL",
                                "OBSERVATIONAL",
                                "EXPANDED_ACCESS"
                            ],
                            "enumTitles": [
                                "Interventional",
                                "Observational",
                                "Expanded access"
                            ]
                        },
                        "default": []
                    },
                    "fetchAbstracts": {
                        "title": "Include full abstracts",
                        "type": "boolean",
                        "description": "Include the full abstract text in each PubMed row. Adds one extra efetch call per query batch. On by default since abstracts are the most valuable field.",
                        "default": true
                    },
                    "fetchMeshTerms": {
                        "title": "Include MeSH terms",
                        "type": "boolean",
                        "description": "Parse Medical Subject Heading terms from the PubMed XML record. Useful for topic clustering and indexing.",
                        "default": true
                    },
                    "fetchReferences": {
                        "title": "Include references and cited by",
                        "type": "boolean",
                        "description": "For each PubMed paper, pull the reference list and citing PMIDs from NCBI ELink. Adds two extra requests per paper. Best with maxRecords <= 500.",
                        "default": false
                    },
                    "fetchTrialResults": {
                        "title": "Include trial results when available",
                        "type": "boolean",
                        "description": "Include posted results (primary outcomes, adverse events) for completed trials. Only ~30% of trials post results. Adds the resultsSection field per row.",
                        "default": true
                    },
                    "maxRecords": {
                        "title": "Max records per run",
                        "minimum": 0,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Hard cap on total rows pushed (PubMed papers + clinical trials combined). 0 means unlimited.",
                        "default": 200
                    },
                    "maxPerQuery": {
                        "title": "Max records per query",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Cap per individual query before moving to the next. Stops one broad query from eating the entire budget.",
                        "default": 100
                    },
                    "ncbiApiKey": {
                        "title": "NCBI API key (optional)",
                        "type": "string",
                        "description": "Free NCBI API key from https://www.ncbi.nlm.nih.gov/account. Without a key: 3 PubMed requests per second. With a key: 10 per second. Recommended for runs over 500 records."
                    },
                    "email": {
                        "title": "Contact email (optional)",
                        "type": "string",
                        "description": "NCBI requests an identifying email for E-utilities calls. Used in the User-Agent. No marketing.",
                        "default": ""
                    },
                    "dedupe": {
                        "title": "Deduplicate across runs",
                        "type": "boolean",
                        "description": "Skip PMIDs and NCT IDs already pushed in previous runs. Use for daily incremental scrapes. Turn off to refresh stale rows.",
                        "default": true
                    },
                    "navigationDelayMs": {
                        "title": "Delay between requests (ms)",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Pause between API calls. NCBI rate limits at 3 requests per second without a key. The default of 350 ms works out to about 2.9 sequential requests per second, just under that limit.",
                        "default": 350
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
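
The input schema above constrains several fields: `studyStatus`, `phases`, and `studyTypes` accept only the listed enum values, `maxRecords`, `maxPerQuery`, and `navigationDelayMs` have numeric bounds, and `dateFrom`/`dateTo` expect `YYYY-MM-DD` or an empty string. A minimal sketch of a client-side check that mirrors those constraints before a run is started could look like the following. The helper name `build_run_input` and its keyword arguments are hypothetical, the `dateFrom <= dateTo` check is an extra assumption that the schema does not state, and fields defined earlier in the schema (outside this excerpt) are not covered.

```python
import re
from datetime import date

# Enum values copied from the input schema above.
STUDY_STATUSES = frozenset({
    "RECRUITING", "NOT_YET_RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED",
    "TERMINATED", "WITHDRAWN", "SUSPENDED", "ENROLLING_BY_INVITATION",
})
PHASES = frozenset({"EARLY_PHASE1", "PHASE1", "PHASE2", "PHASE3", "PHASE4", "NA"})
STUDY_TYPES = frozenset({"INTERVENTIONAL", "OBSERVATIONAL", "EXPANDED_ACCESS"})

_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def _check_date(name, value):
    """Accept '' (no bound) or a real calendar date written as YYYY-MM-DD."""
    if value == "":
        return
    if not _ISO_DATE.fullmatch(value):
        raise ValueError(f"{name}: expected YYYY-MM-DD, got {value!r}")
    date.fromisoformat(value)  # raises ValueError for dates like 2024-02-30


def _check_enum(name, values, allowed):
    unknown = sorted(set(values) - allowed)
    if unknown:
        raise ValueError(f"{name}: unsupported value(s) {unknown}")


def _check_range(name, value, low, high):
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name}: expected an integer, got {value!r}")
    if not low <= value <= high:
        raise ValueError(f"{name}: {value} is outside {low}..{high}")


def build_run_input(*, pmids=(), nct_ids=(), date_from="", date_to="",
                    study_status=(), phases=(), study_types=(),
                    max_records=200, max_per_query=100,
                    navigation_delay_ms=350):
    """Hypothetical helper: validate values, then return the Actor input dict."""
    _check_date("dateFrom", date_from)
    _check_date("dateTo", date_to)
    # Assumption (not in the schema): a lower bound after the upper bound is a mistake.
    if date_from and date_to and date_from > date_to:
        raise ValueError("dateFrom must not be later than dateTo")
    _check_enum("studyStatus", study_status, STUDY_STATUSES)
    _check_enum("phases", phases, PHASES)
    _check_enum("studyTypes", study_types, STUDY_TYPES)
    _check_range("maxRecords", max_records, 0, 100000)
    _check_range("maxPerQuery", max_per_query, 1, 10000)
    _check_range("navigationDelayMs", navigation_delay_ms, 0, 10000)
    return {
        "pmids": [str(p) for p in pmids],      # schema items are strings
        "nctIds": [str(n) for n in nct_ids],
        "dateFrom": date_from,
        "dateTo": date_to,
        "studyStatus": list(study_status),
        "phases": list(phases),
        "studyTypes": list(study_types),
        "maxRecords": max_records,
        "maxPerQuery": max_per_query,
        "navigationDelayMs": navigation_delay_ms,
    }


run_input = build_run_input(
    nct_ids=["NCT05123456"],
    phases=["PHASE2", "PHASE3"],
    study_status=["RECRUITING"],
    date_from="2024-01-01",
)
```

With the official Python client installed, the resulting dict could then be passed along the lines of `ApifyClient("<YOUR_API_TOKEN>").actor("scrapemint/pubmed-clinical-trials-intelligence").call(run_input=run_input)`. Catching a bad enum value or an out-of-range number locally avoids starting a run that the platform would reject.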

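
The `ncbiApiKey`, `navigationDelayMs`, and `fetchReferences` descriptions in the schema above quote rate limits (3 requests per second without a key, 10 with one) and request counts (two extra requests per paper for references). The arithmetic behind those numbers can be sketched as below. This assumes requests are issued strictly one after another and ignores the time each response takes, so the results are bounds rather than measurements. The function names are hypothetical and not part of the Actor.

```python
import math


def min_delay_ms(requests_per_second):
    """Smallest whole-millisecond pause that stays at or below the given rate."""
    return math.ceil(1000 / requests_per_second)


def effective_rate(delay_ms):
    """Upper bound on requests per second for a fixed pause between calls."""
    return 1000 / delay_ms


def reference_overhead_seconds(papers, delay_ms, extra_requests_per_paper=2):
    """Lower bound on extra run time from fetchReferences, counting pauses only."""
    return papers * extra_requests_per_paper * delay_ms / 1000


no_key_delay = min_delay_ms(3)       # limit without an NCBI API key
with_key_delay = min_delay_ms(10)    # limit with an NCBI API key
default_rate = effective_rate(350)   # schema default for navigationDelayMs
overhead = reference_overhead_seconds(500, 350)
```

Under these assumptions the 350 ms default sits slightly above the 334 ms minimum for keyless access, and enabling `fetchReferences` for 500 papers adds at least 350 seconds of pauses, which is consistent with the schema's advice to keep `maxRecords` at or below 500 for that option.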