ClinicalTrials.gov Sponsor Pipeline Scraper

Pricing

from $10.00 / 1,000 results

Scrape ClinicalTrials.gov API v2 by sponsor, condition, phase, and recruitment status. Returns one digest row per saved query with study-level evidence — for clinical landscape research and sponsor pipeline analytics. No email or contact fields emitted (Terms of Use compliant).

Developer

太郎 山田

Maintained by Community

This actor is intended for research and analysis purposes only. Data must not be used for unrequested direct messaging to sponsors, investigators, or individual contacts. Comply with ClinicalTrials.gov Terms of Use and any applicable institutional policies.

Track sponsor pipelines, recruiting status changes, trial phases, and condition-specific activity directly from ClinicalTrials.gov. Use recurring runs to monitor public study records for research, investment diligence, CRO planning, competitive intelligence, or clinical landscape analysis.

When you run the scraper, you get clean digest rows with NCT IDs, official study titles, sponsor names, conditions, trial phase, recruitment status, new-study flags, watch-term hits, and action-needed summaries. The actor does not extract or promote personal contact fields.

Changelog

  • v0.2 compliance update — Removed public messaging around personal-contact extraction and direct messaging. Output is positioned for research and pipeline analysis only.

Store Quickstart

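Run the actor with your watchlist input. Results are stored in the Apify dataset by default; set delivery to webhook to POST each digest to webhookUrl instead. Before committing to a schedule, set dryRun to true to validate the input and fetch studies without persisting state or posting webhooks.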

Key Features

  • 📈 Sponsor pipeline tracking — Group public trial records by sponsor, condition, phase, and status
  • 📊 Recruitment change detection — Flag newly recruiting studies and status changes between scheduled runs
  • 🎯 Watchlist queries — Monitor condition, sponsor, institution, phase, and geography filters
  • 📡 Webhook delivery — Send research digests to analytics or operations systems

Use Cases

| Who | Why |
| --- | --- |
| Developers | Automate recurring data fetches without building custom scrapers |
| Data teams | Pipe structured output into analytics warehouses |
| Ops teams | Monitor changes via webhook alerts |
| Product managers | Track competitor/market signals without engineering time |

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| watchlist | array | required | One entry per monitored query. At minimum set id, name, and condition. Add recruitmentStatus, phase, intervention, or sponsor to narrow. |
| watchTerms | string | | Comma-separated sponsor / PI / institution names to flag in study digests. Any matching study receives a watch_term_hit signal tag. |
| maxStudiesPerQuery | integer | 50 | Upper bound on studies fetched per query per run. Increase for one-off discovery; keep low for recurring digest runs. |
| delivery | string | "dataset" | dataset stores results in the Apify dataset. webhook posts the digest JSON to webhookUrl. |
| webhookUrl | string | | POST target for trial digest payload. Leave empty for dataset delivery. |
| datasetMode | string | "all" | all emits every query digest row. action_needed emits only queries with watch-term hits or new recruiting studies. new_only emits only queries with studies not seen in the previous run. |
| snapshotKey | string | "clinical-trials-monitor-state" | Stable key used to persist seen NCT IDs across recurring runs. Use the same key across scheduled runs. |
| clinicalTrialsApiUrl | string | "https://clinicaltrials.gov/api/v2/studies" | ClinicalTrials.gov API v2 studies endpoint. No API key required. |
| requestTimeoutSeconds | integer | 30 | HTTP request timeout. |
| notifyOnNoNew | boolean | true | When true, every query produces a digest row even if no new studies were found. |
| dryRun | boolean | false | Validate and fetch without persisting state or posting webhooks. |
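
Before scheduling a run, it can help to check an input object locally against the rules in the table above. The following is a minimal sketch, not the actor's own validation: `validate_input` is a hypothetical helper, and the rule that webhook delivery needs a non-empty webhookUrl is an assumption inferred from the field descriptions.

```python
from typing import Any

# Field names and allowed values are taken from the input table above.
REQUIRED_WATCHLIST_KEYS = ("id", "name", "condition")
DELIVERY_MODES = {"dataset", "webhook"}
DATASET_MODES = {"all", "action_needed", "new_only"}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check; an empty list means no problems were found."""
    problems: list[str] = []

    watchlist = run_input.get("watchlist")
    if not isinstance(watchlist, list) or not watchlist:
        problems.append("watchlist must be a non-empty array")
        watchlist = []

    for index, entry in enumerate(watchlist):
        if not isinstance(entry, dict):
            problems.append(f"watchlist[{index}] must be an object")
            continue
        for key in REQUIRED_WATCHLIST_KEYS:
            if not entry.get(key):
                problems.append(f"watchlist[{index}] is missing '{key}'")

    delivery = run_input.get("delivery", "dataset")
    if delivery not in DELIVERY_MODES:
        problems.append(f"delivery must be one of {sorted(DELIVERY_MODES)}")
    # Assumption: webhook delivery is only useful with a target URL.
    if delivery == "webhook" and not run_input.get("webhookUrl"):
        problems.append("webhook delivery needs webhookUrl")

    mode = run_input.get("datasetMode", "all")
    if mode not in DATASET_MODES:
        problems.append(f"datasetMode must be one of {sorted(DATASET_MODES)}")

    return problems
```

Calling `validate_input(run_input)` before `client.actor(...).call(...)` catches a missing id, name, or condition without spending an actor start.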

Example 1 — single oncology watchlist with sponsor watch terms

{
  "watchlist": [
    {
      "id": "nsclc-phase3-recruiting",
      "name": "NSCLC Phase 3 — Recruiting",
      "condition": "non-small cell lung cancer",
      "recruitmentStatus": "RECRUITING",
      "phase": "PHASE3,PHASE4"
    }
  ],
  "watchTerms": "Pfizer, AstraZeneca, Novo Nordisk",
  "maxStudiesPerQuery": 50,
  "delivery": "dataset",
  "datasetMode": "all"
}
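
To make the watchTerms behaviour concrete, here is a rough sketch of how a comma-separated term list could be matched against studies. It assumes a case-insensitive substring match on the sponsor name only; the actor's real matching rules (including PI and institution names) are not documented here, and both function names and the study IDs in the usage note are placeholders.

```python
def parse_watch_terms(raw: str) -> list[str]:
    """Split the comma-separated watchTerms string into trimmed, non-empty terms."""
    return [term.strip() for term in raw.split(",") if term.strip()]


def find_watch_term_hits(raw_terms: str, studies: list[dict]) -> list[dict]:
    """Return one hit per (study, term) pair where the term occurs in the sponsor name."""
    terms = parse_watch_terms(raw_terms)
    hits = []
    for study in studies:
        sponsor = study.get("sponsor", "")
        for term in terms:
            # Assumption: matching is a case-insensitive substring test.
            if term.casefold() in sponsor.casefold():
                hits.append(
                    {"term": term, "studyId": study.get("studyId"), "sponsor": sponsor}
                )
    return hits
```

With the terms from Example 1 and two placeholder studies, `find_watch_term_hits("Pfizer, AstraZeneca, Novo Nordisk", [{"studyId": "NCT00000001", "sponsor": "Pfizer Inc"}, {"studyId": "NCT00000002", "sponsor": "Example Biotech"}])` yields a single hit for the term "Pfizer".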

Example 2 — sponsor portfolio across two indications, action-needed only

{
  "watchlist": [
    {
      "id": "merck-onc-recruiting",
      "name": "Merck — Oncology Recruiting",
      "condition": "cancer",
      "sponsor": "Merck",
      "recruitmentStatus": "RECRUITING"
    },
    {
      "id": "merck-vax-active",
      "name": "Merck — Vaccines Active",
      "condition": "vaccine",
      "sponsor": "Merck",
      "recruitmentStatus": "ACTIVE_NOT_RECRUITING,RECRUITING"
    }
  ],
  "watchTerms": "Merck, MSD, Merck Sharp & Dohme",
  "maxStudiesPerQuery": 100,
  "delivery": "dataset",
  "datasetMode": "action_needed"
}

Example 3 — webhook delivery to a research-team listener (new studies only)

{
  "watchlist": [
    {
      "id": "obesity-glp1",
      "name": "Obesity GLP-1",
      "condition": "obesity",
      "intervention": "GLP-1",
      "recruitmentStatus": "RECRUITING"
    }
  ],
  "watchTerms": "Novo Nordisk, Eli Lilly",
  "maxStudiesPerQuery": 80,
  "delivery": "webhook",
  "webhookUrl": "https://your-listener.example.com/clinical-trials",
  "datasetMode": "new_only"
}
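
On the receiving side, a listener only needs to parse the POSTed JSON and pick out what deserves review. The sketch below assumes the webhook body has the same shape as the Output section (a top-level digests array with queryName, actionNeeded, newStudyCount, and totalStudyCount); that shape is not confirmed for webhook delivery, and `summarize_digest_payload` is a hypothetical helper.

```python
import json


def summarize_digest_payload(body: bytes) -> list[str]:
    """Turn a digest payload into one summary line per query flagged actionNeeded."""
    payload = json.loads(body)
    lines = []
    for digest in payload.get("digests", []):
        if not digest.get("actionNeeded"):
            continue  # skip queries with nothing to review
        label = digest.get("queryName") or digest.get("queryId")
        new_count = digest.get("newStudyCount", 0)
        total = digest.get("totalStudyCount", 0)
        lines.append(f"{label}: {new_count} new of {total} studies")
    return lines
```

The returned lines can be forwarded to an internal research channel by whatever HTTP framework hosts the listener.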

Output

| Field | Type | Description |
| --- | --- | --- |
| meta | object | |
| errors | array | |
| digests | array | |
| digests[].queryId | string | |
| digests[].queryName | string | |
| digests[].condition | string | |
| digests[].recruitmentStatusFilter | array | |
| digests[].checkedAt | timestamp | |
| digests[].status | string | |
| digests[].newStudyCount | number | |
| digests[].totalStudyCount | number | |
| digests[].recruitingCount | number | |
| digests[].changedSinceLastRun | boolean | |
| digests[].actionNeeded | boolean | |
| digests[].recommendedAction | string | |
| digests[].topSponsors | array | |
| digests[].watchTermHits | array | |
| digests[].signalTags | array | |
| digests[].studies | array | |
| digests[].error | null | |
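
If a run uses datasetMode "all", the same narrowing can be applied afterwards on the consumer side. The sketch below is an approximation of the three modes using the output fields above. It assumes actionNeeded already reflects watch-term hits or new recruiting studies and that newStudyCount greater than zero means unseen studies; `filter_digests` is a hypothetical helper, not part of the actor.

```python
def filter_digests(digests: list[dict], dataset_mode: str = "all") -> list[dict]:
    """Approximate the datasetMode filter on already-fetched digest rows."""
    if dataset_mode == "all":
        return list(digests)
    if dataset_mode == "action_needed":
        # Assumption: the actor sets actionNeeded for watch-term hits or new recruiting studies.
        return [d for d in digests if d.get("actionNeeded")]
    if dataset_mode == "new_only":
        # Assumption: newStudyCount counts studies not seen in the previous run.
        return [d for d in digests if d.get("newStudyCount", 0) > 0]
    raise ValueError(f"unknown datasetMode: {dataset_mode!r}")
```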

Output Example

{
  "meta": {
    "generatedAt": "2026-04-15T09:00:00.000Z",
    "now": "2026-04-15T09:00:00.000Z",
    "queryCount": 2,
    "totalStudies": 7,
    "newStudies": 4,
    "watchTermHitCount": 2,
    "actionNeededCount": 1,
    "snapshot": {
      "key": "clinical-trials-monitor-sample",
      "loadedFrom": "local",
      "savedTo": "local"
    },
    "warnings": [],
    "executiveSummary": {
      "overallStatus": "action_needed",
      "brief": "1 query(s) have sponsor watch-term hits requiring review.",
      "topSponsors": [
        {
          "name": "Pfizer Inc",
          "studyCount": 2,
          "isWatchTermHit": true
        },
        {
          "name": "Novo Nordisk A/S",
          "studyCount": 1,
          "isWatchTermHit": true
        },
        {
          "name": "AstraZeneca",
          "studyCount": 2,
          "isWatchTermHit": false
        }
      ],
      "watchTermHits": [
        {
          "term": "Pfizer",
          "studyId": "NCT05001234",
          "sponsor": "Pfizer Inc",
          "title": "Study of [Drug] in Advanced NSCLC",
          "phase": "PHASE3"
        }
      ]
    }
  }
}

No email or contact-detail fields are emitted. This is intentional and aligned with ClinicalTrials.gov Terms; see docs/source-compliance.md.

API Usage

Run this actor programmatically using the Apify API. Replace YOUR_API_TOKEN with your token from Apify Console → Settings → Integrations.

cURL

curl -X POST "https://api.apify.com/v2/acts/taroyamada~clinical-trials-pipeline-monitor/run-sync-get-dataset-items?token=YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{ "watchlist": [{ "id": "demo", "name": "Diabetes — Recruiting", "condition": "diabetes", "recruitmentStatus": "RECRUITING" }], "maxStudiesPerQuery": 50, "delivery": "dataset" }'

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("taroyamada/clinical-trials-pipeline-monitor").call(run_input={
    "watchlist": [{
        "id": "demo",
        "name": "Diabetes — Recruiting",
        "condition": "diabetes",
        "recruitmentStatus": "RECRUITING"
    }],
    "maxStudiesPerQuery": 50,
    "delivery": "dataset"
})

# Print every digest row stored in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

JavaScript / Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

const run = await client.actor('taroyamada/clinical-trials-pipeline-monitor').call({
  watchlist: [{ id: 'demo', name: 'Diabetes — Recruiting', condition: 'diabetes', recruitmentStatus: 'RECRUITING' }],
  maxStudiesPerQuery: 50,
  delivery: 'dataset',
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);

Tips

  • Run weekly for trend tracking; daily for catalyst-event monitoring.
  • Use webhook delivery to push digests into research-team channels (Slack, Teams) for review, not for unsolicited contact (see the compliance note above).
  • Archive results in the Apify Dataset for your own historical trend analysis.
  • Start with a small watchlist; iterate on condition and recruitmentStatus precision before scaling.

FAQ

Does this scrape the ClinicalTrials.gov website HTML?

No. It uses the official clinicaltrials.gov/api/v2/studies JSON API. No API key is required.
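
For orientation, a watchlist entry could map onto a studies request roughly as follows. This is a sketch under assumptions: `build_studies_url` is a hypothetical helper, and the mapping from watchlist fields to the API v2 parameters `query.cond`, `query.spons`, `query.intr`, `filter.overallStatus`, and `pageSize` is a guess at what the actor does, not its documented behaviour. Phase filtering is left out because the actor's handling of it is not described.

```python
from urllib.parse import urlencode

API_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_studies_url(entry: dict, max_studies: int = 50) -> str:
    """Build a ClinicalTrials.gov API v2 studies URL from one watchlist entry."""
    params = {"query.cond": entry["condition"], "pageSize": max_studies}
    if entry.get("sponsor"):
        params["query.spons"] = entry["sponsor"]
    if entry.get("intervention"):
        params["query.intr"] = entry["intervention"]
    if entry.get("recruitmentStatus"):
        # Comma-separated statuses are passed through unchanged.
        params["filter.overallStatus"] = entry["recruitmentStatus"]
    return f"{API_URL}?{urlencode(params)}"
```

The function only assembles the URL; it makes no network request, so it can be used to inspect what a query would ask for before running the actor.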

How is data deduplicated across runs?

The actor persists seen NCT IDs by snapshotKey. Use the same key across scheduled runs to make new_only and action_needed modes meaningful.
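
The idea behind snapshot-based deduplication can be sketched as a set difference over NCT IDs. The code below is illustrative only: `diff_against_snapshot` is a hypothetical helper, an in-memory dict stands in for whatever storage the actor uses, and the NCT IDs are placeholders.

```python
def diff_against_snapshot(seen_ids: set[str], current_ids: list[str]) -> tuple[list[str], set[str]]:
    """Return (IDs not seen before, updated seen set) for one query."""
    # Drop duplicates within the current run while keeping the original order.
    unique_current = list(dict.fromkeys(current_ids))
    new_ids = [nct_id for nct_id in unique_current if nct_id not in seen_ids]
    updated_seen = set(seen_ids) | set(unique_current)
    return new_ids, updated_seen


# In-memory stand-in for state persisted under snapshotKey (an assumption for illustration).
store: dict[str, set[str]] = {}
key = "clinical-trials-monitor-state"

# First run: everything is new because no snapshot exists yet.
first_new, store[key] = diff_against_snapshot(store.get(key, set()), ["NCT00000001", "NCT00000002"])

# Second run with the same key: only the unseen ID is reported.
second_new, store[key] = diff_against_snapshot(store[key], ["NCT00000002", "NCT00000003"])
```

Changing the key between runs is equivalent to starting from an empty snapshot, which is why every study would then look new.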

Why are there no email fields?

ClinicalTrials.gov Terms prohibit using email addresses from study records for marketing or promotional purposes. To stay compliant by design, this actor emits no email field. See docs/source-compliance.md for the full source-compliance record.

Can I get sponsor name normalisation?

Sponsor canonicalisation (e.g., "Merck Sharp & Dohme" / "MSD" / "Merck & Co." reconciled) is on the v0.3 roadmap.

Can I run this on a schedule?

Yes — use Apify's scheduling UI, or trigger via the API on your own cron. The actor is designed to deduplicate against snapshotKey so recurring runs only highlight new or changed studies.

Cost

Pay Per Event:

  • actor-start: $0.01 (flat fee per run)
  • dataset-item: $0.003 per output item

Example: 1,000 items = $0.01 + (1,000 × $0.003) = $3.01
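
The same arithmetic can be wrapped in a small estimator for budgeting scheduled runs. This is a sketch using the two event prices listed above; `estimate_run_cost` is a hypothetical helper, and actual billing is determined by Apify.

```python
ACTOR_START_FEE = 0.01    # USD, flat fee per run (from the price list above)
DATASET_ITEM_FEE = 0.003  # USD per output item (from the price list above)


def estimate_run_cost(item_count: int, runs: int = 1) -> float:
    """Estimate the total cost in USD for a number of runs and output items."""
    total = runs * ACTOR_START_FEE + item_count * DATASET_ITEM_FEE
    return round(total, 3)
```

For example, `estimate_run_cost(1000)` reproduces the $3.01 figure above, and `estimate_run_cost(200, runs=4)` estimates four weekly runs that emit 200 items in total.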

No subscription required — you only pay for what you use.

⭐ Was this helpful?

If this actor saved you time, please leave a ★ rating on Apify Store. It takes 10 seconds, helps other developers discover it, and keeps updates free.

Bug report or feature request? Open an issue on the Issues tab of this actor.