ClinicalTrials.gov Scraper

Pricing

from $3.00 / 1,000 results

Scrape the US government clinical-trial registry (500K+ studies). Search by condition, intervention, location, sponsor, or NCT ID; filter by status, phase, study type, demographics, country, and dates. Public REST API, no auth, no proxy.

Rating: 5.0 (7)

Developer: Crawler Bros (maintained by Community)

Actor stats: 7 bookmarked, 2 total users, 1 monthly active user, last modified 3 days ago

Scrape the US government clinical-trial registry — 500,000+ studies spanning interventional trials, observational studies, and expanded-access programs across 220+ countries. Search by free-text, condition, intervention, location, or sponsor; look up by NCT ID; filter by recruitment status, phase, study type, demographic eligibility, and date ranges. HTTP-only via the public ClinicalTrials.gov REST API v2 — no authentication, no cookies, no proxy required.

What this actor does

  • Six modes: search, byNctIds, byCondition, byIntervention, byLocation, bySponsor
  • Server-side filters: status, phase, study type, country (passed directly to the API)
  • Client-side filters: sex, age groups, start / completion date ranges, has-results, keyword substring
  • Cursor pagination: scales to thousands of records per run
  • Polite-pool: identifies itself via User-Agent header with contact email
  • Empty fields are omitted — no null, "", [], or {} in the output
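
The cursor pagination and polite-pool behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the actor's source: it assumes the v2 `studies` endpoint returns a `studies` array plus a `nextPageToken` cursor and accepts `pageSize` up to 1000. The `USER_AGENT` string, the `iter_studies` helper, and the half-second delay are hypothetical.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

API_URL = "https://clinicaltrials.gov/api/v2/studies"
# Hypothetical contact string; the actor's real User-Agent is not documented here.
USER_AGENT = "example-trials-client/0.1 (mailto:you@example.com)"


def fetch_page(params: dict) -> dict:
    """Fetch one page of results from the public v2 endpoint."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_studies(params: dict, max_items: int,
                 fetch: Callable[[dict], dict] = fetch_page,
                 delay_seconds: float = 0.5) -> Iterator[dict]:
    """Yield up to max_items studies, following nextPageToken cursors."""
    page_size = min(1000, max_items)  # assumed API maximum of 1000 per page
    emitted = 0
    token: Optional[str] = None
    while emitted < max_items:
        page_params = dict(params, pageSize=page_size)
        if token:
            page_params["pageToken"] = token
        page = fetch(page_params)
        for study in page.get("studies", []):
            yield study
            emitted += 1
            if emitted >= max_items:
                return
        token = page.get("nextPageToken")
        if not token:
            return  # no cursor means this was the last page
        time.sleep(delay_seconds)  # small pause between requests
```

Passing the page fetcher in as an argument keeps the cursor loop testable without network access; `iter_studies({"query.cond": "breast cancer"}, max_items=200)` would use the real endpoint.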

Output per study (sample)

{
  "nctId": "NCT04280705",
  "briefTitle": "Adaptive COVID-19 Treatment Trial (ACTT)",
  "officialTitle": "A Multicenter, Adaptive, Randomized Blinded Controlled Trial...",
  "organization": "National Institute of Allergy and Infectious Diseases (NIAID)",
  "overallStatus": "COMPLETED",
  "startDate": "2020-02-21",
  "completionDate": "2020-05-21",
  "leadSponsor": "National Institute of Allergy and Infectious Diseases (NIAID)",
  "conditions": ["COVID-19"],
  "keywords": ["Adaptive", "COVID-19", "Efficacy", "Multicenter"],
  "studyType": "INTERVENTIONAL",
  "phases": ["PHASE3"],
  "enrollmentCount": 1062,
  "allocation": "RANDOMIZED",
  "interventionModel": "PARALLEL",
  "primaryPurpose": "TREATMENT",
  "interventionNames": ["Placebo", "Remdesivir"],
  "interventionTypes": ["OTHER", "DRUG"],
  "locationCountries": ["United States", "Japan", "Korea, Republic of", "..."],
  "locationCount": 73,
  "hasResults": true,
  "studyUrl": "https://clinicaltrials.gov/study/NCT04280705",
  "recordType": "clinicalTrial",
  "scrapedAt": "2026-05-21T11:53:00+00:00"
}
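
The sample above carries no null or empty values. One way such pruning could be done is sketched below; `prune_empty` is a hypothetical helper written for illustration, not the actor's code. It tests for the four empty values explicitly so that `false` and `0` survive.

```python
def _is_empty(value) -> bool:
    # False and 0 are real values, so test for the four empties explicitly.
    return value is None or value == "" or value == [] or value == {}


def prune_empty(value):
    """Recursively drop None, "", [], and {} from dicts and lists."""
    if isinstance(value, dict):
        cleaned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not _is_empty(item)}
    if isinstance(value, list):
        cleaned = [prune_empty(item) for item in value]
        return [item for item in cleaned if not _is_empty(item)]
    return value
```

Children are pruned before their parent is checked, so a nested object that ends up empty is dropped as well.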

Available fields

  • Identification: nctId, briefTitle, officialTitle, acronym, organization, organizationClass, orgStudyId
  • Status & dates: overallStatus, startDate, startDateType, primaryCompletionDate, completionDate, studyFirstSubmitDate, studyFirstPostDate, lastUpdateSubmitDate, lastUpdatePostDate, resultsFirstPostDate, statusVerifiedDate, hasExpandedAccess
  • Sponsors: leadSponsor, leadSponsorClass, collaborators[], responsiblePartyType, responsiblePartyInvestigator
  • Oversight: isFdaRegulatedDrug, isFdaRegulatedDevice, hasDataMonitoringCommittee
  • Description: briefSummary, detailedDescription
  • Conditions: conditions[], keywords[]
  • Design: studyType, phases[], enrollmentCount, enrollmentType, allocation, interventionModel, primaryPurpose, masking
  • Arms / interventions: armGroups[], interventions[], interventionNames[], interventionTypes[]
  • Eligibility: eligibilityCriteria, sex, minimumAge, maximumAge, standardAges[], healthyVolunteers
  • Outcomes: primaryOutcomes[], secondaryOutcomes[]
  • Locations: locations[], locationCities[], locationCountries[], locationCount, overallOfficials[]
  • Results: hasResults
  • Derived: studyUrl, recordType, scrapedAt

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | search | One of search, byNctIds, byCondition, byIntervention, byLocation, bySponsor |
| searchQuery | string | cancer immunotherapy | Free-text query (mode=search; also fallback for other modes) |
| conditionQuery | string | | Disease / condition (mode=byCondition) |
| interventionQuery | string | | Drug / device / therapy (mode=byIntervention) |
| locationQuery | string | | Geographic text (mode=byLocation) |
| sponsorQuery | string | | Lead sponsor name (mode=bySponsor) |
| nctIds | array | | List of NCT IDs (mode=byNctIds) |
| status | enum | | Overall recruitment status (Recruiting, Completed, etc.) |
| studyType | enum | | Interventional / Observational / Expanded Access |
| phases | array enum | | One or more of Early Phase 1 / Phase 1 / Phase 2 / Phase 3 / Phase 4 / NA |
| sex | enum | | Eligible sex (All / Male / Female) |
| ageGroups | array enum | | One or more of Child / Adult / Older Adult |
| country | enum | | Country with a trial location |
| startDateFrom / startDateTo | string | | ISO date bounds for study start |
| completionDateFrom / completionDateTo | string | | ISO date bounds for completion |
| hasResults | bool | false | Only studies with results posted |
| containsKeyword | string | | Case-insensitive substring filter |
| maxItems | int | 50 | Hard cap (1–5000) |
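
As a rough illustration of how the mode-specific inputs above might translate into ClinicalTrials.gov v2 query parameters, here is a sketch. The parameter names (`query.term`, `query.cond`, `query.intr`, `query.locn`, `query.lead`, `filter.ids`, `filter.overallStatus`) come from the public v2 API. The mapping itself, the `build_params` helper, and the choice to place the `searchQuery` fallback in the mode's own parameter are assumptions, since the page does not say how the actor wires them. Phase, study-type, and country filters are left out for the same reason.

```python
# Hypothetical mapping from actor mode to (input field, v2 API parameter).
MODE_TO_PARAM = {
    "search": ("searchQuery", "query.term"),
    "byCondition": ("conditionQuery", "query.cond"),
    "byIntervention": ("interventionQuery", "query.intr"),
    "byLocation": ("locationQuery", "query.locn"),
    "bySponsor": ("sponsorQuery", "query.lead"),
}


def build_params(run_input: dict) -> dict:
    """Translate an actor-style input into v2 query parameters (sketch)."""
    mode = run_input.get("mode", "search")
    params = {}
    if mode == "byNctIds":
        ids = run_input.get("nctIds") or []
        if not ids:
            raise ValueError("mode=byNctIds needs a non-empty nctIds list")
        params["filter.ids"] = ",".join(ids)
    elif mode in MODE_TO_PARAM:
        field, api_param = MODE_TO_PARAM[mode]
        # searchQuery is documented as the fallback for the other modes.
        text = run_input.get(field) or run_input.get("searchQuery")
        if not text:
            raise ValueError(f"mode={mode} needs {field} or searchQuery")
        params[api_param] = text
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if run_input.get("status"):
        params["filter.overallStatus"] = run_input["status"]
    max_items = int(run_input.get("maxItems", 50))
    if not 1 <= max_items <= 5000:
        raise ValueError("maxItems must be between 1 and 5000")
    params["pageSize"] = min(max_items, 1000)  # assumed API page-size limit
    return params
```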

Example: actively recruiting Phase 3 breast cancer trials in the US

{
  "mode": "byCondition",
  "conditionQuery": "breast cancer",
  "status": "RECRUITING",
  "phases": ["PHASE3"],
  "country": "United States",
  "maxItems": 200
}

Example: lookup by NCT IDs

{
  "mode": "byNctIds",
  "nctIds": ["NCT04280705", "NCT03695107", "NCT04611802"]
}

Example: metformin trials started since 2020 with results posted (capped at 500)

{
  "mode": "byIntervention",
  "interventionQuery": "metformin",
  "startDateFrom": "2020",
  "hasResults": true,
  "maxItems": 500
}
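
Inputs like the examples above can also be submitted from code. The sketch below uses the `apify-client` Python package (`actor(...).call(run_input=...)` and `dataset(...).iterate_items()`). The `ACTOR_ID` value is a placeholder because this page does not state the actor's ID, and `run_and_collect` is a hypothetical helper.

```python
from typing import Any, List

# Placeholder: substitute the actor ID shown on the store page.
ACTOR_ID = "<username>/<actor-name>"


def run_and_collect(client: Any, actor_id: str, run_input: dict) -> List[dict]:
    """Start a run, wait for it to finish, and return its dataset items."""
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a result")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main() -> None:
    # Requires `pip install apify-client` and an APIFY_TOKEN environment variable.
    import os
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run_input = {
        "mode": "byIntervention",
        "interventionQuery": "metformin",
        "hasResults": True,
        "maxItems": 500,
    }
    for item in run_and_collect(client, ACTOR_ID, run_input):
        print(item.get("nctId"), item.get("briefTitle"))
```

Calling `main()` performs a real, billable run, so the function is defined here but not invoked.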

Use cases

  • Pharmaceutical intelligence — track competitive trials by drug or sponsor
  • Clinical research — discover trials matching a specific patient profile
  • Health-tech startups — surface relevant trials by condition or geography
  • Academic research — bulk-export the trial landscape for a disease
  • Regulatory analysts — monitor FDA-regulated trials by status and phase
  • Patient advocacy groups — find recruiting studies for rare diseases
  • Biostatistics — feed structured trial metadata into meta-analysis pipelines

FAQ

What is ClinicalTrials.gov? The world's largest publicly accessible registry of clinical trials, run by the US National Library of Medicine, part of the National Institutes of Health (NIH). It contains over half a million studies from 220+ countries, with information supplied by trial sponsors and investigators. See clinicaltrials.gov.

Do I need an account or API key? No. The v2 REST API is fully public and free.

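How fresh is the data? Each run queries the live API, so the output reflects the registry at the time of the run. Sponsors submit updates continuously, but submissions are reviewed before they are posted, and ClinicalTrials.gov documents a daily refresh of its public data on weekdays, so a change can take a day or more to appear.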

What's an NCT ID? A unique 11-character identifier (e.g., NCT04280705) assigned to every registered trial. Each NCT ID maps to one study record.

What's the difference between studyType values? INTERVENTIONAL = traditional clinical trial with an assigned treatment; OBSERVATIONAL = researchers observe but don't assign treatment; EXPANDED_ACCESS = compassionate-use programs outside of a clinical trial.

What do the phase labels mean? EARLY_PHASE1 and PHASE1 = first-in-human safety; PHASE2 = efficacy + dosing; PHASE3 = large-scale comparison vs. standard of care; PHASE4 = post-marketing surveillance. NA = not applicable (trials without FDA-defined phases, e.g., device or behavioral-intervention trials).

Why are some date fields like 2024-04 (no day) instead of 2024-04-15? Some sponsors register their trials with month-precision dates. The actor preserves whatever precision the upstream record has.
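
Mixed precision matters when applying the date-range filters. One possible approach, sketched below, treats a partial date as the whole period it covers and keeps a study when that period overlaps the requested range. The helper names and the overlap rule are assumptions, not the actor's documented behaviour.

```python
import calendar
from datetime import date
from typing import Optional, Tuple


def date_bounds(text: str) -> Tuple[date, date]:
    """Earliest and latest day covered by YYYY, YYYY-MM, or YYYY-MM-DD."""
    parts = [int(piece) for piece in text.split("-")]
    if len(parts) == 1:
        return date(parts[0], 1, 1), date(parts[0], 12, 31)
    if len(parts) == 2:
        last_day = calendar.monthrange(parts[0], parts[1])[1]
        return date(parts[0], parts[1], 1), date(parts[0], parts[1], last_day)
    if len(parts) == 3:
        day = date(parts[0], parts[1], parts[2])
        return day, day
    raise ValueError(f"unsupported date format: {text!r}")


def in_range(value: str, start: Optional[str] = None,
             end: Optional[str] = None) -> bool:
    """True if the period covered by value overlaps [start, end]."""
    earliest, latest = date_bounds(value)
    if start is not None and latest < date_bounds(start)[0]:
        return False
    if end is not None and earliest > date_bounds(end)[1]:
        return False
    return True
```

Under this rule a study dated `2024-04` passes a `startDateFrom` of `2024-04-15`, because the study could have started on or after that day.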

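What's the difference between startDate and studyFirstPostDate? startDate is when the first participant is (or is expected to be) enrolled; studyFirstPostDate is when the record was first posted publicly on ClinicalTrials.gov. The date the sponsor first submitted the record is studyFirstSubmitDate.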

How are locations structured? Each trial may run at multiple sites. The actor emits a flat locations[] array (with facility, city, state, zip, country, lat/lon) plus derived locationCountries[], locationCities[], and locationCount for quick filtering and aggregation.
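
The derived fields described above could be computed from `locations[]` along these lines. This is a sketch only: the key names `country` and `city` follow the attribute list in the answer, but the exact keys, the ordering, and the assumption that `locationCount` counts sites are not confirmed by the page.

```python
from typing import Dict, List


def summarize_locations(locations: List[dict]) -> Dict[str, object]:
    """Derive unique countries, unique cities, and a site count."""

    def unique(key: str) -> List[str]:
        seen: List[str] = []
        for site in locations:
            value = site.get(key)
            if value and value not in seen:
                seen.append(value)  # keep first-appearance order
        return seen

    return {
        "locationCountries": unique("country"),
        "locationCities": unique("city"),
        "locationCount": len(locations),
    }
```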

Can I filter by multiple phases at once? Yes — phases is an array. Selecting ["PHASE2", "PHASE3"] returns trials in either phase (server-side OR).

What if a study has no results posted yet? The hasResults field will be absent or false. Use hasResults: true in input to keep only studies with posted results.

Are NCT-ID lookups deduplicated? Yes — duplicate IDs in the input list are silently merged.
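
A minimal sketch of that kind of deduplication, combined with a format check based on the 11-character pattern (`NCT` plus eight digits). The `normalize_nct_ids` helper is hypothetical, and trimming whitespace and upper-casing the input are assumptions about what a lenient implementation might do.

```python
import re
from typing import Iterable, List

NCT_ID = re.compile(r"NCT\d{8}")


def normalize_nct_ids(raw_ids: Iterable[str]) -> List[str]:
    """Validate NCT IDs and drop duplicates, keeping first-seen order."""
    seen = set()
    result: List[str] = []
    for raw in raw_ids:
        nct_id = raw.strip().upper()
        if not NCT_ID.fullmatch(nct_id):
            raise ValueError(f"not a valid NCT ID: {raw!r}")
        if nct_id not in seen:
            seen.add(nct_id)
            result.append(nct_id)
    return result
```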

Does this actor respect ClinicalTrials.gov's terms? Yes. The actor sends a polite User-Agent header with a contact email, includes small inter-request delays, and uses the official public REST API. No bulk data dumps are produced — only the data the user explicitly queries.