ClinicalTrials.gov Study Crawler

Pricing

Pay per event

Crawl 500K+ clinical trial records from ClinicalTrials.gov via the v2 API. Extract study details, conditions, interventions, sponsors, phases, enrollment, eligibility, outcomes, and locations. Filter by condition, intervention, sponsor, phase, status, and study type.

Developer

BowTiedRaccoon

Maintained by Community

9 days ago

Last modified

ClinicalTrials.gov Clinical Study Data Crawler

Extract structured clinical trial records from ClinicalTrials.gov via the official v2 REST API. The database covers 500K+ studies — conditions, interventions, sponsors, phases, enrollment figures, eligibility criteria, outcomes, and study locations with contact details.

ClinicalTrials.gov Crawler Features

  • Filters by condition or disease, intervention name, lead sponsor, trial phase, study status, study type, and general keyword, in any combination
  • Fetches up to 1,000 records per API call using cursor-based pagination, so even large result sets need relatively few round trips (about 500 for the full 500K+ database)
  • Supports 8 study statuses, including RECRUITING, COMPLETED, ACTIVE_NOT_RECRUITING, SUSPENDED, and TERMINATED
  • Covers all trial phases from Early Phase 1 through Phase 4, plus Not Applicable
  • Extracts 25+ fields per study including full eligibility criteria text and per-location contact information
  • Queries the official v2 JSON API — no HTML parsing, no fragile selectors
  • Requires no authentication and no proxy — ClinicalTrials.gov is a U.S. government service
  • Rate-limited to ~7.7 requests per second, comfortably under the documented 10/sec ceiling
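A limit of ~7.7 requests per second amounts to spacing requests at least 1/7.7 ≈ 0.13 seconds apart. The sketch below shows one way to enforce that spacing in Python. `MinIntervalLimiter` is a hypothetical helper, not the crawler's actual code; the clock and sleep functions are injectable so the behavior can be checked without real waiting.

```python
import time
from typing import Callable


class MinIntervalLimiter:
    """Spaces successive acquire() calls at least 1 / rate_per_sec seconds apart."""

    def __init__(
        self,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.interval = 1.0 / rate_per_sec
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # the first call never waits

    def acquire(self) -> None:
        """Block until the next request may be sent."""
        now = self._clock()
        if now < self._next_allowed:
            # Too early: wait out the rest of the interval.
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        # Earliest moment the following request may start.
        self._next_allowed = now + self.interval


limiter = MinIntervalLimiter(7.7)  # roughly 0.13 s between requests
```

Calling `limiter.acquire()` immediately before each HTTP request keeps the overall rate at or below the configured value, even when responses come back quickly.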

Who Uses ClinicalTrials.gov Data and Why?

  • Pharma and biotech researchers — track competitor trials, map pipeline activity by indication, and monitor phase progression across therapeutic areas
  • Clinical research organizations — identify actively recruiting trials by condition and geography to support site selection and patient referral
  • Investment analysts — map development pipelines for biotech companies by pulling every active or completed study tied to a specific sponsor
  • Patient advocates — find open recruiting studies for a given condition, filtered by geography and eligibility parameters
  • Academic epidemiologists — analyze enrollment trends, study design patterns, and outcome measures across thousands of trials at once

How ClinicalTrials.gov Crawler Works

  1. You provide at least one filter: a condition name, an intervention, a sponsor, a phase, a status, a study type, or a free-text keyword. Combining multiple filters is supported.
  2. The crawler builds a query against the ClinicalTrials.gov v2 API and fetches the first page of up to 1,000 results.
  3. It follows the nextPageToken cursor through subsequent pages until it reaches your maxItems limit or exhausts the result set.
  4. Each study in the API response is flattened into a structured record and saved to the Apify dataset.
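The paging steps above can be sketched as a Python generator. It assumes the public endpoint https://clinicaltrials.gov/api/v2/studies, which accepts `pageSize` and `pageToken` query parameters and returns a JSON object containing `studies` and, while more results remain, `nextPageToken`. `iter_studies` and `fetch_page_http` are illustrative names, not part of the crawler. The fetch function is passed in so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator, Optional

API_URL = "https://clinicaltrials.gov/api/v2/studies"


def fetch_page_http(query: Dict[str, Any]) -> Dict[str, Any]:
    """Fetch one page of results from the v2 /studies endpoint (network call)."""
    url = API_URL + "?" + urllib.parse.urlencode(query)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_studies(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    params: Dict[str, Any],
    max_items: int = 200,
) -> Iterator[Dict[str, Any]]:
    """Yield raw study objects, following nextPageToken until max_items or the end.

    max_items == 0 means unlimited, mirroring the crawler's maxItems input.
    """
    yielded = 0
    page_token: Optional[str] = None
    while True:
        # Request 1,000 records per call to keep round trips low.
        query = dict(params, pageSize=1000)
        if page_token:
            query["pageToken"] = page_token
        page = fetch_page(query)
        for study in page.get("studies", []):
            yield study
            yielded += 1
            if max_items and yielded >= max_items:
                return  # stop without fetching another page
        # A missing nextPageToken means the result set is exhausted.
        page_token = page.get("nextPageToken")
        if not page_token:
            return
```

For example, `iter_studies(fetch_page_http, {"query.cond": "breast cancer"}, max_items=500)` would stop after 500 studies, or earlier if the API runs out of results.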

Input

Basic: recruiting breast cancer trials in Phase 3

{
  "condition": "breast cancer",
  "phase": "PHASE3",
  "studyStatus": ["RECRUITING"],
  "maxItems": 500
}

By sponsor: Pfizer interventional studies

{
  "sponsor": "Pfizer",
  "studyType": "INTERVENTIONAL",
  "maxItems": 200
}

By intervention: completed pembrolizumab studies

{
  "intervention": "pembrolizumab",
  "studyStatus": ["COMPLETED"],
  "maxItems": 100
}
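A rough idea of how these inputs could translate into v2 query parameters, written as a hypothetical `build_query_params` helper. The mapping is an assumption: `query.cond`, `query.intr`, `query.lead`, `query.term`, and `filter.overallStatus` are v2 search parameters, but the crawler may map its inputs differently, and expressing phase and study type as Essie `AREA[...]` clauses in `filter.advanced` is an unverified guess at one workable approach.

```python
from typing import Any, Dict, List

# Assumed mapping from crawler inputs to v2 free-text query parameters.
QUERY_FIELDS = {
    "condition": "query.cond",
    "intervention": "query.intr",
    "sponsor": "query.lead",
    "keyword": "query.term",
}


def build_query_params(inp: Dict[str, Any]) -> Dict[str, str]:
    """Translate crawler-style input into v2 /studies query parameters (sketch)."""
    params: Dict[str, str] = {}
    for key, api_name in QUERY_FIELDS.items():
        value = (inp.get(key) or "").strip()
        if value:
            params[api_name] = value

    statuses: List[str] = inp.get("studyStatus") or []
    if statuses:
        # Multiple statuses are sent as one comma-separated list.
        params["filter.overallStatus"] = ",".join(statuses)

    # Assumption: phase and study type expressed as Essie AREA[] clauses.
    clauses: List[str] = []
    if inp.get("phase"):
        clauses.append(f"AREA[Phase]{inp['phase']}")
    if inp.get("studyType"):
        clauses.append(f"AREA[StudyType]{inp['studyType']}")
    if clauses:
        params["filter.advanced"] = " AND ".join(clauses)

    # maxItems is deliberately absent: it caps paging and is not an API filter.
    return params
```

For the first example above, this produces `query.cond=breast cancer`, `filter.overallStatus=RECRUITING`, and `filter.advanced=AREA[Phase]PHASE3`.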

Input Parameters

Field | Type | Default | Description
condition | string | "" | Condition or disease being studied (e.g. "breast cancer", "Alzheimer", "Type 2 Diabetes").
intervention | string | "" | Intervention name (e.g. "pembrolizumab", "radiation therapy"). Matches drug names, devices, and procedures.
sponsor | string | "" | Lead sponsor name (e.g. "Pfizer", "National Cancer Institute"). Partial match supported.
phase | string | "" | Trial phase. Options: EARLY_PHASE1, PHASE1, PHASE2, PHASE3, PHASE4, NA. Leave empty for all phases.
studyStatus | string[] | [] | One or more statuses: RECRUITING, ACTIVE_NOT_RECRUITING, COMPLETED, ENROLLING_BY_INVITATION, NOT_YET_RECRUITING, SUSPENDED, TERMINATED, WITHDRAWN.
studyType | string | "" | Study type. Options: INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS. Leave empty for all.
keyword | string | "" | General keyword search across all study fields. Use for broad or exploratory queries.
maxItems | integer | 200 | Maximum records to return. Set to 0 for unlimited; at least one filter is then required.
proxyConfiguration | object | disabled | Proxy settings. Not required, since the ClinicalTrials.gov API needs no authentication or proxy.
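The enum options and the maxItems rule in the table lend themselves to an up-front check before any API call is made. This is a minimal validation sketch assuming the input arrives as a plain dict; `validate_input` and the constant names are hypothetical, and the crawler's own validation may differ.

```python
from typing import Any, Dict, List

PHASES = {"EARLY_PHASE1", "PHASE1", "PHASE2", "PHASE3", "PHASE4", "NA"}
STATUSES = {
    "RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED", "ENROLLING_BY_INVITATION",
    "NOT_YET_RECRUITING", "SUSPENDED", "TERMINATED", "WITHDRAWN",
}
STUDY_TYPES = {"INTERVENTIONAL", "OBSERVATIONAL", "EXPANDED_ACCESS"}
FILTER_FIELDS = ("condition", "intervention", "sponsor", "phase",
                 "studyStatus", "studyType", "keyword")


def validate_input(inp: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    phase = inp.get("phase", "")
    if phase and phase not in PHASES:
        errors.append(f"Unknown phase: {phase}")
    for status in inp.get("studyStatus", []):
        if status not in STATUSES:
            errors.append(f"Unknown study status: {status}")
    study_type = inp.get("studyType", "")
    if study_type and study_type not in STUDY_TYPES:
        errors.append(f"Unknown study type: {study_type}")

    max_items = inp.get("maxItems", 200)  # 200 is the documented default
    if not isinstance(max_items, int) or max_items < 0:
        errors.append("maxItems must be a non-negative integer")
    elif max_items == 0 and not any(inp.get(f) for f in FILTER_FIELDS):
        # An unlimited run without any filter would pull the whole database.
        errors.append("maxItems=0 (unlimited) requires at least one filter")
    return errors
```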

ClinicalTrials.gov Crawler Output Fields

{
  "nct_id": "NCT02625935",
  "study_title": "A Prospective Observational Study Evaluating Treatment Decision Impact of Prosigna",
  "brief_summary": "This study evaluates whether the Prosigna assay changes treatment decisions for early-stage breast cancer patients...",
  "study_status": "COMPLETED",
  "phase": "NA",
  "study_type": "OBSERVATIONAL",
  "conditions": ["Breast Cancer"],
  "interventions": ["Prosigna Breast Cancer Prognostic Gene Signature Assay"],
  "intervention_types": ["DIAGNOSTIC_TEST"],
  "lead_sponsor": "NanoString Technologies, Inc.",
  "lead_sponsor_type": "INDUSTRY",
  "collaborators": ["American Society of Clinical Oncology"],
  "enrollment_count": 201,
  "enrollment_type": "ACTUAL",
  "start_date": "2015-10",
  "primary_completion_date": "2017-06",
  "completion_date": "2017-06",
  "primary_outcome": "Change in treatment recommendation (12 months)",
  "secondary_outcomes": [
    "Patient anxiety levels (6 months)",
    "Physician confidence in treatment decision (12 months)"
  ],
  "eligibility_criteria": "Inclusion Criteria:\n- Female\n- Diagnosed with early-stage, hormone receptor-positive breast cancer...\n\nExclusion Criteria:\n- Prior chemotherapy...",
  "min_age": "18 Years",
  "max_age": "",
  "sex": "FEMALE",
  "locations": [
    {
      "facility": "Memorial Sloan Kettering Cancer Center",
      "city": "New York",
      "state": "New York",
      "country": "United States",
      "zip": "10065",
      "contact_name": "Dr. Jane Smith",
      "contact_phone": "212-555-0100",
      "contact_email": "smith@mskcc.org"
    }
  ],
  "has_results": true,
  "results_first_posted": "2018-03-15",
  "last_update_posted": "2023-01-10",
  "study_url": "https://clinicaltrials.gov/study/NCT02625935"
}
Field | Type | Description
nct_id | string | ClinicalTrials.gov identifier (e.g. NCT00000001)
study_title | string | Official study title
brief_summary | string | Brief summary of the study purpose and design
study_status | string | Overall study status: RECRUITING, COMPLETED, ACTIVE_NOT_RECRUITING, etc.
phase | string | Trial phase: EARLY_PHASE1, PHASE1, PHASE2, PHASE3, PHASE4, NA
study_type | string | Study type: INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS
conditions | string[] | Conditions or diseases being studied
interventions | string[] | Intervention names: drugs, devices, procedures, etc.
intervention_types | string[] | Intervention types, e.g. DRUG, DEVICE, BIOLOGICAL, PROCEDURE, DIAGNOSTIC_TEST
lead_sponsor | string | Lead sponsor organization name
lead_sponsor_type | string | Sponsor class, e.g. INDUSTRY, NIH, NETWORK, OTHER
collaborators | string[] | Collaborating organizations
enrollment_count | number | Participant count, actual or estimated (see enrollment_type)
enrollment_type | string | Whether enrollment_count is ACTUAL or ESTIMATED
start_date | string | Study start date
primary_completion_date | string | Date of the last participant's last visit for the primary outcome
completion_date | string | Full study completion date
primary_outcome | string | Primary outcome measure with time frame
secondary_outcomes | string[] | Secondary outcome measures with time frames
eligibility_criteria | string | Full inclusion and exclusion criteria text
min_age | string | Minimum eligible age
max_age | string | Maximum eligible age (empty string if there is no upper limit)
sex | string | Eligible sex: ALL, MALE, FEMALE
locations | object[] | Study sites: facility, city, state, country, zip, and contact details
has_results | boolean | Whether results have been posted to ClinicalTrials.gov
results_first_posted | string | Date results were first posted
last_update_posted | string | Date of the most recent record update
study_url | string | Direct URL to the study page on ClinicalTrials.gov
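Producing these flat fields means walking the nested v2 study JSON. The sketch below covers a subset of the fields and assumes the v2 layout (`protocolSection.identificationModule.nctId`, `statusModule.overallStatus`, `designModule.phases`, and so on). `flatten_study` is a hypothetical name; joining multi-phase studies with a comma and falling back to "NA" when no phase is listed are assumptions, not documented crawler behavior.

```python
from typing import Any, Dict


def flatten_study(study: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten one v2 study object into a subset of the output fields (sketch)."""
    proto = study.get("protocolSection", {})
    ident = proto.get("identificationModule", {})
    status = proto.get("statusModule", {})
    design = proto.get("designModule", {})
    lead = proto.get("sponsorCollaboratorsModule", {}).get("leadSponsor", {})
    interventions = proto.get("armsInterventionsModule", {}).get("interventions", [])
    nct_id = ident.get("nctId", "")
    return {
        "nct_id": nct_id,
        # Prefer the official title, fall back to the brief title.
        "study_title": ident.get("officialTitle") or ident.get("briefTitle", ""),
        "study_status": status.get("overallStatus", ""),
        # Assumption: multiple phases are joined; "NA" when none are listed.
        "phase": ", ".join(design.get("phases", [])) or "NA",
        "study_type": design.get("studyType", ""),
        "conditions": proto.get("conditionsModule", {}).get("conditions", []),
        "interventions": [i.get("name", "") for i in interventions],
        "intervention_types": [i.get("type", "") for i in interventions],
        "lead_sponsor": lead.get("name", ""),
        "lead_sponsor_type": lead.get("class", ""),
        "enrollment_count": design.get("enrollmentInfo", {}).get("count"),
        "has_results": bool(study.get("hasResults", False)),
        "study_url": f"https://clinicaltrials.gov/study/{nct_id}",
    }
```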

FAQ

How many clinical trials does ClinicalTrials.gov Crawler cover? ClinicalTrials.gov Crawler queries the full ClinicalTrials.gov database — over 500,000 registered studies from all countries. If a study was registered there, the crawler can reach it.

Do I need proxies or an API key to run this? No. ClinicalTrials.gov is a public U.S. government service maintained by the National Library of Medicine. The API requires no authentication and no proxy. The crawler ships with proxies disabled by default.

Can I run a bulk export without filters? Not with maxItems set to 0. An unlimited run with no filters would queue the entire 500K+ record database, which is rarely what anyone actually needs. Provide at least one filter (condition, intervention, sponsor, phase, status, study type, or keyword) when running unlimited. With filters, unlimited runs are fine.

How current is the data? ClinicalTrials.gov Crawler reads from the live API. Sponsors are required to update their registrations regularly, and the last_update_posted field on each record shows when that specific study was last modified. The crawler does not cache anything.
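Because every record carries last_update_posted, a scheduled run can keep only the studies that changed since the previous export. This is a minimal client-side sketch, assuming the field holds an ISO date such as "2023-01-10" (a "YYYY-MM" value is treated as the first of that month); `updated_since` is a hypothetical helper, not a crawler option.

```python
from datetime import date
from typing import Any, Dict, Iterable, List


def updated_since(records: Iterable[Dict[str, Any]], since: date) -> List[Dict[str, Any]]:
    """Keep records whose last_update_posted date is strictly after `since`."""
    kept: List[Dict[str, Any]] = []
    for rec in records:
        raw = rec.get("last_update_posted") or ""
        if not raw:
            continue  # no date recorded; skip rather than guess
        if len(raw) == 7:
            raw += "-01"  # "YYYY-MM" -> first day of that month
        if date.fromisoformat(raw) > since:
            kept.append(rec)
    return kept
```

Storing the date of each run and passing it as `since` on the next run yields only the studies that changed in between.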

What is the difference between studyStatus and studyType? ClinicalTrials.gov Crawler treats them as separate axes. Status describes where a study is in its lifecycle — RECRUITING, COMPLETED, SUSPENDED, etc. Type describes the study design — INTERVENTIONAL (a drug or device being tested), OBSERVATIONAL (no assigned intervention), or EXPANDED_ACCESS. Both filters can be applied at the same time.

Need More Features?

Need additional fields, a different data source, or a scheduled run? Get in touch.

Why Use ClinicalTrials.gov Crawler?

  • Official API, not HTML scraping — the crawler reads the ClinicalTrials.gov v2 JSON endpoints directly, so field names and data structure match what the NLM publishes, not what a selector happened to grab last Tuesday
  • 25+ fields per study, including contact data — each location record carries facility name, address, and primary contact information, which matters when the goal is outreach rather than just counting trials
  • No proxy cost, no authentication overhead — government data, open access; the crawler's per-record cost reflects actual compute, not unnecessary infrastructure