ClinicalTrials.gov Study Crawler
Pricing
Pay per event
Crawl 500K+ clinical trial records from ClinicalTrials.gov via the v2 API. Extract study details, conditions, interventions, sponsors, phases, enrollment, eligibility, outcomes, and locations. Filter by condition, intervention, sponsor, phase, status, and study type.
Developer
BowTiedRaccoon
ClinicalTrials.gov Clinical Study Data Crawler
Extract structured clinical trial records from ClinicalTrials.gov via the official v2 REST API. The database covers 500K+ studies — conditions, interventions, sponsors, phases, enrollment figures, eligibility criteria, outcomes, and study locations with contact details.
ClinicalTrials.gov Crawler Features
- Filters by condition or disease, intervention name, lead sponsor, trial phase, study status, study type, and general keyword — combine any of them
- Fetches up to 1,000 records per API call using cursor-based pagination, so large result sets do not require hundreds of round trips
- Covers all 8 supported study statuses: RECRUITING, NOT_YET_RECRUITING, ENROLLING_BY_INVITATION, ACTIVE_NOT_RECRUITING, COMPLETED, SUSPENDED, TERMINATED, and WITHDRAWN
- Covers all trial phases from Early Phase 1 through Phase 4, plus Not Applicable
- Extracts 25+ fields per study including full eligibility criteria text and per-location contact information
- Queries the official v2 JSON API — no HTML parsing, no fragile selectors
- Requires no authentication and no proxy — ClinicalTrials.gov is a U.S. government service
- Rate-limited to ~7.7 requests per second, comfortably under the documented 10/sec ceiling
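The ~7.7 requests per second figure works out to a gap of roughly 130 ms between calls. A minimal sketch of that kind of throttle follows; the `Throttle` class and its method names are hypothetical illustrations, not the crawler's actual code.

```python
import time


class Throttle:
    """Space calls at least 1 / rate_per_sec seconds apart (hypothetical helper)."""

    def __init__(self, rate_per_sec=7.7, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_sec  # about 0.13 s at 7.7 req/s
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `wait()` before each request keeps the request rate at or below the configured value. The clock and sleep functions are injectable so the behaviour can be checked without real delays.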
Who Uses ClinicalTrials.gov Data and Why?
- Pharma and biotech researchers — track competitor trials, map pipeline activity by indication, and monitor phase progression across therapeutic areas
- Clinical research organizations — identify actively recruiting trials by condition and geography to support site selection and patient referral
- Investment analysts — map development pipelines for biotech companies by pulling every active or completed study tied to a specific sponsor
- Patient advocates — find open recruiting studies for a given condition, filtered by geography and eligibility parameters
- Academic epidemiologists — analyze enrollment trends, study design patterns, and outcome measures across thousands of trials at once
How ClinicalTrials.gov Crawler Works
- You provide at least one filter: a condition name, an intervention, a sponsor, a phase, a status, a study type, or a free-text keyword. Combining multiple filters is supported.
- The crawler builds a query against the ClinicalTrials.gov v2 API and fetches the first page of up to 1,000 results.
- It follows the `nextPageToken` cursor through subsequent pages until it reaches your `maxItems` limit or exhausts the result set.
- Each API response is transformed into a flat, structured record and saved to the Apify dataset.
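The pagination steps above can be sketched as a small generator. This is an illustration under stated assumptions, not the crawler's source: `iter_studies` and `fetch_page_http` are hypothetical names, while `pageSize`, `pageToken`, `studies`, and `nextPageToken` are the v2 API's request parameters and response keys.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://clinicaltrials.gov/api/v2/studies"
PAGE_SIZE = 1000  # largest page mentioned in this README


def fetch_page_http(params):
    """Fetch one page of results from the live API and decode the JSON body."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_studies(fetch_page, params, max_items=200):
    """Yield raw study objects, following nextPageToken until done.

    max_items=0 means unlimited, mirroring the maxItems input.
    """
    yielded = 0
    token = None
    while True:
        page_params = dict(params, pageSize=PAGE_SIZE)
        if max_items:
            # Do not request more records than are still needed.
            page_params["pageSize"] = min(PAGE_SIZE, max_items - yielded)
        if token:
            page_params["pageToken"] = token
        page = fetch_page(page_params)
        for study in page.get("studies", []):
            yield study
            yielded += 1
            if max_items and yielded >= max_items:
                return
        token = page.get("nextPageToken")
        if not token:
            return  # result set exhausted
```

For example, `iter_studies(fetch_page_http, {"query.cond": "breast cancer"}, max_items=500)` would stop after 500 studies or when the API stops returning a cursor, whichever comes first. Passing the fetch function as an argument makes it easy to substitute a stub in tests.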
Input
Basic: recruiting breast cancer trials in Phase 3
    {
      "condition": "breast cancer",
      "phase": "PHASE3",
      "studyStatus": ["RECRUITING"],
      "maxItems": 500
    }
Sponsor pipeline lookup
    {
      "sponsor": "Pfizer",
      "studyType": "INTERVENTIONAL",
      "maxItems": 200
    }
Intervention-specific search
    {
      "intervention": "pembrolizumab",
      "studyStatus": ["COMPLETED"],
      "maxItems": 100
    }
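To show how inputs like these might translate into an API query, here is a rough sketch. The mapping is an assumption: this README does not document which v2 parameters the crawler uses, so the choice of `query.cond`, `query.intr`, `query.lead`, `query.term`, `filter.overallStatus`, and `filter.advanced` (with `AREA[...]` expressions for phase and study type) is a guess at one plausible translation. `build_query` is a hypothetical helper.

```python
def build_query(actor_input):
    """Translate actor input into v2 API query parameters (assumed mapping)."""
    params = {}
    # Free-text inputs mapped onto search parameters (an assumption, see above).
    simple = {
        "condition": "query.cond",
        "intervention": "query.intr",
        "sponsor": "query.lead",
        "keyword": "query.term",
    }
    for field, api_name in simple.items():
        value = (actor_input.get(field) or "").strip()
        if value:
            params[api_name] = value

    statuses = actor_input.get("studyStatus") or []
    if statuses:
        # Joined with commas here; the separator is also an assumption.
        params["filter.overallStatus"] = ",".join(statuses)

    # Phase and study type expressed as advanced-filter terms (assumed).
    advanced = []
    if actor_input.get("phase"):
        advanced.append(f"AREA[Phase]{actor_input['phase']}")
    if actor_input.get("studyType"):
        advanced.append(f"AREA[StudyType]{actor_input['studyType']}")
    if advanced:
        params["filter.advanced"] = " AND ".join(advanced)
    return params
```

`maxItems` is deliberately left out of the returned parameters, since it limits how many records are collected rather than what is searched for.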
Input Parameters
| Field | Type | Default | Description |
|---|---|---|---|
| condition | string | "" | Condition or disease being studied (e.g. "breast cancer", "Alzheimer", "Type 2 Diabetes"). |
| intervention | string | "" | Intervention name (e.g. "pembrolizumab", "radiation therapy"). Matches drug names, devices, and procedures. |
| sponsor | string | "" | Lead sponsor name (e.g. "Pfizer", "National Cancer Institute"). Partial match supported. |
| phase | string | "" | Trial phase. Options: EARLY_PHASE1, PHASE1, PHASE2, PHASE3, PHASE4, NA. Leave empty for all phases. |
| studyStatus | string[] | [] | One or more statuses: RECRUITING, ACTIVE_NOT_RECRUITING, COMPLETED, ENROLLING_BY_INVITATION, NOT_YET_RECRUITING, SUSPENDED, TERMINATED, WITHDRAWN. |
| studyType | string | "" | Study type. Options: INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS. Leave empty for all. |
| keyword | string | "" | General keyword search across all study fields. Use for broad or exploratory queries. |
| maxItems | integer | 200 | Maximum records to return. Set to 0 for unlimited — requires at least one filter when doing so. |
| proxyConfiguration | object | disabled | Proxy settings. Not required — ClinicalTrials.gov does not have anti-bot measures. |
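The constraints in the table can be checked before starting a run. The sketch below is a hypothetical pre-flight validator (`validate_input` is not part of the actor). It encodes only the allowed values and the `maxItems` rule listed above.

```python
PHASES = {"EARLY_PHASE1", "PHASE1", "PHASE2", "PHASE3", "PHASE4", "NA"}
STATUSES = {
    "RECRUITING", "ACTIVE_NOT_RECRUITING", "COMPLETED", "ENROLLING_BY_INVITATION",
    "NOT_YET_RECRUITING", "SUSPENDED", "TERMINATED", "WITHDRAWN",
}
STUDY_TYPES = {"INTERVENTIONAL", "OBSERVATIONAL", "EXPANDED_ACCESS"}
FILTER_FIELDS = ("condition", "intervention", "sponsor", "phase",
                 "studyStatus", "studyType", "keyword")


def validate_input(actor_input):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    phase = actor_input.get("phase") or ""
    if phase and phase not in PHASES:
        problems.append(f"unknown phase: {phase}")
    study_type = actor_input.get("studyType") or ""
    if study_type and study_type not in STUDY_TYPES:
        problems.append(f"unknown studyType: {study_type}")
    for status in actor_input.get("studyStatus") or []:
        if status not in STATUSES:
            problems.append(f"unknown studyStatus: {status}")

    max_items = actor_input.get("maxItems", 200)  # 200 is the documented default
    has_filter = any(actor_input.get(name) for name in FILTER_FIELDS)
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 0:
        problems.append("maxItems must be a non-negative integer")
    elif max_items == 0 and not has_filter:
        problems.append("maxItems=0 (unlimited) requires at least one filter")
    return problems
```

For instance, `validate_input({"maxItems": 0})` reports the missing filter, while `validate_input({"condition": "breast cancer", "maxItems": 0})` returns an empty list.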
ClinicalTrials.gov Crawler Output Fields
    {
      "nct_id": "NCT02625935",
      "study_title": "A Prospective Observational Study Evaluating Treatment Decision Impact of Prosigna",
      "brief_summary": "This study evaluates whether the Prosigna assay changes treatment decisions for early-stage breast cancer patients...",
      "study_status": "COMPLETED",
      "phase": "PHASE3",
      "study_type": "OBSERVATIONAL",
      "conditions": ["Breast Cancer"],
      "interventions": ["Prosigna Breast Cancer Prognostic Gene Signature Assay"],
      "intervention_types": ["DIAGNOSTIC_TEST"],
      "lead_sponsor": "NanoString Technologies, Inc.",
      "lead_sponsor_type": "INDUSTRY",
      "collaborators": ["American Society of Clinical Oncology"],
      "enrollment_count": 201,
      "enrollment_type": "ACTUAL",
      "start_date": "2015-10",
      "primary_completion_date": "2017-06",
      "completion_date": "2017-06",
      "primary_outcome": "Change in treatment recommendation (12 months)",
      "secondary_outcomes": [
        "Patient anxiety levels (6 months)",
        "Physician confidence in treatment decision (12 months)"
      ],
      "eligibility_criteria": "Inclusion Criteria:\n- Female\n- Diagnosed with early-stage, hormone receptor-positive breast cancer...\n\nExclusion Criteria:\n- Prior chemotherapy...",
      "min_age": "18 Years",
      "max_age": "",
      "sex": "FEMALE",
      "locations": [
        {
          "facility": "Memorial Sloan Kettering Cancer Center",
          "city": "New York",
          "state": "New York",
          "country": "United States",
          "zip": "10065",
          "contact_name": "Dr. Jane Smith",
          "contact_phone": "212-555-0100",
          "contact_email": "smith@mskcc.org"
        }
      ],
      "has_results": true,
      "results_first_posted": "2018-03-15",
      "last_update_posted": "2023-01-10",
      "study_url": "https://clinicaltrials.gov/study/NCT02625935"
    }
| Field | Type | Description |
|---|---|---|
| nct_id | string | ClinicalTrials.gov identifier (e.g. NCT00000001) |
| study_title | string | Official study title |
| brief_summary | string | Brief summary of the study purpose and design |
| study_status | string | Overall study status: RECRUITING, COMPLETED, ACTIVE_NOT_RECRUITING, etc. |
| phase | string | Trial phase: PHASE1, PHASE2, PHASE3, PHASE4, EARLY_PHASE1, NA |
| study_type | string | Study type: INTERVENTIONAL, OBSERVATIONAL, EXPANDED_ACCESS |
| conditions | string[] | Conditions or diseases being studied |
| interventions | string[] | Intervention names: drugs, devices, procedures |
| intervention_types | string[] | Intervention types: DRUG, DEVICE, BIOLOGICAL, PROCEDURE, DIAGNOSTIC_TEST |
| lead_sponsor | string | Lead sponsor organization name |
| lead_sponsor_type | string | Sponsor class: INDUSTRY, NIH, OTHER, NETWORK |
| collaborators | string[] | Collaborating organizations |
| enrollment_count | number | Participant count — enrolled or estimated |
| enrollment_type | string | Whether the enrollment count is ACTUAL or ESTIMATED |
| start_date | string | Study start date |
| primary_completion_date | string | Date of last participant's last visit for the primary outcome |
| completion_date | string | Full study completion date |
| primary_outcome | string | Primary outcome measure with time frame |
| secondary_outcomes | string[] | Secondary outcome measures with time frames |
| eligibility_criteria | string | Full inclusion and exclusion criteria text |
| min_age | string | Minimum eligible age |
| max_age | string | Maximum eligible age (empty string if no upper limit) |
| sex | string | Eligible sex: ALL, MALE, FEMALE |
| locations | object[] | Study sites — facility, city, state, country, zip, and contact details |
| has_results | boolean | Whether results have been posted to ClinicalTrials.gov |
| results_first_posted | string | Date results were first posted |
| last_update_posted | string | Date of the most recent record update |
| study_url | string | Direct URL to the study page on ClinicalTrials.gov |
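To illustrate how a nested v2 API study could be flattened into the fields above, here is a partial sketch. It assumes the v2 response layout (`protocolSection` with modules such as `identificationModule` and `designModule`). `dig` and `flatten_study` are hypothetical helpers, and joining multiple phases with a comma is a guess, since the crawler's actual handling is not documented here. Only a subset of the output fields is covered.

```python
def dig(obj, *keys, default=None):
    """Walk nested dicts, returning `default` as soon as a key is missing."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


def flatten_study(raw):
    """Map one raw v2 study object to a flat record (subset of fields)."""
    proto = raw.get("protocolSection", {})
    ident = proto.get("identificationModule", {})
    design = proto.get("designModule", {})
    elig = proto.get("eligibilityModule", {})
    nct_id = ident.get("nctId", "")
    interventions = dig(proto, "armsInterventionsModule", "interventions", default=[])
    locations = dig(proto, "contactsLocationsModule", "locations", default=[])
    return {
        "nct_id": nct_id,
        "study_title": ident.get("officialTitle") or ident.get("briefTitle", ""),
        "study_status": dig(proto, "statusModule", "overallStatus", default=""),
        "phase": ", ".join(design.get("phases", [])),  # join style is an assumption
        "study_type": design.get("studyType", ""),
        "conditions": dig(proto, "conditionsModule", "conditions", default=[]),
        "interventions": [i.get("name", "") for i in interventions],
        "intervention_types": [i.get("type", "") for i in interventions],
        "lead_sponsor": dig(proto, "sponsorCollaboratorsModule", "leadSponsor", "name", default=""),
        "lead_sponsor_type": dig(proto, "sponsorCollaboratorsModule", "leadSponsor", "class", default=""),
        "enrollment_count": dig(design, "enrollmentInfo", "count"),
        "enrollment_type": dig(design, "enrollmentInfo", "type", default=""),
        "min_age": elig.get("minimumAge", ""),
        "max_age": elig.get("maximumAge", ""),  # empty string if no upper limit
        "sex": elig.get("sex", ""),
        "locations": [
            {k: loc.get(k, "") for k in ("facility", "city", "state", "country", "zip")}
            for loc in locations
        ],
        "has_results": bool(raw.get("hasResults", False)),
        "study_url": f"https://clinicaltrials.gov/study/{nct_id}",
    }
```

Missing modules fall back to empty strings or lists rather than raising, which keeps every record the same shape regardless of how complete the registration is.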
FAQ
How many clinical trials does ClinicalTrials.gov Crawler cover?
ClinicalTrials.gov Crawler queries the full ClinicalTrials.gov database: over 500,000 registered studies from all countries. If a study was registered there, the crawler can reach it.
Do I need proxies or an API key to run this?
No. ClinicalTrials.gov is a public U.S. government service maintained by the National Library of Medicine. The API requires no authentication and no proxy. The crawler ships with proxies disabled by default.
Can I run a bulk export without filters?
Not with maxItems set to 0. An unlimited run with no filters would queue the entire 500K+ record database, which is rarely what anyone actually needs. Provide at least one filter — condition, sponsor, phase, status, or study type — when running unlimited. With filters, unlimited runs are fine.
How current is the data?
ClinicalTrials.gov Crawler reads from the live API. Sponsors are required to update their registrations regularly, and the last_update_posted field on each record shows when that specific study was last modified. The crawler does not cache anything.
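As a small usage illustration, the `last_update_posted` field can be used to keep only recently modified studies from a downloaded dataset. `updated_since` is a hypothetical helper, and it assumes the field holds a full `YYYY-MM-DD` date as in the sample record.

```python
from datetime import date


def updated_since(records, cutoff):
    """Return records whose last_update_posted is on or after `cutoff`."""
    recent = []
    for record in records:
        stamp = record.get("last_update_posted") or ""
        try:
            posted = date.fromisoformat(stamp)
        except ValueError:
            continue  # missing or unparseable date: skip the record
        if posted >= cutoff:
            recent.append(record)
    return recent
```

For example, `updated_since(items, date(2023, 1, 1))` would keep the sample record shown earlier, whose last update was posted on 2023-01-10.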
What is the difference between studyStatus and studyType?
ClinicalTrials.gov Crawler treats them as separate axes. Status describes where a study is in its lifecycle — RECRUITING, COMPLETED, SUSPENDED, etc. Type describes the study design — INTERVENTIONAL (a drug or device being tested), OBSERVATIONAL (no assigned intervention), or EXPANDED_ACCESS. Both filters can be applied at the same time.
Need More Features?
Need additional fields, a different data source, or a scheduled run? Get in touch.
Why Use ClinicalTrials.gov Crawler?
- Official API, not HTML scraping — the crawler reads the ClinicalTrials.gov v2 JSON endpoints directly, so field names and data structure match what the NLM publishes, not what a selector happened to grab last Tuesday
- 25+ fields per study, including contact data — each location record carries facility name, address, and primary contact information, which matters when the goal is outreach rather than just counting trials
- No proxy cost, no authentication overhead — government data, open access; the crawler's per-record cost reflects actual compute, not unnecessary infrastructure