NIH RePORTER Scraper - Grants, PIs & Linked Publications
Pricing
Pay per event
Scrape NIH-funded research projects from the official RePORTER v2 API. Extract PI names, award amounts, activity codes (R01, R21, K99), study sections, dates, and active/terminated status. Optionally pull linked publications (PMIDs). Filter by keyword, fiscal year, PI, org, state, or institute.
Developer
BowTiedRaccoon
Maintained by Community
Last modified
7 days ago
Extract NIH-funded research project records from the official RePORTER v2 API — no account or proxy required. Retrieve PI names, award amounts, activity codes, study sections, dates, active/terminated status, and optionally linked PubMed publication IDs.
What you get
Each output record corresponds to one NIH project award (one fiscal-year slice). Fields include:
| Field | Description |
|---|---|
| project_num | Full NIH project number (e.g. 5R01CA123456-05) |
| core_project_num | Core project number; groups subprojects and multi-year awards |
| appl_id | Application ID |
| fiscal_year | NIH fiscal year |
| project_title | Project title |
| abstract_text | Full project abstract |
| phr_text | Public health relevance statement |
| activity_code | NIH activity code (R01, R21, K99, F31, P30, U54, …) |
| agency_ic_admin | Administering institute/center (NCI, NIAID, NHLBI, …) |
| award_amount | Total award amount (USD) |
| direct_cost_amt | Direct costs (USD) |
| indirect_cost_amt | Indirect costs (USD) |
| contact_pi_name | Contact PI name |
| principal_investigators | Full PI roster; each entry is a JSON string with full_name, profile_id, is_contact_pi |
| organization_name | Funded institution |
| org_state | US state of funded institution |
| is_active | Whether the project is currently active |
| arra_funded | Whether funded via ARRA (stimulus) |
| budget_start / budget_end | Budget period dates |
| project_start_date / project_end_date | Project period dates |
| full_study_section | NIH study section that reviewed the application |
| agency_ic_fundings | IC-level funding breakdown (FY:IC:amount strings) |
| spending_categories | NIH spending categories |
| linked_publication_pmids | PubMed IDs of linked publications (when Include Linked Publications is enabled) |
| project_detail_url | Direct link to the RePORTER project-details page |
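Two of the fields above arrive as encoded strings rather than nested objects: principal_investigators (JSON strings) and agency_ic_fundings (FY:IC:amount strings). The following is a minimal sketch of decoding them after export, assuming records are loaded as Python dicts shaped like the example output; the helper names `parse_pis` and `parse_ic_fundings` are hypothetical and not part of the scraper.

```python
import json


def parse_pis(record):
    """Decode each JSON-string entry of principal_investigators into a dict."""
    return [json.loads(entry) for entry in record.get("principal_investigators", [])]


def parse_ic_fundings(record):
    """Split 'FY:IC:amount' strings into (fiscal_year, ic, amount) tuples."""
    rows = []
    for item in record.get("agency_ic_fundings", []):
        fiscal_year, ic, amount = item.split(":", 2)
        rows.append((int(fiscal_year), ic, float(amount)))
    return rows


# Illustrative record trimmed to the two encoded fields.
record = {
    "principal_investigators": [
        '{"full_name":"Jane Doe","profile_id":12345,"is_contact_pi":true}'
    ],
    "agency_ic_fundings": ["2024:NCI:512000"],
}

pis = parse_pis(record)
fundings = parse_ic_fundings(record)
contact_pis = [pi["full_name"] for pi in pis if pi.get("is_contact_pi")]
print(contact_pis, fundings)
```

Decoding at read time keeps the exported rows flat (convenient for CSV) while still allowing structured access to the PI roster when needed.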
Filtering options
| Input | Effect |
|---|---|
| Keyword / Text Search | Search across title, abstract, and terms |
| Fiscal Years | Limit to one or more NIH fiscal years (strongly recommended for large pulls) |
| Activity Codes | E.g. R01, R21, K99, F31, P30, U54 |
| Administering Institute | E.g. NCI, NIAID, NHLBI, NIGMS |
| PI Names | Filter by PI last name |
| Organization Names | Filter by funded institution |
| Organization States | Filter by US state (e.g. CA, MA, NY) |
| Active Projects Only | Exclude terminated/closed awards |
| Newly Added Only | Only records recently added to RePORTER |
| Include Linked Publications | Fetch linked PubMed IDs for each project |
| Max Items | Cap on total records returned |
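For readers who want to see roughly what these filters correspond to upstream, here is a hedged sketch of a request body for the RePORTER v2 project search endpoint. The criteria keys (`advanced_text_search`, `fiscal_years`, `activity_codes`, `agencies`, `pi_names`, `org_names`, `org_states`, `include_active_projects`) are assumed from the public API documentation and should be verified there. How the scraper itself maps its inputs to a query is not shown in this README, and `build_search_payload` is a hypothetical helper.

```python
def build_search_payload(keyword=None, fiscal_years=None, activity_codes=None,
                         institutes=None, pi_names=None, org_names=None,
                         org_states=None, active_only=False, offset=0, limit=500):
    """Assemble a search body; only filters that were supplied are included.

    Key names are assumptions based on the public RePORTER v2 API docs.
    """
    criteria = {}
    if keyword:
        criteria["advanced_text_search"] = {
            "operator": "and",
            "search_field": "projecttitle,abstracttext,terms",
            "search_text": keyword,
        }
    if fiscal_years:
        criteria["fiscal_years"] = list(fiscal_years)
    if activity_codes:
        criteria["activity_codes"] = list(activity_codes)
    if institutes:
        criteria["agencies"] = list(institutes)
    if pi_names:
        criteria["pi_names"] = [{"any_name": name} for name in pi_names]
    if org_names:
        criteria["org_names"] = list(org_names)
    if org_states:
        criteria["org_states"] = list(org_states)
    if active_only:
        criteria["include_active_projects"] = True
    return {"criteria": criteria, "offset": offset, "limit": limit}


# Example: active R01 awards in California mentioning a keyword, FY2024.
payload = build_search_payload(
    keyword="glioblastoma",
    fiscal_years=[2024],
    activity_codes=["R01"],
    org_states=["CA"],
    active_only=True,
)
# The body would be POSTed as JSON to https://api.reporter.nih.gov/v2/projects/search
print(payload)
```

Leaving unset filters out of the body, rather than sending empty lists, keeps the request small and avoids relying on how the API treats empty criteria.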
API limits & pagination
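The NIH RePORTER v2 API enforces a hard cap of 15,000 rows per search query (offset + page size cannot exceed 15,000). For large pulls, specify one or more Fiscal Years: the scraper runs a separate query per year, so each year gets its own 15,000-row budget. A single fiscal year typically contains 60,000–100,000 awards, so an unfiltered year still exceeds the cap; in that case the scraper fetches up to 15,000 records for the year and logs a warning that the cap was reached. To get closer to complete coverage, combine Fiscal Years with narrower filters such as Activity Codes, Administering Institute, or Organization States so that each per-year query matches fewer than 15,000 records.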
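The arithmetic behind the cap can be sketched as follows. This is an illustration rather than the scraper's actual code: the 15,000-row cap comes from the paragraph above, while the page size of 500 and the helper names `page_windows` and `is_truncated` are assumptions made for the example.

```python
MAX_ROWS = 15_000  # cap stated above: offset + page size may not exceed this
PAGE_SIZE = 500    # assumed page size; not stated in this README


def page_windows(total_hits, max_items=None, page_size=PAGE_SIZE, cap=MAX_ROWS):
    """Yield (offset, limit) pairs that never reach past the cap."""
    reachable = min(total_hits, cap)
    if max_items is not None:
        reachable = min(reachable, max_items)
    offset = 0
    while offset < reachable:
        limit = min(page_size, reachable - offset)
        yield offset, limit
        offset += limit


def is_truncated(total_hits, cap=MAX_ROWS):
    """True when a query matches more rows than can be paged through."""
    return total_hits > cap


# A fiscal year with 80,000 matches: only the first 15,000 rows are reachable.
windows = list(page_windows(total_hits=80_000))
print(len(windows), windows[-1], is_truncated(80_000))
```

A query whose match count makes `is_truncated` return true is a candidate for splitting further, for example by activity code or state, so that each sub-query fits under the cap.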
Use cases
- Grant landscape analysis — map NIH funding across institutes, activity codes, and institutions
- PI profiling — identify investigators and their award history
- Policy research — track ARRA, COVID-response, or newly-terminated awards
- Publication pipeline — link grants to downstream PubMed output
- Competitive intelligence — benchmark funding in a specific disease area or geography
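As a small illustration of the grant landscape use case, the sketch below totals award_amount by institute and by activity code. It assumes records have been loaded as Python dicts with the field names listed under "What you get"; the sample rows are made up for the example and `total_by` is a hypothetical helper.

```python
from collections import defaultdict


def total_by(records, key):
    """Sum award_amount per value of `key`, skipping rows without an amount."""
    totals = defaultdict(int)
    for rec in records:
        amount = rec.get("award_amount")
        if amount is None:
            continue
        totals[rec.get(key) or "UNKNOWN"] += amount
    return dict(totals)


# Made-up sample rows; real input would be the scraper's exported dataset.
records = [
    {"agency_ic_admin": "NCI", "activity_code": "R01", "award_amount": 512000},
    {"agency_ic_admin": "NCI", "activity_code": "R21", "award_amount": 200000},
    {"agency_ic_admin": "NIAID", "activity_code": "R01", "award_amount": 350000},
    {"agency_ic_admin": "NIAID", "activity_code": "K99", "award_amount": None},
]

by_institute = total_by(records, "agency_ic_admin")
by_activity = total_by(records, "activity_code")
print(by_institute)
print(by_activity)
```

Because each record is one fiscal-year slice of an award, totals computed this way are per-year funding; grouping on core_project_num instead would roll slices up to the project level.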
Example output
{
  "project_num": "5R01CA123456-05",
  "core_project_num": "R01CA123456",
  "appl_id": 10987654,
  "fiscal_year": 2024,
  "project_title": "Novel Approaches to Targeted Cancer Therapy",
  "activity_code": "R01",
  "agency_ic_admin": "NCI",
  "award_amount": 512000,
  "direct_cost_amt": 350000,
  "indirect_cost_amt": 162000,
  "contact_pi_name": "DOE, JANE",
  "principal_investigators": [
    "{\"full_name\":\"Jane Doe\",\"profile_id\":12345,\"is_contact_pi\":true,\"title\":\"Prof.\"}"
  ],
  "organization_name": "STANFORD UNIVERSITY",
  "org_state": "CA",
  "is_active": true,
  "arra_funded": false,
  "budget_start": "2024-04-01",
  "budget_end": "2025-03-31",
  "project_start_date": "2020-04-01",
  "project_end_date": "2025-03-31",
  "full_study_section": "Tumor Microenvironment Study Section",
  "agency_ic_fundings": ["2024:NCI:512000"],
  "spending_categories": ["Cancer"],
  "linked_publication_pmids": [],
  "project_detail_url": "https://reporter.nih.gov/project-details/R01CA123456",
  "status": "success"
}
Data source
All data is drawn from the NIH Research Portfolio Online Reporting Tools (RePORTER) — a public database maintained by the National Institutes of Health. No authentication is required. The scraper calls the official v2 REST API and does not require a proxy.