NIH RePORTER Scraper - Grants, PIs & Linked Publications

Pricing

Pay per event

Scrape NIH-funded research projects from the official RePORTER v2 API. Extract PI names, award amounts, activity codes (R01, R21, K99), study sections, dates, and active/terminated status. Optionally pull linked publications (PMIDs). Filter by keyword, fiscal year, PI, org, state, or institute.

Developer

BowTiedRaccoon

Maintained by Community

Extract NIH-funded research project records from the official RePORTER v2 API — no account or proxy required. Retrieve PI names, award amounts, activity codes, study sections, dates, active/terminated status, and optionally linked PubMed publication IDs.

What you get

Each output record corresponds to one NIH project award (one fiscal-year slice). Fields include:

| Field | Description |
| --- | --- |
| project_num | Full NIH project number (e.g. 5R01CA123456-05) |
| core_project_num | Core project number; groups subprojects and multi-year awards |
| appl_id | Application ID |
| fiscal_year | NIH fiscal year |
| project_title | Project title |
| abstract_text | Full project abstract |
| phr_text | Public health relevance statement |
| activity_code | NIH activity code (R01, R21, K99, F31, P30, U54, …) |
| agency_ic_admin | Administering institute/center (NCI, NIAID, NHLBI, …) |
| award_amount | Total award amount (USD) |
| direct_cost_amt | Direct costs (USD) |
| indirect_cost_amt | Indirect costs (USD) |
| contact_pi_name | Contact PI name |
| principal_investigators | Full PI roster; each entry is a JSON string with full_name, profile_id, is_contact_pi |
| organization_name | Funded institution |
| org_state | US state of funded institution |
| is_active | Whether the project is currently active |
| arra_funded | Whether funded via ARRA (stimulus) |
| budget_start / budget_end | Budget period dates |
| project_start_date / project_end_date | Project period dates |
| full_study_section | NIH study section that reviewed the application |
| agency_ic_fundings | IC-level funding breakdown (FY:IC:amount strings) |
| spending_categories | NIH spending categories |
| linked_publication_pmids | PubMed IDs of linked publications (when Include Linked Publications is enabled) |
| project_detail_url | Direct link to the RePORTER project-details page |
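
Two of the fields above are encoded strings rather than nested objects: principal_investigators holds JSON strings and agency_ic_fundings holds FY:IC:amount strings. The following is a minimal sketch of decoding both, assuming the formats described in the table. The helper names `parse_investigators` and `parse_ic_fundings` are hypothetical and are not part of the scraper.

```python
import json


def parse_investigators(raw_entries):
    """Decode each JSON-string entry of principal_investigators into a dict."""
    return [json.loads(entry) for entry in raw_entries]


def parse_ic_fundings(raw_entries):
    """Split 'FY:IC:amount' strings into (fiscal_year, ic, amount) tuples."""
    parsed = []
    for entry in raw_entries:
        fiscal_year, ic, amount = entry.split(":")
        parsed.append((int(fiscal_year), ic, float(amount)))
    return parsed


# Sample record fragment, shaped like the field descriptions above.
record = {
    "principal_investigators": [
        '{"full_name":"Jane Doe","profile_id":12345,"is_contact_pi":true}'
    ],
    "agency_ic_fundings": ["2024:NCI:512000"],
}

investigators = parse_investigators(record["principal_investigators"])
# Pick the entry flagged as contact PI.
contact_pi = next(pi for pi in investigators if pi["is_contact_pi"])
fundings = parse_ic_fundings(record["agency_ic_fundings"])
print(contact_pi["full_name"], fundings)
```

Decoding once at load time keeps later filtering (for example, by profile_id) free of repeated string parsing.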

Filtering options

| Input | Effect |
| --- | --- |
| Keyword / Text Search | Search across title, abstract, and terms |
| Fiscal Years | Limit to one or more NIH fiscal years (strongly recommended for large pulls) |
| Activity Codes | E.g. R01, R21, K99, F31, P30, U54 |
| Administering Institute | E.g. NCI, NIAID, NHLBI, NIGMS |
| PI Names | Filter by PI last name |
| Organization Names | Filter by funded institution |
| Organization States | Filter by US state (e.g. CA, MA, NY) |
| Active Projects Only | Exclude terminated/closed awards |
| Newly Added Only | Only records recently added to RePORTER |
| Include Linked Publications | Fetch linked PubMed IDs for each project |
| Max Items | Cap on total records returned |
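
For readers who want to see how filters like these might translate into a RePORTER v2 search request, here is a rough sketch. It is an assumption about the mapping, not the scraper's actual code: the function `build_search_payload` and its parameter names are hypothetical, and the criteria keys follow my reading of the public RePORTER v2 API documentation and should be verified against it before use.

```python
def build_search_payload(keyword=None, fiscal_years=None, activity_codes=None,
                         institutes=None, pi_last_names=None, org_names=None,
                         org_states=None, active_only=False,
                         offset=0, limit=500):
    """Build a request body for a project search (no network call is made).

    Criteria key names are assumptions based on the public v2 API docs.
    """
    criteria = {}
    if keyword:
        criteria["advanced_text_search"] = {
            "operator": "and",
            "search_field": "projecttitle,abstracttext,terms",
            "search_text": keyword,
        }
    if fiscal_years:
        criteria["fiscal_years"] = list(fiscal_years)
    if activity_codes:
        criteria["activity_codes"] = list(activity_codes)
    if institutes:
        criteria["agencies"] = list(institutes)
    if pi_last_names:
        criteria["pi_names"] = [{"last_name": name} for name in pi_last_names]
    if org_names:
        criteria["org_names"] = list(org_names)
    if org_states:
        criteria["org_states"] = list(org_states)
    if active_only:
        criteria["include_active_projects"] = True
    return {"criteria": criteria, "offset": offset, "limit": limit}


payload = build_search_payload(
    keyword="cancer immunotherapy",
    fiscal_years=[2024],
    activity_codes=["R01", "R21"],
    org_states=["CA"],
    active_only=True,
)
print(payload)
```

Unset filters are omitted from the criteria rather than sent as empty lists, so the request only narrows on what was actually specified.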

API limits & pagination

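The NIH RePORTER v2 API enforces a hard cap of 15,000 rows per search query (offset + page size cannot exceed 15,000). For large pulls, specify one or more Fiscal Years: the scraper runs a separate query per year, so each year gets its own 15,000-row budget. A single fiscal year typically contains 60,000–100,000 awards, so an otherwise unfiltered year still exceeds the cap; in that case the scraper fetches the first 15,000 records for that year and logs a warning. To retrieve a complete slice, combine Fiscal Years with narrower filters (activity code, administering institute, state) until each query matches no more than 15,000 awards.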
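
The cap arithmetic can be illustrated with a short sketch that plans the (offset, limit) windows for one query. The 15,000-row cap comes from the text above; the page size of 500 and the function name `page_windows` are assumptions made for illustration and do not describe the scraper's internals.

```python
MAX_ROWS = 15_000  # hard cap per search query, as stated above
PAGE_SIZE = 500    # assumed page size; not confirmed by the source


def page_windows(total_matches, max_items=None,
                 page_size=PAGE_SIZE, max_rows=MAX_ROWS):
    """Return ((offset, limit) windows, capped) for a single query.

    Every window satisfies offset + limit <= max_rows. `capped` is True
    when the query matches more rows than can be fetched and the caller
    asked for more than the cap allows.
    """
    reachable = min(total_matches, max_rows)
    if max_items is not None:
        reachable = min(reachable, max_items)

    windows = []
    offset = 0
    while offset < reachable:
        limit = min(page_size, reachable - offset)
        windows.append((offset, limit))
        offset += limit

    capped = total_matches > max_rows and (max_items is None or max_items > max_rows)
    return windows, capped


# An unfiltered fiscal year with 80,000 matches hits the cap.
big_windows, big_capped = page_windows(80_000)
# A narrowly filtered query with 1,200 matches fits in three pages.
small_windows, small_capped = page_windows(1_200)
print(len(big_windows), big_capped, small_windows, small_capped)
```

When `capped` comes back True, narrowing the filters for that year is the only way to reach the remaining rows, since no offset beyond the cap is accepted.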

Use cases

  • Grant landscape analysis — map NIH funding across institutes, activity codes, and institutions
  • PI profiling — identify investigators and their award history
  • Policy research — track ARRA, COVID-response, or newly-terminated awards
  • Publication pipeline — link grants to downstream PubMed output
  • Competitive intelligence — benchmark funding in a specific disease area or geography

Example output

{
  "project_num": "5R01CA123456-05",
  "core_project_num": "R01CA123456",
  "appl_id": 10987654,
  "fiscal_year": 2024,
  "project_title": "Novel Approaches to Targeted Cancer Therapy",
  "activity_code": "R01",
  "agency_ic_admin": "NCI",
  "award_amount": 512000,
  "direct_cost_amt": 350000,
  "indirect_cost_amt": 162000,
  "contact_pi_name": "DOE, JANE",
  "principal_investigators": [
    "{\"full_name\":\"Jane Doe\",\"profile_id\":12345,\"is_contact_pi\":true,\"title\":\"Prof.\"}"
  ],
  "organization_name": "STANFORD UNIVERSITY",
  "org_state": "CA",
  "is_active": true,
  "arra_funded": false,
  "budget_start": "2024-04-01",
  "budget_end": "2025-03-31",
  "project_start_date": "2020-04-01",
  "project_end_date": "2025-03-31",
  "full_study_section": "Tumor Microenvironment Study Section",
  "agency_ic_fundings": ["2024:NCI:512000"],
  "spending_categories": ["Cancer"],
  "linked_publication_pmids": [],
  "project_detail_url": "https://reporter.nih.gov/project-details/R01CA123456",
  "status": "success"
}
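
Because each record is one fiscal-year slice, a multi-year award appears as several records that share a core_project_num. The sketch below shows one way to total funding per core project. The helper name `total_by_core_project` is hypothetical, the 2023 and R21 rows are invented sample data, and treating a missing award_amount as zero is an assumption about how gaps should be handled.

```python
from collections import defaultdict


def total_by_core_project(records):
    """Sum award_amount across fiscal-year slices sharing a core_project_num."""
    totals = defaultdict(int)
    for record in records:
        # Assumption: a missing or null amount counts as zero.
        amount = record.get("award_amount") or 0
        totals[record["core_project_num"]] += amount
    return dict(totals)


# Illustrative slices; only the 2024 R01 row mirrors the example record.
records = [
    {"core_project_num": "R01CA123456", "fiscal_year": 2023, "award_amount": 498000},
    {"core_project_num": "R01CA123456", "fiscal_year": 2024, "award_amount": 512000},
    {"core_project_num": "R21AI654321", "fiscal_year": 2024, "award_amount": None},
]

totals = total_by_core_project(records)
print(totals)
```

The same grouping key works for other roll-ups, such as collecting all linked_publication_pmids for a core project across its fiscal years.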

Data source

All data is drawn from the NIH Research Portfolio Online Reporting Tools (RePORTER) — a public database maintained by the National Institutes of Health. No authentication is required. The scraper calls the official v2 REST API and does not require a proxy.