PubMed Abstract Scraper

Pricing

$8.00 / 1,000 results

Scrape PubMed abstracts by keyword with optional date filtering. Returns title, authors, DOI, abstract, journal, and publication date as structured JSON.

Developer

azureblue

Maintained by Community

Extract structured PubMed abstracts by keyword — with optional date filtering, DOI links, and author lists.

Search PubMed's 35+ million biomedical citations and retrieve clean, structured JSON output: title, authors, DOI, abstract text, journal name, and publication date. No API key required.


What does this Actor do?

This Actor queries the NCBI PubMed database using the official E-utilities API. Given a search keyword (supports MeSH terms, gene names, author filters, and free-text queries), it retrieves abstracts and returns them as structured JSON records — one object per article.

Ideal for systematic literature reviews, research trend analysis, citation monitoring, and building medical knowledge bases.
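
The description above maps onto the two-step E-utilities flow: ESearch turns a query into a list of PMIDs, and EFetch returns the records for those PMIDs. The sketch below only builds the request URLs, using the public E-utilities endpoints and parameters. The function names are hypothetical, the lower date bound used when only an end date is given is a placeholder assumption, and the Actor's actual request code is not shown in this listing.

```python
from datetime import date
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(keyword, max_results=100, date_from=None, date_to=None):
    """Build an ESearch URL that returns matching PMIDs as JSON."""
    params = {
        "db": "pubmed",
        "term": keyword,
        "retmax": max_results,
        "retmode": "json",
    }
    if date_from or date_to:
        # E-utilities expects YYYY/MM/DD and needs mindate and maxdate together.
        params["datetype"] = "pdat"  # filter on publication date
        # Assumption: "1800-01-01" is only a placeholder lower bound.
        params["mindate"] = (date_from or "1800-01-01").replace("-", "/")
        params["maxdate"] = (date_to or date.today().isoformat()).replace("-", "/")
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(pmids):
    """Build an EFetch URL that returns full records (XML) for a list of PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

For example, build_esearch_url("long COVID symptoms", 500, "2021-01-01") produces a URL whose JSON response lists PMIDs, which can then be passed in batches to build_efetch_url.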


Use Cases

1. Systematic Literature Review Automation

A researcher studying COVID-19 long-term effects needs 500 recent abstracts. Input: keyword: "long COVID symptoms", dateFrom: "2021-01-01", maxResults: 500. The Actor returns up to 500 matching abstracts in minutes, replacing hours of manual PubMed browsing. The results can then be exported for use in Zotero, Rayyan, or Excel.
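
For spreadsheet-based screening, the JSON records can be flattened into CSV. The following is a minimal sketch, assuming the records have already been loaded as a list of dictionaries with the field names from the Output section. The function name and the "; " author separator are illustrative choices, not part of the Actor.

```python
import csv
import io

# Column order follows the fields documented in the Output section.
FIELDS = ["pmid", "title", "authors", "doi", "abstract", "pubDate", "journal"]


def records_to_csv(records):
    """Flatten scraper records into CSV text, joining authors into one cell."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        row = {name: record.get(name, "") for name in FIELDS}
        # Lists do not fit in a single CSV cell, so join author names.
        row["authors"] = "; ".join(record.get("authors") or [])
        writer.writerow(row)
    return buffer.getvalue()
```

The returned string can be written to a .csv file and opened in Excel or another tool that reads CSV.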

2. Competitor & Drug Pipeline Monitoring

A pharma team tracks new publications about a competitor's drug every week. They run the Actor with keyword: "semaglutide cardiovascular" on a weekly schedule and feed the output into a dashboard that surfaces new trial results, safety signals, or opinion pieces after each run.
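
One way to find what is new between two scheduled runs is to compare PMIDs. This is a minimal sketch, assuming the PMIDs from earlier runs are stored somewhere by the caller; the function name is hypothetical and the Actor itself does not track previous runs according to this listing.

```python
def find_new_records(previous_pmids, current_records):
    """Return records whose PMID was not seen in earlier runs, in original order."""
    seen = set(previous_pmids)
    new_records = []
    for record in current_records:
        pmid = record["pmid"]
        if pmid not in seen:
            seen.add(pmid)  # also drops duplicates within the current run
            new_records.append(record)
    return new_records
```

After each run, the caller would add the PMIDs of the returned records to its stored set before the next comparison.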

3. Medical Education Content Generation

A medical education platform wants to keep its question bank up to date. The Actor retrieves abstracts for the latest guidelines (e.g. keyword: "ESC heart failure guidelines 2024"), and the abstracts are fed to an LLM pipeline that generates new multiple-choice questions, keeping the content current without manual curation.


Input

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| keyword | String | ✅ Yes | (none) | PubMed search query (MeSH, gene, free text, etc.) |
| maxResults | Integer | No | 100 | Max abstracts to retrieve (1–10,000) |
| dateFrom | String | No | (none) | Filter: published on/after this date (YYYY-MM-DD) |
| dateTo | String | No | today | Filter: published on/before this date (YYYY-MM-DD) |

Example Input

{
  "keyword": "myocardial infarction reperfusion therapy",
  "maxResults": 50,
  "dateFrom": "2022-01-01",
  "dateTo": "2024-12-31"
}
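
The constraints in the input table can be checked before a run is started. The sketch below is one possible client-side check, assuming the documented rules (required keyword, maxResults between 1 and 10,000, dates in YYYY-MM-DD, dateTo defaulting to today). The function names are hypothetical, and the Actor's own validation may differ.

```python
from datetime import date, datetime

DEFAULT_MAX_RESULTS = 100
MAX_RESULTS_LIMIT = 10_000


def parse_day(name, value):
    """Parse a YYYY-MM-DD string into a date, with a readable error."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        raise ValueError(f"{name} must be a date in YYYY-MM-DD format") from None


def validate_input(raw):
    """Check an input object against the documented fields and apply defaults."""
    keyword = raw.get("keyword")
    if not isinstance(keyword, str) or not keyword.strip():
        raise ValueError("keyword is required and must be a non-empty string")

    max_results = raw.get("maxResults", DEFAULT_MAX_RESULTS)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise ValueError("maxResults must be an integer")
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"maxResults must be between 1 and {MAX_RESULTS_LIMIT}")

    date_from = parse_day("dateFrom", raw["dateFrom"]) if raw.get("dateFrom") else None
    date_to = parse_day("dateTo", raw["dateTo"]) if raw.get("dateTo") else date.today()
    if date_from and date_from > date_to:
        raise ValueError("dateFrom must not be later than dateTo")

    return {
        "keyword": keyword.strip(),
        "maxResults": max_results,
        "dateFrom": date_from,
        "dateTo": date_to,
    }
```

Calling validate_input on the example input returns the same values with the two dates parsed into date objects.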

Output

Each result is saved to the Default Dataset as a JSON object:

{
  "pmid": "38123456",
  "title": "Outcomes of primary PCI vs thrombolysis in STEMI: a meta-analysis",
  "authors": ["Müller A", "Schmidt B", "Jensen C"],
  "doi": "10.1016/j.jacc.2023.11.042",
  "abstract": "Background: Primary percutaneous coronary intervention (PCI) is the standard of care for ST-elevation myocardial infarction... Conclusions: Primary PCI significantly reduces 30-day mortality compared to thrombolysis (OR 0.63, 95% CI 0.54–0.74).",
  "pubDate": "2024-01-15",
  "journal": "Journal of the American College of Cardiology"
}
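
When consuming the dataset, it can help to load each object into a typed record and derive links from the identifiers. The class below is an illustrative sketch: the class name and the two URL properties are hypothetical, and treating doi as optional is an assumption made here because some PubMed records carry no DOI.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Abstract:
    """One dataset item, using the field names shown in the sample output."""
    pmid: str
    title: str
    authors: List[str] = field(default_factory=list)
    doi: Optional[str] = None  # assumption: may be missing for some articles
    abstract: str = ""
    pubDate: str = ""
    journal: str = ""

    @property
    def doi_url(self) -> Optional[str]:
        """Resolver link for the DOI, or None when no DOI is present."""
        return f"https://doi.org/{self.doi}" if self.doi else None

    @property
    def pubmed_url(self) -> str:
        """Link to the article page on PubMed."""
        return f"https://pubmed.ncbi.nlm.nih.gov/{self.pmid}/"
```

A dataset item parsed from JSON can be passed as Abstract(**item), provided it contains only the fields listed above.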

Pricing

This Actor uses Pay-Per-Result pricing at $0.008 per scraped abstract.

| Volume | Estimated Cost |
| --- | --- |
| 100 abstracts | ~$0.80 |
| 1,000 abstracts | ~$8.00 |
| 10,000 abstracts | ~$80.00 |
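
The estimates above are the per-result price multiplied by the number of abstracts. A small sketch of that arithmetic follows, using Decimal to avoid floating-point rounding. The function name is hypothetical, and the figure covers only the per-result charge stated here.

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.008")  # USD per scraped abstract, from the listing


def estimate_cost(result_count):
    """Estimated per-result charge in USD, rounded to cents."""
    if result_count < 0:
        raise ValueError("result_count must not be negative")
    return (PRICE_PER_RESULT * result_count).quantize(Decimal("0.01"))
```

For instance, estimate_cost(1000) gives Decimal("8.00"), matching the second row of the table.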

Technical Details

  • Data source: NCBI E-utilities API (official, public, no key required)
  • Rate limiting: 1 request/second (conservative; NCBI allows 3/sec without key)
  • Retry logic: Up to 3 automatic retries with exponential backoff
  • Batch size: 20 articles per API call (a conservative choice; EFetch accepts larger ID lists)
  • Output format: Structured JSON, one object per abstract
  • Node.js: v22 LTS
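
The batching, throttling, and retry settings listed above can be sketched as follows. This is an illustration in Python rather than the Actor's Node.js code, which is not shown in this listing. The function names, the 1-second backoff base, and the choice to retry only on OSError (which covers connection and timeout errors) are assumptions.

```python
import time

BATCH_SIZE = 20        # articles per API call
MIN_INTERVAL_S = 1.0   # pause between batches (1 request/second)
MAX_RETRIES = 3        # retries after the first failed attempt


def chunk(items, size=BATCH_SIZE):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def backoff_delays(retries=MAX_RETRIES, base=1.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base * 2 ** attempt for attempt in range(retries)]


def call_with_retry(func, *args, sleep=time.sleep):
    """Call func, retrying on OSError with exponential backoff."""
    delays = backoff_delays()
    for attempt in range(len(delays) + 1):
        try:
            return func(*args)
        except OSError:
            if attempt == len(delays):
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])


def fetch_all(pmids, fetch_batch, sleep=time.sleep):
    """Fetch records batch by batch; fetch_batch is supplied by the caller."""
    results = []
    for index, batch in enumerate(chunk(pmids)):
        if index:  # throttle before every batch except the first
            sleep(MIN_INTERVAL_S)
        results.extend(call_with_retry(fetch_batch, batch, sleep=sleep))
    return results
```

Passing sleep as a parameter keeps the timing testable: a test can substitute a function that records the requested delays instead of waiting.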

Supported Query Syntax

PubMed supports advanced search syntax. Examples:

  • BRCA1[gene] AND cancer — gene name filter
  • Smith J[author] AND cardiology — author filter
  • "heart failure"[MeSH] AND randomized controlled trial[pt] — MeSH + publication type
  • aspirin AND (myocardial infarction OR stroke) — boolean logic
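
Queries like the ones above can also be assembled programmatically and placed in the keyword field. The helpers below are a small hypothetical sketch, not part of the Actor; they only concatenate strings and do not check that a field tag is valid in PubMed.

```python
import json


def tagged(term, field_tag):
    """Attach a PubMed field tag, e.g. tagged("Smith J", "author")."""
    return f"{term}[{field_tag}]"


def all_of(*clauses):
    """Combine clauses so that every one must match."""
    return " AND ".join(clauses)


def any_of(*clauses):
    """Combine clauses so that at least one must match, grouped in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def build_input(keyword, max_results=100):
    """Serialize an Actor input object using the documented field names."""
    return json.dumps({"keyword": keyword, "maxResults": max_results})
```

For example, all_of("aspirin", any_of("myocardial infarction", "stroke")) yields the boolean query from the last bullet, and build_input wraps it as input JSON.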

Support

Issues or feature requests? Open a ticket via the Apify console.