PubMed NCBI Scraper - Biomedical Articles & MeSH (No Key)

Pricing

$2.00 / 1,000 records

Scrape 36M+ PubMed/NCBI biomedical articles: title, abstract, authors, journal, PMID, DOI, MeSH terms. No API key needed. Build literature reviews & AI training corpora. Works in Claude, ChatGPT & any MCP agent.

Developer: The Mine Works (maintained by Community)


PubMed / NCBI — Biomedical Literature Search

Search the world's largest biomedical literature database directly from Apify. Access 36 million+ citations from PubMed, including MEDLINE. Each record includes the title, abstract (where available), authors, journal, PMID, DOI, and MeSH controlled-vocabulary terms, and no API key is required. An optional free NCBI API key unlocks higher throughput.

Why This Actor?

PubMed is the definitive source of record for biomedical and life sciences research, maintained by the National Library of Medicine (NLM) at NIH. With 36 million+ articles spanning decades of research across medicine, pharmacology, genomics, neuroscience, oncology, and every other life science domain, it is the first stop for:

  • Pharma and biotech researchers conducting competitive intelligence, target identification, or systematic literature reviews
  • AI and machine learning teams building training corpora, biomedical NLP models, or question-answering systems that require structured scientific text
  • Systematic review authors collecting all studies matching a clinical PICO question for meta-analysis
  • Biotech investors and analysts tracking publication volume, author networks, and research momentum in a therapeutic area
  • Academic departments monitoring publications from collaborating institutions or specific research groups

This actor wraps the official NCBI E-utilities API (esearch + efetch endpoints at eutils.ncbi.nlm.nih.gov) and delivers clean structured JSON — one article per dataset row — with full MeSH term arrays for downstream semantic analysis.
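The two-step E-utilities pattern works like this: esearch turns a query into PMIDs, then efetch pulls the records for those PMIDs. The sketch below only builds the request URLs and makes no network calls. The E-utilities parameter names are real. The helper names `esearchUrl` and `efetchUrl` and the fallback date bounds are assumptions for illustration and are not this actor's code.

```javascript
// Sketch: build the two E-utilities request URLs a PubMed search needs.
// Parameter names are documented E-utilities parameters; how this actor
// combines them internally is an assumption.
const EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

// Step 1: esearch turns a query into a list of PMIDs (JSON response).
function esearchUrl({ query, dateFrom, dateTo, retmax = 100, retstart = 0, apiKey }) {
  const params = new URLSearchParams({
    db: "pubmed",
    term: query,
    retmode: "json",
    retmax: String(retmax),
    retstart: String(retstart),
  });
  if (dateFrom || dateTo) {
    // mindate and maxdate must be sent together; the fallback bounds are
    // wide placeholders chosen for this sketch.
    params.set("datetype", "pdat");
    params.set("mindate", dateFrom ?? "1800/01/01");
    params.set("maxdate", dateTo ?? "3000/12/31");
  }
  if (apiKey) params.set("api_key", apiKey);
  return `${EUTILS_BASE}/esearch.fcgi?${params}`;
}

// Step 2: efetch returns PubMed XML records for a batch of PMIDs.
function efetchUrl(pmids, apiKey) {
  const params = new URLSearchParams({
    db: "pubmed",
    id: pmids.join(","),
    retmode: "xml",
    rettype: "abstract",
  });
  if (apiKey) params.set("api_key", apiKey);
  return `${EUTILS_BASE}/efetch.fcgi?${params}`;
}

console.log(esearchUrl({ query: "CRISPR[ti]", dateFrom: "2023/01/01" }));
console.log(efetchUrl(["38234567"]));
```

To collect more results, a client would advance `retstart` across esearch pages and send the PMIDs to efetch in batches.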

PubMed Query Syntax

The actor supports PubMed's full advanced query language. Use field tags in brackets to target specific fields:

| Tag | Field | Example |
|---|---|---|
| [ti] | Article title | CRISPR[ti] |
| [au] | Author name | Smith J[au] |
| [ta] | Journal abbreviation | Nature[ta] |
| [mh] | MeSH heading | Neoplasms[mh] |
| [dp] | Publication date | 2023:2024[dp] |
| [pt] | Publication type | Review[pt] |

Combine clauses with the Boolean operators AND, OR, and NOT, written in uppercase: GLP-1 receptor agonist[ti] AND diabetes[mh] AND 2022:2024[dp]
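You can also compose such a query programmatically. The `tagged` and `combine` helpers below are hypothetical and not part of the actor. They append field tags, join clauses with an uppercase operator, and wrap multi-clause groups in parentheses.

```javascript
// Hypothetical helpers (not part of the actor) for composing PubMed queries.
function tagged(term, tag) {
  return `${term}[${tag}]`;
}

function combine(operator, clauses) {
  // PubMed recognizes Boolean operators only when written in uppercase.
  const op = operator.toUpperCase();
  if (!["AND", "OR", "NOT"].includes(op)) {
    throw new Error(`Unsupported operator: ${operator}`);
  }
  // Parenthesize multi-clause groups so they nest safely in larger queries.
  return clauses.length > 1 ? `(${clauses.join(` ${op} `)})` : clauses[0];
}

const query = combine("and", [
  tagged("GLP-1 receptor agonist", "ti"),
  tagged("diabetes", "mh"),
  tagged("2022:2024", "dp"),
]);
console.log(query);
// (GLP-1 receptor agonist[ti] AND diabetes[mh] AND 2022:2024[dp])
```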

Inputs

| Field | Type | Description | Default |
|---|---|---|---|
| query | string | PubMed search query, with optional field tags | GLP-1 receptor agonist diabetes |
| dateFrom | string | Earliest publication date (YYYY/MM/DD) | 2020/01/01 |
| dateTo | string | Latest publication date (YYYY/MM/DD) | none |
| ncbiApiKey | string | Optional free NCBI API key; raises the rate limit from 3 to 10 requests/sec | none |
| maxResults | integer | Maximum number of articles to return (1–10,000) | 100 |
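A run input built from these fields might look like the example below. `validateInput` is a hypothetical client-side check that mirrors the documented constraints: YYYY/MM/DD dates and a `maxResults` value between 1 and 10,000. The actor's own validation may behave differently.

```javascript
// Example run input using the field names from the Inputs table.
const input = {
  query: "GLP-1 receptor agonist[ti] AND diabetes[mh]",
  dateFrom: "2022/01/01",
  dateTo: "2024/12/31",
  maxResults: 500,
  // ncbiApiKey: "YOUR_NCBI_API_KEY", // optional
};

// Hypothetical pre-flight check; returns a list of problems (empty if none).
function validateInput({ query, dateFrom, dateTo, maxResults = 100 }) {
  const errors = [];
  const DATE = /^\d{4}\/\d{2}\/\d{2}$/;
  if (!query || !query.trim()) errors.push("query must be a non-empty string");
  for (const [name, value] of [["dateFrom", dateFrom], ["dateTo", dateTo]]) {
    if (value !== undefined && !DATE.test(value)) errors.push(`${name} must be YYYY/MM/DD`);
  }
  if (!Number.isInteger(maxResults) || maxResults < 1 || maxResults > 10000) {
    errors.push("maxResults must be an integer from 1 to 10,000");
  }
  // Zero-padded YYYY/MM/DD strings compare correctly as plain strings.
  if (dateFrom && dateTo && DATE.test(dateFrom) && DATE.test(dateTo) && dateFrom > dateTo) {
    errors.push("dateFrom must not be later than dateTo");
  }
  return errors;
}

console.log(validateInput(input)); // []
```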

Output Format

Each article is stored as one item in the Apify dataset:

```json
{
  "pmid": "38234567",
  "title": "Efficacy of GLP-1 receptor agonists in type 2 diabetes: a systematic review",
  "abstract": "Background: GLP-1 receptor agonists have emerged as...",
  "authors": ["Smith Jane A", "Chen Robert B", "Patel Anita K"],
  "journal": "The Lancet Diabetes & Endocrinology",
  "issn": "2213-8587",
  "year": "2024",
  "doi": "10.1016/S2213-8587(24)00123-4",
  "mesh_terms": ["Glucagon-Like Peptide-1 Receptor", "Diabetes Mellitus, Type 2", "Hypoglycemic Agents"],
  "url": "https://pubmed.ncbi.nlm.nih.gov/38234567/",
  "scraped_at": "2024-11-15T09:22:11.000Z"
}
```

A summary record is appended at the end with total article count and run timestamp.
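The summary record shares the dataset with the articles, so downstream code usually has to separate the two. The minimal sketch below assumes that article rows always carry a string `pmid` and the summary record does not, because the summary's field names are not documented here. `splitDataset` and `countByYear` are hypothetical helpers. The second one counts publication volume per year.

```javascript
// Sketch: separate article rows from the trailing summary record.
// Assumption: article rows carry a string `pmid`; the summary record does not.
function splitDataset(items) {
  const articles = items.filter((item) => typeof item.pmid === "string");
  const summary = items.find((item) => typeof item.pmid !== "string") ?? null;
  return { articles, summary };
}

// Example downstream use: publication volume per year.
function countByYear(articles) {
  const counts = {};
  for (const { year } of articles) counts[year] = (counts[year] ?? 0) + 1;
  return counts;
}

const items = [
  { pmid: "1", year: "2023" },
  { pmid: "2", year: "2024" },
  { pmid: "3", year: "2024" },
  { note: "stand-in for the summary record (fields assumed)" },
];
const { articles, summary } = splitDataset(items);
console.log(articles.length, countByYear(articles)); // 3 { '2023': 1, '2024': 2 }
```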

MeSH Terms

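Medical Subject Headings (MeSH) are NLM's controlled vocabulary for indexing biomedical literature. NLM indexes MEDLINE citations with MeSH descriptors. Not every PubMed record carries MeSH terms, though. Citations that are still in process, supplied by the publisher, or outside MEDLINE may have an empty mesh_terms array. Where present, the mesh_terms array in each output record contains these structured tags, which are ideal for: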

  • Semantic clustering of articles by disease area
  • Building ontology-aligned training data for biomedical NLP models
  • Identifying related concepts the author did not use in the title or abstract
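A simple first step toward clustering is an inverted index from MeSH heading to PMIDs. The sketch below builds one and ranks the most frequent headings. It uses only the `pmid` and `mesh_terms` fields from the output format and treats a missing array as empty. `meshIndex` and `topTerms` are hypothetical names.

```javascript
// Sketch: inverted index from MeSH heading to the PMIDs tagged with it.
function meshIndex(articles) {
  const index = new Map();
  for (const { pmid, mesh_terms: terms = [] } of articles) {
    for (const term of terms) {
      if (!index.has(term)) index.set(term, []);
      index.get(term).push(pmid);
    }
  }
  return index;
}

// The n most frequent headings as [term, count] pairs, ties broken alphabetically.
function topTerms(index, n = 10) {
  return [...index.entries()]
    .map(([term, pmids]) => [term, pmids.length])
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, n);
}

const sample = [
  { pmid: "1", mesh_terms: ["Diabetes Mellitus, Type 2", "Hypoglycemic Agents"] },
  { pmid: "2", mesh_terms: ["Diabetes Mellitus, Type 2", "Obesity"] },
  { pmid: "3", mesh_terms: [] },
];
// Yields "Diabetes Mellitus, Type 2" (2), then "Hypoglycemic Agents" (1).
console.log(topTerms(meshIndex(sample), 2));
```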

Pricing

First 25 results are free on every Apify account — no charge until you exceed the free tier.

After the free tier: $4 per 1,000 articles (Pay-Per-Event billing). A 1,000-article run costs $4.00. A 10,000-article run costs $40.00. You are charged only for articles actually delivered.
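Under pay-per-result billing, cost grows linearly with the number of articles actually delivered. In the sketch below the per-1,000 rate is a parameter, so you can plug in whatever rate the current listing shows. The account-level free tier is ignored, and `estimateCostUsd` is a hypothetical name.

```javascript
// Sketch: linear pay-per-result cost; the rate is supplied by the caller.
function estimateCostUsd(articlesDelivered, usdPer1000) {
  return (articlesDelivered / 1000) * usdPer1000;
}

console.log(estimateCostUsd(1000, 4));  // 4
console.log(estimateCostUsd(10000, 4)); // 40
```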

Frequently Asked Questions

Q: Do I need an NCBI API key? No. The E-utilities API is freely accessible without authentication. However, without a key you are limited to 3 requests per second. A free NCBI API key (available at ncbi.nlm.nih.gov/account/settings/) increases this to 10 requests per second, making large runs significantly faster. Enter your key in the ncbiApiKey input field.

Q: How current is the data? PubMed is updated daily with new articles, corrections, and MeSH annotations. Articles typically appear in PubMed within days to weeks of publication, depending on the journal's submission practices.

Q: Can I retrieve full text? No. The actor retrieves titles, abstracts, and metadata only. PubMed records do not include full text, and full-text access depends on publisher agreements. For open-access articles, the DOI in the output resolves to the publisher's version (https://doi.org/ followed by the DOI), and many such articles are also available in PubMed Central (PMC).

Q: What is the maximum number of articles I can retrieve? The actor supports up to 10,000 articles per run. For larger literature sets, use date range filters (dateFrom/dateTo) or narrower query terms to partition your retrieval across multiple runs.
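One way to partition a large retrieval is to use one date window per year. The helper below is hypothetical. It produces dateFrom/dateTo pairs in the documented YYYY/MM/DD format, and each pair can be merged into a separate run input with the same query. A year with more than 10,000 matches would still need finer windows, for example one per month.

```javascript
// Hypothetical helper: one dateFrom/dateTo window per calendar year.
function yearlyWindows(startYear, endYear) {
  const windows = [];
  for (let year = startYear; year <= endYear; year++) {
    windows.push({ dateFrom: `${year}/01/01`, dateTo: `${year}/12/31` });
  }
  return windows;
}

for (const range of yearlyWindows(2020, 2024)) {
  // Each object is a candidate input for its own run.
  console.log({ query: "CRISPR[ti]", maxResults: 10000, ...range });
}
```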

Q: Can I search by specific author or institution? Yes. Use the [au] tag for author names (e.g. Smith JA[au]) and the [ad] affiliation tag for institutions (e.g. Harvard[ad]). Multiple authors or institutions can be combined with OR: (Smith JA[au] OR Chen RB[au]).
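To build such OR groups from a list of names, you could use a small hypothetical helper like `anyOf` below. It tags each value and ORs the group together. Any result can then be ANDed with other clauses.

```javascript
// Hypothetical helper: OR together several values under one PubMed field tag.
function anyOf(values, tag) {
  if (values.length === 0) throw new Error("anyOf needs at least one value");
  const clauses = values.map((value) => `${value}[${tag}]`);
  return clauses.length === 1 ? clauses[0] : `(${clauses.join(" OR ")})`;
}

const authorQuery = `${anyOf(["Smith JA", "Chen RB"], "au")} AND ${anyOf(["Harvard"], "ad")}`;
console.log(authorQuery); // (Smith JA[au] OR Chen RB[au]) AND Harvard[ad]
```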

Q: How does the actor handle rate limits? The actor automatically respects NCBI's rate limits by inserting delays between requests — 340ms without an API key (staying safely under 3 req/sec) and 110ms with an API key. Automatic retry with exponential backoff handles transient 429 and 5xx errors.
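The sketch below shows the policy described above: a fixed pause before each request, plus exponential backoff on HTTP 429 and 5xx responses. The 340 ms and 110 ms intervals come from this answer. The 1 s base delay, 30 s cap, retry count, and all function names are assumptions. `fetchFn` defaults to the global `fetch` (Node 18+) and can be swapped out for testing.

```javascript
// Sketch: throttled requests with exponential backoff on transient errors.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function requestIntervalMs(hasApiKey) {
  return hasApiKey ? 110 : 340; // stays under 10 or 3 requests/sec
}

function isRetryable(status) {
  return status === 429 || (status >= 500 && status <= 599);
}

function backoffMs(attempt, baseMs = 1000, capMs = 30000) {
  // attempt 0 -> 1 s, 1 -> 2 s, 2 -> 4 s, ... capped at 30 s.
  return Math.min(capMs, baseMs * 2 ** attempt);
}

async function fetchWithRetry(url, { hasApiKey = false, maxRetries = 4, fetchFn = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    await sleep(requestIntervalMs(hasApiKey)); // fixed pause before every request
    const res = await fetchFn(url);
    if (res.ok) return res;
    if (!isRetryable(res.status) || attempt >= maxRetries) {
      throw new Error(`Request failed with HTTP ${res.status}`);
    }
    await sleep(backoffMs(attempt)); // extra wait before retrying
  }
}
```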

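Q: Are preprints included in PubMed results? PubMed consists mostly of citations from MEDLINE-indexed journals, plus other records such as PubMed Central articles. Most bioRxiv/medRxiv preprints are not in PubMed. The main exception is NIH-funded preprints added through the NIH Preprint Pilot. Use the arXiv or bioRxiv scrapers for broad preprint coverage.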

Use in Claude, ChatGPT & any MCP agent

This actor is also a Model Context Protocol (MCP) server tool — call it directly from Claude, ChatGPT, Cursor, Windsurf, or any MCP-compatible AI agent. The agent only pays for results delivered (same pay-per-result model).

  • Per-actor MCP endpoint: https://mcp.apify.com/?tools=themineworks/pubmed-ncbi-scraper
  • Full Mine Works MCP server (all tools): https://the-mine-works-mcp.hatchable.site/api/mcp
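
```javascript
// Call this actor from Node.js via apify-client (ES module, so top-level await works)
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: process.env.APIFY_TOKEN }); // or paste your token
// .call() starts a run and waits for it to finish; input fields follow the Inputs table
const run = await client.actor('themineworks/pubmed-ncbi-scraper').call({
  query: 'GLP-1 receptor agonist[ti] AND diabetes[mh]',
  maxResults: 100,
});
// Article rows plus the trailing summary record
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);
```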