PubMed NCBI Scraper - Biomedical Articles & MeSH (No Key)
Pricing: $2.00 / 1,000 records
Scrape 36M+ PubMed/NCBI biomedical articles: title, abstract, authors, journal, PMID, DOI, MeSH terms. No API key needed. Build literature reviews & AI training corpora. Works in Claude, ChatGPT & any MCP agent.
Developer: The Mine Works (maintained by Community)
PubMed / NCBI — Biomedical Literature Search
Search PubMed, the world's largest biomedical literature database, directly from Apify. Access 36 million+ citations from PubMed, including MEDLINE (titles, abstracts, authors, journals, PMIDs, DOIs, and MeSH controlled-vocabulary terms), without any API key. An optional free NCBI API key unlocks higher throughput.
Why This Actor?
PubMed is the definitive source of record for biomedical and life sciences research, maintained by the National Library of Medicine (NLM) at NIH. With 36 million+ articles spanning decades of research across medicine, pharmacology, genomics, neuroscience, oncology, and every other life science domain, it is the first stop for:
- Pharma and biotech researchers conducting competitive intelligence, target identification, or systematic literature reviews
- AI and machine learning teams building training corpora, biomedical NLP models, or question-answering systems that require structured scientific text
- Systematic review authors collecting all studies matching a clinical PICO question for meta-analysis
- Biotech investors and analysts tracking publication volume, author networks, and research momentum in a therapeutic area
- Academic departments monitoring publications from collaborating institutions or specific research groups
This actor wraps the official NCBI E-utilities API (esearch + efetch endpoints at eutils.ncbi.nlm.nih.gov) and delivers clean structured JSON — one article per dataset row — with full MeSH term arrays for downstream semantic analysis.
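To make the two-step flow concrete, here is a minimal sketch of an esearch call (query to PMIDs) followed by an efetch URL (PMIDs to full PubMed records). It is not the actor's source code: the helper names, the JSON/XML retmode choices, and the open-ended fallback dates used when only one of dateFrom/dateTo is set are assumptions. The query-string parameters themselves (db, term, retmax, retmode, datetype, mindate, maxdate, id, api_key) are standard E-utilities parameters. Node 18+ is assumed for the global fetch.

```javascript
// Sketch of an esearch -> efetch round trip against NCBI E-utilities.
// NOT the actor's implementation; helper names and fallback dates are assumptions.
const EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils";

// Build an esearch URL that returns matching PMIDs as JSON.
function esearchUrl({ query, dateFrom, dateTo, maxResults = 100, apiKey }) {
  const params = new URLSearchParams({
    db: "pubmed",
    term: query,
    retmax: String(maxResults),
    retmode: "json",
  });
  if (dateFrom || dateTo) {
    // mindate and maxdate must be sent together; the open-ended
    // fallbacks below are assumed placeholders, not actor defaults.
    params.set("datetype", "pdat");
    params.set("mindate", dateFrom ?? "1800/01/01");
    params.set("maxdate", dateTo ?? "3000/12/31");
  }
  if (apiKey) params.set("api_key", apiKey);
  return `${EUTILS}/esearch.fcgi?${params}`;
}

// Build an efetch URL that returns PubMed XML records for a batch of PMIDs.
function efetchUrl(pmids, apiKey) {
  const params = new URLSearchParams({ db: "pubmed", id: pmids.join(","), retmode: "xml" });
  if (apiKey) params.set("api_key", apiKey);
  return `${EUTILS}/efetch.fcgi?${params}`;
}

// Run the search and return the list of matching PMIDs (strings).
async function searchPmids(input) {
  const res = await fetch(esearchUrl(input));
  if (!res.ok) throw new Error(`esearch failed: HTTP ${res.status}`);
  const body = await res.json();
  return body.esearchresult.idlist;
}
```

For example, `searchPmids({ query: "CRISPR[ti]", dateFrom: "2020/01/01", maxResults: 20 })` resolves to a PMID list that can be passed to `efetchUrl` in batches; the XML it returns still has to be parsed into the JSON rows described under Output Format.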
PubMed Query Syntax
The actor supports PubMed's full advanced query language. Use field tags in brackets to target specific fields:
| Tag | Field | Example |
|---|---|---|
| [ti] | Article title | CRISPR[ti] |
| [au] | Author name | Smith J[au] |
| [ta] | Journal abbreviation | Nature[ta] |
| [mh] | MeSH heading | Neoplasms[mh] |
| [dp] | Publication date | 2023:2024[dp] |
| [pt] | Publication type | Review[pt] |
Combine clauses with the Boolean operators AND, OR, and NOT (entered in upper case), for example: GLP-1 receptor agonist[ti] AND diabetes[mh] AND 2022:2024[dp]
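Because tagged terms and Boolean operators compose mechanically, queries are easy to assemble in code before passing them to the query input. A minimal sketch with two hypothetical helpers (`tagged` and `allOf` are illustrative names, not part of the actor) that reproduces the example query above:

```javascript
// Hypothetical helpers for composing PubMed queries from field-tagged terms.

// Attach a field tag to a term, e.g. tagged("Neoplasms", "mh") -> "Neoplasms[mh]".
function tagged(term, tag) {
  return `${term}[${tag}]`;
}

// Join clauses with an upper-case AND, as PubMed requires.
function allOf(...clauses) {
  return clauses.join(" AND ");
}

const query = allOf(
  tagged("GLP-1 receptor agonist", "ti"),
  tagged("diabetes", "mh"),
  tagged("2022:2024", "dp"),
);
console.log(query);
// GLP-1 receptor agonist[ti] AND diabetes[mh] AND 2022:2024[dp]
```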
Inputs
| Field | Type | Description | Default |
|---|---|---|---|
| query | string | PubMed search query, with optional field tags | GLP-1 receptor agonist diabetes |
| dateFrom | string | Earliest publication date (YYYY/MM/DD) | 2020/01/01 |
| dateTo | string | Latest publication date (YYYY/MM/DD) | (none) |
| ncbiApiKey | string | Optional free NCBI API key; raises the rate limit from 3 to 10 requests/sec | (none) |
| maxResults | integer | Maximum number of articles to return (1–10,000) | 100 |
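Before starting a run, it can help to check an input object against the constraints in the table. The sketch below is a hypothetical client-side helper (`validateInput` is not part of the actor); it assumes that omitted fields fall back to the actor's defaults and only checks the rules stated above: the YYYY/MM/DD date format and the 1–10,000 range for maxResults.

```javascript
// Hypothetical pre-flight check for an actor input object (not part of the actor).
// Omitted fields are assumed to fall back to the defaults in the Inputs table.
const DATE_RE = /^\d{4}\/\d{2}\/\d{2}$/;

function validateInput(input) {
  const errors = [];
  if (input.query !== undefined && (typeof input.query !== "string" || input.query.trim() === "")) {
    errors.push("query must be a non-empty string");
  }
  for (const key of ["dateFrom", "dateTo"]) {
    if (input[key] !== undefined && !DATE_RE.test(input[key])) {
      errors.push(`${key} must use YYYY/MM/DD`);
    }
  }
  const n = input.maxResults ?? 100;
  if (!Number.isInteger(n) || n < 1 || n > 10000) {
    errors.push("maxResults must be an integer between 1 and 10,000");
  }
  // Zero-padded YYYY/MM/DD strings sort chronologically, so string comparison works.
  if (input.dateFrom && input.dateTo && input.dateFrom > input.dateTo) {
    errors.push("dateFrom must not be after dateTo");
  }
  return errors;
}
```

An empty array means the input passes these checks; for example, `validateInput({ dateFrom: "2020-01-01" })` reports the dash-separated date.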
Output Format
Each article is stored as one item in the Apify dataset:
{
  "pmid": "38234567",
  "title": "Efficacy of GLP-1 receptor agonists in type 2 diabetes: a systematic review",
  "abstract": "Background: GLP-1 receptor agonists have emerged as...",
  "authors": ["Smith Jane A", "Chen Robert B", "Patel Anita K"],
  "journal": "The Lancet Diabetes & Endocrinology",
  "issn": "2213-8587",
  "year": "2024",
  "doi": "10.1016/S2213-8587(24)00123-4",
  "mesh_terms": ["Glucagon-Like Peptide-1 Receptor", "Diabetes Mellitus, Type 2", "Hypoglycemic Agents"],
  "url": "https://pubmed.ncbi.nlm.nih.gov/38234567/",
  "scraped_at": "2024-11-15T09:22:11.000Z"
}
A summary record is appended at the end with total article count and run timestamp.
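Since the summary record sits in the same dataset as the articles, downstream code usually needs to separate the two. The summary record's exact fields are not documented here, so the sketch below assumes it has no string `pmid` field and uses that to tell rows apart; `splitDataset` is a hypothetical name, and the predicate should be adjusted if your runs show a different shape.

```javascript
// Separate article rows from the summary record the actor appends last.
// ASSUMPTION: the summary record has no string `pmid` field, so
// "has a pmid" identifies article rows. Adjust if your data differs.
function splitDataset(items) {
  const articles = [];
  const others = [];
  for (const item of items) {
    (typeof item.pmid === "string" ? articles : others).push(item);
  }
  // The summary is expected to be the last non-article row, if any.
  const summary = others.length > 0 ? others[others.length - 1] : null;
  return { articles, summary };
}
```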
MeSH Terms
Medical Subject Headings (MeSH) are NLM's controlled vocabulary for indexing biomedical literature. Articles indexed for MEDLINE are tagged with MeSH descriptors by NLM; citations not (yet) indexed for MEDLINE, such as in-process or publisher-supplied records, may carry few or no MeSH terms. The mesh_terms array in each output record contains these structured tags, which are ideal for:
- Semantic clustering of articles by disease area
- Building ontology-aligned training data for biomedical NLP models
- Identifying related concepts the author did not use in the title or abstract
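A simple first step toward clustering by disease area is counting how often each MeSH descriptor occurs across a result set. The sketch below uses only the documented `mesh_terms` field; `meshFrequencies` is an illustrative name, and the decision to count each term at most once per article is a design choice, not something the actor does for you.

```javascript
// Count how many articles carry each MeSH descriptor (illustrative helper).
function meshFrequencies(articles) {
  const counts = new Map();
  for (const article of articles) {
    // Deduplicate within one article so each paper counts once per term.
    for (const term of new Set(article.mesh_terms ?? [])) {
      counts.set(term, (counts.get(term) ?? 0) + 1);
    }
  }
  // Most frequent first; ties broken alphabetically for stable output.
  return [...counts.entries()].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```

The resulting `[term, count]` pairs can feed a co-occurrence matrix or a quick bar chart of the dominant topics in a query.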
Pricing
First 25 results are free on every Apify account — no charge until you exceed the free tier.
After the free tier: $4 per 1,000 articles (Pay-Per-Event billing). A 1,000-article run costs $4.00. A 10,000-article run costs $40.00. You are charged only for articles actually delivered.
Frequently Asked Questions
Q: Do I need an NCBI API key?
No. The E-utilities API is freely accessible without authentication. However, without a key you are limited to 3 requests per second. A free NCBI API key (available at ncbi.nlm.nih.gov/account/settings/) increases this to 10 requests per second, making large runs significantly faster. Enter your key in the ncbiApiKey input field.
Q: How current is the data?
PubMed is updated daily with new articles, corrections, and MeSH annotations, and because the actor queries the E-utilities API at run time, each run reflects PubMed as of that moment. Articles typically appear in PubMed within days to weeks of publication, depending on the journal's submission practices.
Q: Can I retrieve full text?
The actor retrieves titles, abstracts, and metadata only. Full text is not available via the E-utilities API for most articles, because full-text access depends on publisher agreements. For open-access articles, full text may be available from PubMed Central (PMC), or from the publisher by resolving the DOI in the output.
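One way to find out whether a PMID has a PubMed Central copy is the E-utilities ELink endpoint with dbfrom=pubmed and db=pmc. The sketch below only builds that URL and parses the reply; the helper names are hypothetical, and the response shape it reads (linksets, linksetdbs, a "pubmed_pmc" link name, a links array) is our understanding of ELink's JSON output, so check it against a live response before relying on it.

```javascript
// Sketch: ask ELink which PubMed records have a linked PubMed Central copy.
// ASSUMPTION: the JSON shape parsed below matches ELink's retmode=json output.
const ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi";

function pmcLinkUrl(pmid, apiKey) {
  const params = new URLSearchParams({ dbfrom: "pubmed", db: "pmc", id: pmid, retmode: "json" });
  if (apiKey) params.set("api_key", apiKey);
  return `${ELINK}?${params}`;
}

// Extract PMC IDs from an ELink JSON body; empty if no PMC copy is linked.
function pmcIdsFromElink(body) {
  const dbs = body?.linksets?.[0]?.linksetdbs ?? [];
  const pmc = dbs.find((db) => db.linkname === "pubmed_pmc");
  return pmc ? pmc.links.map(String) : [];
}
```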
Q: What is the maximum number of articles I can retrieve?
The actor supports up to 10,000 articles per run. For larger literature sets, use date range filters (dateFrom/dateTo) or narrower query terms to partition your retrieval across multiple runs.
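Partitioning by date is easy to automate: generate one dateFrom/dateTo pair per window and start one run per pair. The sketch below uses calendar years as the window, which is an assumption; each run is still capped at 10,000 articles, so a very active topic may need monthly windows instead. `yearlyRanges` is a hypothetical helper name.

```javascript
// Split a multi-year span into one dateFrom/dateTo pair per calendar year,
// in the YYYY/MM/DD format the actor's inputs expect (hypothetical helper).
function yearlyRanges(startYear, endYear) {
  const ranges = [];
  for (let year = startYear; year <= endYear; year++) {
    ranges.push({ dateFrom: `${year}/01/01`, dateTo: `${year}/12/31` });
  }
  return ranges;
}
```

Each object can be spread into a run input, e.g. `{ query, maxResults: 10000, ...range }`.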
Q: Can I search by specific author or institution?
Yes. Use the [au] tag for author names (e.g. Smith JA[au]) and the [ad] affiliation tag for institutions (e.g. Harvard[ad]). Multiple authors or institutions can be combined with OR: (Smith JA[au] OR Chen RB[au]).
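When the list of authors or institutions comes from a spreadsheet or another dataset, the parenthesized OR group can be generated rather than typed. A minimal sketch with a hypothetical `anyOf` helper:

```javascript
// Hypothetical helper: OR together several values under one field tag,
// wrapped in parentheses so the group can be ANDed with other clauses.
function anyOf(values, tag) {
  return `(${values.map((v) => `${v}[${tag}]`).join(" OR ")})`;
}

console.log(anyOf(["Smith JA", "Chen RB"], "au"));
// (Smith JA[au] OR Chen RB[au])
```

The group combines with other clauses as usual, e.g. `${anyOf(["Smith JA", "Chen RB"], "au")} AND Harvard[ad]`.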
Q: How does the actor handle rate limits?
The actor automatically respects NCBI's rate limits by inserting delays between requests: 340 ms without an API key (staying safely under 3 req/sec) and 110 ms with an API key (about 9 req/sec, under the 10 req/sec limit). Automatic retry with exponential backoff handles transient 429 and 5xx errors.
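If you call E-utilities yourself (for example, to follow up on PMIDs from a run), the same pattern is straightforward to reproduce. The sketch below enforces the 340 ms / 110 ms gaps quoted above and retries 429 and 5xx responses with exponential backoff. It is an illustration, not the actor's code: `makePoliteFetcher`, `maxRetries`, and `baseBackoffMs` are hypothetical names and defaults, and it assumes requests are issued one at a time (concurrent calls would share the same timer without coordination).

```javascript
// Sketch of polite E-utilities access: minimum gap between requests plus
// exponential backoff on HTTP 429 and 5xx. Gap values come from the FAQ;
// maxRetries and baseBackoffMs are hypothetical defaults.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function makePoliteFetcher({ hasApiKey = false, fetchImpl = fetch, maxRetries = 4, baseBackoffMs = 500 } = {}) {
  const minGapMs = hasApiKey ? 110 : 340;
  let lastRequestAt = 0;

  // Assumes sequential use: one politeFetch call in flight at a time.
  return async function politeFetch(url) {
    for (let attempt = 0; ; attempt++) {
      // Wait until minGapMs has elapsed since the previous request.
      const waitMs = lastRequestAt + minGapMs - Date.now();
      if (waitMs > 0) await sleep(waitMs);
      lastRequestAt = Date.now();

      const res = await fetchImpl(url);
      const retryable = res.status === 429 || res.status >= 500;
      // Return successes, non-retryable errors, and the final failed attempt.
      if (!retryable || attempt >= maxRetries) return res;

      // Back off exponentially: base, 2x base, 4x base, ...
      await sleep(baseBackoffMs * 2 ** attempt);
    }
  };
}
```

Usage: `const politeFetch = makePoliteFetcher({ hasApiKey: true });` then `await politeFetch(url)` wherever you would call `fetch(url)`; the caller still checks `res.ok` on the returned response.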
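Q: Are preprints included in PubMed results?
PubMed mainly indexes journal literature: MEDLINE-indexed journals plus other journal citations and PubMed Central content. Most preprints on bioRxiv/medRxiv are not indexed in PubMed; the main exception is preprints reporting NIH-funded research, which NLM adds through the NIH Preprint Pilot. Use the arXiv or bioRxiv scrapers for preprint coverage.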
Use in Claude, ChatGPT & any MCP agent
This actor is also a Model Context Protocol (MCP) server tool — call it directly from Claude, ChatGPT, Cursor, Windsurf, or any MCP-compatible AI agent. The agent only pays for results delivered (same pay-per-result model).
- Per-actor MCP endpoint: https://mcp.apify.com/?tools=themineworks/pubmed-ncbi-scraper
- Full Mine Works MCP server (all tools): https://the-mine-works-mcp.hatchable.site/api/mcp
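// Call this actor via apify-client (Node.js). Top-level await requires an
// ES module (a .mjs file or "type": "module" in package.json).
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_APIFY_TOKEN' });

// Input fields are described in the Inputs table; these values are examples.
const run = await client.actor('themineworks/pubmed-ncbi-scraper').call({
    query: 'GLP-1 receptor agonist diabetes',
    dateFrom: '2020/01/01',
    maxResults: 100,
});

// Read the articles (and the trailing summary record) from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items);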