PubMed Scraper
Pricing
from $0.80 / 1,000 results
Search 35M+ medical citations from PubMed/MEDLINE. Extract articles, abstracts, authors, MeSH terms, and citations for research, competitive intelligence, or AI/RAG pipelines. No API key required.
Developer: mick_
Actor stats: 0 bookmarks, 5 total users, 4 monthly active users. Last modified 9 days ago.
Search 35 million biomedical articles from PubMed/MEDLINE. Batch queries, MeSH filtering, full abstracts, author affiliations, and structured metadata — ready for AI pipelines, systematic reviews, and drug research.
Extract structured article data from PubMed via the NCBI E-utilities API. No API key required. Clean JSON output with consistent fields — ready for downstream analysis, RAG pipelines, or LLM tool use.
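For readers curious what an E-utilities search looks like, here is a minimal sketch of composing an ESearch URL from inputs like the ones this actor accepts. The mapping from actor inputs to E-utilities parameters is an assumption for illustration, not the actor's documented behavior, and `build_esearch_url` is a hypothetical helper. The `1800` and `3000` bounds are placeholders, used because ESearch expects `mindate` and `maxdate` together.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query="", mesh_term="", publication_type="",
                      date_from="", date_to="", sort="relevance", retmax=100):
    """Compose an ESearch URL from actor-style inputs (assumed mapping)."""
    clauses = []
    if query:
        clauses.append(f"({query})")
    if mesh_term:
        # PubMed field tag for MeSH headings
        clauses.append(f'"{mesh_term}"[MeSH Terms]')
    if publication_type:
        # PubMed field tag for publication type
        clauses.append(f'"{publication_type}"[Publication Type]')

    params = {
        "db": "pubmed",
        "term": " AND ".join(clauses),
        "retmode": "json",
        "retmax": retmax,
        "sort": sort,
    }
    if date_from or date_to:
        # Placeholder bounds fill in whichever side is missing.
        params.update({
            "datetype": "pdat",
            "mindate": date_from or "1800",
            "maxdate": date_to or "3000",
        })
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url(mesh_term="PCSK9", publication_type="Review",
                            date_from="2022", sort="pub_date", retmax=200))
```

The returned URL can be opened in a browser to inspect the raw PMID list that ESearch returns.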
🚀 What's New (v1.1.0)
Batch Query Search
Run multiple search queries in a single job. Pass `queriesList: ["cancer immunotherapy", "CAR-T therapy", "checkpoint inhibitors"]` and get merged, deduplicated results across all queries. One run replaces a separate search per query, which suits systematic reviews and broad topic mapping.
Emoji Input Labels
All input fields now have emoji labels for faster navigation in the Apify console.
🎯 Use Cases
🤖 AI Agents & LLM Pipelines
PubMed is the gold standard for biomedical knowledge. Wire this actor into your AI pipeline as a callable tool:
- RAG knowledge bases: Index 10,000+ abstracts for semantic search over medical literature
- MCP tool use: AI agents call this actor in real time to look up evidence, find citations, or answer clinical questions
- Literature-grounded generation: Pull structured abstracts as context before generating summaries, reports, or clinical content
- Evidence retrieval: Agent receives a medical question → queries PubMed → returns cited sources
One run can return thousands of structured, citable records ready for embedding. The consistent schema (`pmid`, `abstract`, `mesh_terms`, `doi_url`) keeps downstream cleaning to a minimum.
Example — AI agent topic scan:
{"mode": "search_articles","queriesList": ["GLP-1 receptor agonist", "semaglutide weight loss", "tirzepatide clinical trial"],"publicationType": "Clinical Trial","dateFrom": "2022","maxResults": 500}
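As a rough illustration of the RAG use case, the sketch below turns one output record into a text chunk plus metadata, ready to pass to an embedding model. The function name `record_to_document` and the `id`/`text`/`metadata` layout are assumptions chosen for the example; only the record field names come from this README.

```python
def record_to_document(record):
    """Turn one output record into embeddable text plus citation metadata."""
    title = record.get("title") or ""
    abstract = record.get("abstract") or ""
    mesh = ", ".join(record.get("mesh_terms") or [])

    text = f"{title}\n\n{abstract}".strip()
    if mesh:
        # Appending MeSH headings gives the embedding controlled-vocabulary hints.
        text += f"\n\nMeSH: {mesh}"

    return {
        "id": f"pmid:{record.get('pmid')}",
        "text": text,
        "metadata": {
            "pmid": record.get("pmid"),
            "doi_url": record.get("doi_url"),
        },
    }


if __name__ == "__main__":
    sample = {
        "pmid": "12345",  # made-up record for illustration
        "title": "An example title",
        "abstract": "An example abstract.",
        "mesh_terms": ["Term A", "Term B"],
        "doi_url": "https://doi.org/10.0000/example",
    }
    print(record_to_document(sample)["text"])
```

Keeping `pmid` and `doi_url` in the metadata lets a generated answer cite its source.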
💊 Pharma & Biotech Research
Track the scientific landscape around drug targets, mechanisms, and therapeutic areas — without manual PubMed sessions:
- Competitive intelligence: Monitor competitor publications by author, institution, or research area
- Target validation: Pull all publications on a gene or protein target to assess research depth
- Clinical trial landscape: Filter by Clinical Trial or RCT publication type to see what's been run
- Drug repurposing: Cross-reference MeSH terms across therapeutic areas to find unexplored connections
- Pipeline monitoring: Track publications on specific drugs or mechanisms as they emerge
Example — drug target landscape:
{"mode": "search_articles","meshTerm": "PCSK9","publicationType": "Review","sort": "pub_date","maxResults": 200}
📑 Systematic Reviews & Meta-Analyses
Systematic reviews require comprehensive literature searches across multiple query formulations. This is where batch search pays off:
- Multi-query search: Cover all synonyms and related terms in one job — no manual re-running
- Deduplication: Results automatically deduplicated by PMID — no overlap in your dataset
- Date-bounded searches: Filter to a specific review window with `dateFrom`/`dateTo`
- Study design filtering: Filter to RCTs, meta-analyses, or systematic reviews only
- Export ready: Download as CSV or Excel directly from the Apify dataset UI
Example — systematic review search:
{"mode": "search_articles","queriesList": ["cognitive behavioral therapy depression","CBT major depressive disorder","psychotherapy depressive symptoms randomized"],"publicationType": "Randomized Controlled Trial","dateFrom": "2015","dateTo": "2024","maxResults": 1000}
📖 Usage Examples
Example 1: Batch Query Search (Systematic Review)
{"mode": "search_articles","queriesList": ["cancer immunotherapy", "CAR-T cell therapy", "PD-1 inhibitor"],"publicationType": "Clinical Trial","maxResults": 500}
Example 2: Single Keyword Search
{"mode": "search_articles","query": "CRISPR gene editing","publicationType": "Review","maxResults": 100}
Example 3: MeSH Term Search
{"mode": "search_articles","meshTerm": "Alzheimer Disease","publicationType": "Meta-Analysis","dateFrom": "2020","maxResults": 200}
Example 4: Author Search
{"mode": "search_by_author","author": "Zhang F","query": "CRISPR","maxResults": 50}
Example 5: Journal Search
{"mode": "search_by_journal","journal": "Nature Medicine","query": "immunotherapy","sort": "pub_date","maxResults": 100}
Example 6: Single Article Lookup
{"mode": "get_article","pmid": "33243215"}
Example 7: Date-Bounded Search
{"mode": "search_articles","query": "COVID-19 long COVID","dateFrom": "2023/01/01","dateTo": "2024/12/31","sort": "pub_date","maxResults": 300}
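Any of the inputs above can also be submitted programmatically. The sketch below uses only the Python standard library and the Apify REST endpoint that runs an actor synchronously and returns its dataset items. The actor ID `labrat011~pubmed-scraper` is an assumption inferred from the MCP endpoint shown in this README (the API path uses `~` in place of `/`), and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumed actor ID, inferred from the MCP endpoint; verify it in the Apify console.
ACTOR_ID = "labrat011~pubmed-scraper"


def build_run_request(run_input, token):
    """Build (but do not send) a POST request that runs the actor synchronously."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(run_input, token, timeout=300):
    """Send the request and return the dataset items as a list of dicts."""
    request = build_run_request(run_input, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    import os

    items = run_actor({"mode": "get_article", "pmid": "33243215"},
                      os.environ["APIFY_TOKEN"])
    print(len(items), "record(s) returned")
```

Synchronous runs suit small jobs. For large `maxResults` values, starting the run asynchronously and reading the dataset afterwards is likely the safer pattern.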
👥 Who Uses This
🔬 Systematic Reviewers & Meta-Analysts
Running a systematic review means covering all synonyms and related terms — one query isn't enough. Batch search handles this in a single run.
- Cover all query variants (`queriesList`) and get a deduplicated PMID set across all of them
- Bound by date range to match your review window
- Filter to the exact study design you need (RCT, Meta-Analysis, Systematic Review)
- Download as CSV/Excel directly from the Apify dataset UI — no cleaning needed
{"mode": "search_articles","queriesList": ["cognitive behavioral therapy depression","CBT major depressive disorder","psychotherapy depressive symptoms"],"publicationType": "Randomized Controlled Trial","dateFrom": "2015","dateTo": "2024","maxResults": 1000}
💊 Pharma & Biotech Teams
Track the publication landscape around a drug target, mechanism, or therapeutic area without manual PubMed sessions.
- Map all publications on a gene or protein target to assess research depth before investing in it
- Monitor competitor publications by author, institution, or research area over time
- Filter to Review or Clinical Trial types to see what stage the evidence is at
- Track emerging publications with `sort: pub_date` + date range
{"mode": "search_articles","meshTerm": "PCSK9","publicationType": "Clinical Trial","sort": "pub_date","dateFrom": "2022","maxResults": 500}
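For recurring monitoring, the date range can be computed rather than hard-coded. Here is a minimal sketch that builds an input covering the last N days, using the `YYYY/MM/DD` date format from the input parameters. The helper name `recent_window_input` and its defaults are illustrative assumptions.

```python
from datetime import date, timedelta


def recent_window_input(mesh_term, days=90, today=None, max_results=500):
    """Build an actor input covering the last `days` days, newest first."""
    today = today or date.today()
    start = today - timedelta(days=days)
    return {
        "mode": "search_articles",
        "meshTerm": mesh_term,
        "sort": "pub_date",
        "dateFrom": start.strftime("%Y/%m/%d"),
        "dateTo": today.strftime("%Y/%m/%d"),
        "maxResults": max_results,
    }


if __name__ == "__main__":
    print(recent_window_input("PCSK9", days=30))
```

Passing `today` explicitly makes the window reproducible, which helps when a scheduled run needs to be re-executed for a past date.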
🧪 Clinical Researchers
Find evidence for a specific intervention, track a researcher's publication record, or monitor a journal's output.
- Pull all papers by a specific author to build their publication profile
- Search a journal combined with a keyword to scope relevant output
- Use MeSH terms for precise clinical concept matching (avoids terminology variation)
- Combine author + query to track a researcher's work on a specific topic
{"mode": "search_by_author","author": "Bhatt DL","query": "cardiovascular outcomes trial","maxResults": 100}
🤖 AI/LLM Engineers
PubMed is the gold-standard source for biomedical knowledge. Wire this into your pipeline as a callable data source.
- Index thousands of abstracts into a vector store for semantic retrieval over medical literature
- Use via MCP so AI agents can look up live evidence during a conversation
- Pull literature as structured context before generating summaries, clinical notes, or reports
- The consistent output schema (`pmid`, `abstract`, `mesh_terms`, `doi_url`) means no field normalization needed downstream
{"mode": "search_articles","queriesList": ["GLP-1 receptor agonist", "semaglutide weight loss", "tirzepatide efficacy"],"publicationType": "Clinical Trial","dateFrom": "2020","maxResults": 500}
📝 Medical Writers & Journalists
Find citations fast, scope coverage on a topic, or pull an author's full bibliography without spending hours on PubMed.
- Single article lookup by PMID to get structured metadata for citation formatting
- Author search to build a bibliography or verify publication history
- Topic search with date range to understand how coverage of a story has evolved
- `pmc_url` field points directly to free full text when available
{"mode": "get_article","pmid": "36352213"}
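To show how the structured metadata supports citation formatting, here is a small sketch that renders a record as a short citation string. The citation layout is a simplified, made-up style rather than any official format, and `format_citation` is a hypothetical helper. It reads only fields listed in the output schema.

```python
def format_citation(record, max_authors=3):
    """Render a record as 'Authors. Title. Journal. Year. PMID: n.'"""
    authors = record.get("authors") or []
    names = [
        f"{a.get('last_name', '')} {a.get('initials', '')}".strip()
        for a in authors
    ]
    shown = names[:max_authors]
    if len(names) > max_authors:
        shown.append("et al")

    parts = [
        ", ".join(shown),
        (record.get("title") or "").rstrip("."),  # avoid a doubled period
        record.get("journal") or "",
        record.get("pub_year") or "",
    ]
    citation = ". ".join(p for p in parts if p) + "."
    if record.get("pmid"):
        citation += f" PMID: {record['pmid']}."
    return citation


if __name__ == "__main__":
    sample = {  # made-up record for illustration
        "authors": [{"last_name": "Smith", "initials": "J"}],
        "title": "An example title.",
        "journal": "Example Journal",
        "pub_year": "2020",
        "pmid": "12345",
    }
    print(format_citation(sample))
```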
🔗 Related Actors
Other actors from the same portfolio that pair well with PubMed Scraper:
| Actor | What It Does |
|---|---|
| Clinical Trial Site Contact Finder | Extracts principal investigator contacts from ClinicalTrials.gov — useful alongside PubMed author searches |
| NPI Provider Contact Finder | Finds doctor emails, practice websites, and LinkedIn profiles from the NPPES NPI Registry |
| FDA Drug Labels Scraper | Pulls structured drug label data from FDA DailyMed — pair with PubMed for full drug research coverage |
| Academic Paper Scraper | Research papers from ArXiv, Semantic Scholar, and CrossRef — broader coverage beyond biomedical |
🔍 Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `mode` | string | search_articles | search_articles, get_article, search_by_author, search_by_journal |
| `query` | string | "" | Single keyword search. Ignored when queriesList is provided |
| `queriesList` | array | [] | NEW: Batch search multiple queries; results are merged and deduplicated |
| `pmid` | string | "" | PubMed ID for get_article mode |
| `author` | string | "" | Author name (e.g. "Smith J") |
| `journal` | string | "" | Journal name (e.g. "Nature", "Cell") |
| `meshTerm` | string | "" | MeSH subject heading (e.g. "Neoplasms") |
| `publicationType` | string | "" | Filter by study design: Clinical Trial, Review, Meta-Analysis, Randomized Controlled Trial, Systematic Review |
| `dateFrom` | string | "" | Publication start date: YYYY, YYYY/MM, or YYYY/MM/DD |
| `dateTo` | string | "" | Publication end date: YYYY, YYYY/MM, or YYYY/MM/DD |
| `sort` | string | relevance | relevance, pub_date, Author, JournalName |
| `maxResults` | integer | 100 | Max articles to return (1–10,000). Free tier: 10 per run |
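Catching input mistakes before a paid run can save a wasted job. The sketch below checks an input object against the constraints in the table above (allowed modes and sort values, the `maxResults` range, and the three accepted date formats). It is a client-side convenience written for this example, not the actor's own validation logic, and the `get_article` check is an assumption based on the parameter description.

```python
import re

MODES = {"search_articles", "get_article", "search_by_author", "search_by_journal"}
SORTS = {"relevance", "pub_date", "Author", "JournalName"}
# Accepts YYYY, YYYY/MM, or YYYY/MM/DD
DATE_RE = re.compile(r"^\d{4}(/\d{2}(/\d{2})?)?$")


def validate_input(run_input):
    """Return a list of problems found in an actor input (empty if none)."""
    errors = []

    mode = run_input.get("mode", "search_articles")
    if mode not in MODES:
        errors.append(f"unknown mode: {mode!r}")
    if mode == "get_article" and not run_input.get("pmid"):
        errors.append("get_article requires pmid")

    sort = run_input.get("sort", "relevance")
    if sort not in SORTS:
        errors.append(f"unknown sort: {sort!r}")

    max_results = run_input.get("maxResults", 100)
    if not isinstance(max_results, int) or not 1 <= max_results <= 10_000:
        errors.append("maxResults must be an integer from 1 to 10000")

    for key in ("dateFrom", "dateTo"):
        value = run_input.get(key, "")
        if value and not DATE_RE.match(value):
            errors.append(f"{key} must be YYYY, YYYY/MM, or YYYY/MM/DD")

    return errors


if __name__ == "__main__":
    print(validate_input({"mode": "search_articles", "dateFrom": "2023-01-01"}))
```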
📊 Output Format
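Each article is pushed as a structured JSON record. Download as JSON, CSV, or Excel from the Apify dataset UI. The sample record below is illustrative and omits some of the fields listed under Output Fields (for example `journal_abbreviation`, `volume`, `issue`, and `pages`).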
{"schema_version": "1.0","type": "article","pmid": "33243215","doi": "10.1038/s41586-020-2649-2","pmc_id": "PMC7901054","title": "CRISPR-Cas9 structures and mechanisms","abstract": "Genome editing with CRISPR-Cas9 has transformed biological research...","authors": [{"last_name": "Doudna","fore_name": "Jennifer A","initials": "JA","affiliation": "University of California, Berkeley"}],"journal": "Annual review of biochemistry","publication_date": "2020 Jun","pub_year": "2020","mesh_terms": ["CRISPR-Cas Systems", "Gene Editing", "RNA, Guide, CRISPR-Cas Systems"],"publication_types": ["Journal Article", "Review"],"keywords": ["CRISPR", "Cas9", "genome editing"],"grant_list": ["R01 GM081879 (NIH)"],"reference_count": 142,"pubmed_url": "https://pubmed.ncbi.nlm.nih.gov/33243215/","doi_url": "https://doi.org/10.1038/s41586-020-2649-2","pmc_url": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7901054/"}
Output Fields
| Field | Description |
|---|---|
| `pmid` | PubMed ID |
| `doi` / `pmc_id` | DOI and PubMed Central ID |
| `title` | Article title |
| `abstract` | Full abstract text |
| `authors` | List with last name, first name, initials, affiliation |
| `journal` / `journal_abbreviation` | Full and abbreviated journal name |
| `volume` / `issue` / `pages` | Publication details |
| `publication_date` / `pub_year` | Date and year |
| `keywords` | Author-provided keywords |
| `mesh_terms` | MeSH subject headings |
| `publication_types` | Article types (Review, Clinical Trial, etc.) |
| `grant_list` | Funding sources |
| `reference_count` | Number of references |
| `pubmed_url` / `doi_url` / `pmc_url` | Direct links to PubMed, DOI, and free full text |
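When records are fetched as JSON and a spreadsheet is needed later, the nested `authors` and `mesh_terms` fields have to be flattened first. Here is a minimal sketch using the standard `csv` module. The column selection, the `; ` separator, and the helper names are choices made for this example, not the layout of the Apify CSV export.

```python
import csv
import io

COLUMNS = ["pmid", "title", "journal", "pub_year", "authors", "mesh_terms", "doi_url"]


def flatten(record):
    """Collapse nested list fields into '; '-separated strings."""
    row = {column: record.get(column, "") for column in COLUMNS}
    row["authors"] = "; ".join(
        f"{a.get('last_name', '')} {a.get('initials', '')}".strip()
        for a in record.get("authors") or []
    )
    row["mesh_terms"] = "; ".join(record.get("mesh_terms") or [])
    return row


def to_csv(records):
    """Serialize records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(flatten(r) for r in records)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = {  # made-up record for illustration
        "pmid": "12345",
        "title": "An example title",
        "journal": "Example Journal",
        "pub_year": "2020",
        "authors": [{"last_name": "Smith", "initials": "J"}],
        "mesh_terms": ["Term A", "Term B"],
        "doi_url": "https://doi.org/10.0000/example",
    }
    print(to_csv([sample]))
```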
🤖 MCP Integration
Wire this actor as a tool in your AI agent pipeline via the Apify MCP server. No custom server needed.
- Endpoint: `https://mcp.apify.com?tools=labrat011/pubmed-scraper`
- Auth: `Authorization: Bearer <APIFY_TOKEN>`
- Works with: Claude Desktop, Cursor, VS Code, Windsurf, Warp, Gemini CLI
MCP config:
{"mcpServers": {"pubmed-scraper": {"url": "https://mcp.apify.com?tools=labrat011/pubmed-scraper","headers": {"Authorization": "Bearer <APIFY_TOKEN>"}}}}
Ask your AI: "Find all clinical trials on semaglutide published in the last 2 years and summarize the findings"
💰 Pricing
Pay-per-result pricing — you only pay for what you scrape.
| Tier | Limit | Use case |
|---|---|---|
| Free | 10 results/run | Testing and integration verification |
| Paid | Up to 10,000 results/run | Systematic reviews, RAG pipelines, bulk research |
❓ FAQ
Can I get full text?
This actor returns abstracts and metadata. Check the pmc_url field — if present, the full text is freely available on PubMed Central. Otherwise use doi_url to reach the publisher.
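The fallback order described here can be sketched as a small helper: prefer `pmc_url`, then `doi_url`, then the PubMed page. The name `best_full_text_link` and the returned dictionary shape are illustrative assumptions.

```python
def best_full_text_link(record):
    """Pick the most useful link for reading the article."""
    if record.get("pmc_url"):
        # A PMC link means free full text is available.
        return {"url": record["pmc_url"], "free_full_text": True}
    if record.get("doi_url"):
        # The publisher page may be paywalled.
        return {"url": record["doi_url"], "free_full_text": False}
    return {"url": record.get("pubmed_url"), "free_full_text": False}


if __name__ == "__main__":
    print(best_full_text_link({"doi_url": "https://doi.org/10.0000/example"}))
```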
What are MeSH terms?
Medical Subject Headings (MeSH) are controlled vocabulary terms assigned by NLM indexers. They help find articles on specific topics even when authors use different terminology. Use the MeSH Browser to find the right term.
How current is the data?
PubMed is updated daily. This actor queries NCBI directly, so results reflect the current state of the database.
Is there a rate limit?
NCBI allows 3 requests/second without an API key. Rate limiting is built in — you don't need to configure anything.
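If you call E-utilities directly elsewhere in a pipeline, the same 3 requests/second ceiling applies. The sketch below shows one common way to stay under it: a limiter that enforces a minimum interval between calls. It illustrates the general technique and is not a description of how this actor implements its throttling. The class name and the injectable `clock`/`sleep` arguments are assumptions made for testability.

```python
import time


class MinIntervalLimiter:
    """Space calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = float("-inf")  # first call never waits

    def wait(self):
        """Block until the next call is allowed; return seconds waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot relative to when this call proceeds.
        self._next_allowed = now + delay + self.min_interval
        return delay


if __name__ == "__main__":
    limiter = MinIntervalLimiter(max_per_second=3)
    for i in range(5):
        waited = limiter.wait()
        print(f"request {i} after waiting {waited:.3f}s")
```

Call `limiter.wait()` immediately before each HTTP request so that bursts are spread out evenly.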
What's the difference between query and queriesList?
query runs a single search. queriesList runs multiple searches in one job and merges the results, deduplicating by PMID. Use queriesList for systematic reviews where you need to cover multiple synonyms or related terms.
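To make the merge step concrete, here is a minimal sketch of deduplicating several result sets by PMID, keeping the first record seen for each ID. It shows the general idea only; the actor's actual merge order and tie-breaking are not documented here, and `merge_by_pmid` is a hypothetical name.

```python
def merge_by_pmid(result_sets):
    """Merge lists of records, keeping the first occurrence of each PMID."""
    merged = {}
    for records in result_sets:
        for record in records:
            pmid = record.get("pmid")
            if pmid and pmid not in merged:
                merged[pmid] = record
    # Dicts preserve insertion order, so first-seen order is kept.
    return list(merged.values())


if __name__ == "__main__":
    query_a = [{"pmid": "1"}, {"pmid": "2"}]
    query_b = [{"pmid": "2"}, {"pmid": "3"}]
    print([r["pmid"] for r in merge_by_pmid([query_a, query_b])])
```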
📚 Resources
- PubMed — Web interface
- MeSH Browser — Search MeSH terms
- NCBI E-utilities — API documentation
- PubMed Help — Search syntax guide
Built for researchers, AI engineers, and data scientists who need biomedical literature at scale.