PubMed Scraper
Pricing
from $0.80 / 1,000 results
Search 35M+ medical citations from PubMed/MEDLINE. Extract articles, abstracts, authors, MeSH terms, and citations for research, competitive intelligence, or AI/RAG pipelines. No API key required.
Developer: Mick
Search PubMed, one of the world's largest biomedical literature databases, with over 35 million citations. Extract structured article data for research, competitive intelligence, or AI/RAG applications.
What does it do?
PubMed Scraper queries the NCBI E-utilities API, using ESearch to find matching PubMed IDs (PMIDs) and EFetch to retrieve the full records, and returns structured data about scientific articles from PubMed and MEDLINE. It extracts article metadata, abstracts, author information, MeSH terms, keywords, and publication details into clean, normalized JSON that is ready for analysis, literature reviews, or consumption by AI agents via MCP.
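The two-step E-utilities flow can be sketched as follows, assuming plain HTTP GET requests: ESearch turns a query into a list of PMIDs, and EFetch returns the matching PubMed XML. The helper names `esearch_url` and `efetch_url` are hypothetical and not part of the actor; only the endpoints and parameters (`db`, `term`, `retmax`, `retstart`, `retmode`, `id`) come from NCBI's E-utilities.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities base URL.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20, retstart: int = 0) -> str:
    """Step 1: ESearch maps a query to PMIDs (JSON response)."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,      # how many IDs to return
        "retstart": retstart,  # offset for paging through results
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids: list[str]) -> str:
    """Step 2: EFetch returns full PubMed XML records for those PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("CRISPR gene editing", retmax=5))
    print(efetch_url(["33243215"]))
```

Passing these URLs to `urllib.request.urlopen` (while respecting NCBI's rate limit) would return the JSON ID list and the XML records respectively.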
Use cases:
- Literature reviews -- find and analyze research on specific topics, drugs, or conditions
- Competitive intelligence -- track competitor publications, clinical trial results, and research directions
- Drug development -- monitor publications related to drug targets, mechanisms, or therapeutic areas
- Academic research -- build datasets for meta-analyses, systematic reviews, or bibliometric studies
- Medical AI/RAG -- populate knowledge bases with biomedical literature for AI applications
- AI agent tooling -- expose as an MCP tool so AI agents can query scientific literature in real time
Input
Choose a scraping mode and provide your search filters.
Mode 1: Search Articles
Search for articles by keyword, author, journal, or MeSH term.
{"mode": "search_articles","query": "CRISPR gene editing","publicationType": "Review","maxResults": 100}
Search with date filters:
{"mode": "search_articles","query": "COVID-19 vaccine efficacy","dateFrom": "2023/01/01","dateTo": "2024/12/31","maxResults": 200}
Mode 2: Get Article
Look up a specific article by PMID.
{"mode": "get_article","pmid": "33243215"}
Mode 3: Search by Author
Find articles by a specific author, optionally narrowed with a query.
{"mode": "search_by_author","author": "Zhang F","query": "CRISPR","maxResults": 50}
Mode 4: Search by Journal
Find articles from a specific journal.
{"mode": "search_by_journal","journal": "Nature","query": "immunotherapy","maxResults": 100}
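The four modes differ mainly in which field identifies the search. The hypothetical helper below builds an input object and fails early when that field is missing. The required keys are inferred from the examples above (for instance, search_articles accepts a keyword, author, journal, or MeSH term), not taken from the actor's published input schema.

```python
# Hypothetical helper. The identifying keys per mode are inferred from the
# README examples, not from the actor's published input schema.
REQUIRED_ANY = {
    "search_articles": ("query", "author", "journal", "meshTerm"),
    "get_article": ("pmid",),
    "search_by_author": ("author",),
    "search_by_journal": ("journal",),
}


def build_run_input(mode: str, **fields) -> dict:
    """Build an actor input dict; at least one identifying key must be set."""
    if mode not in REQUIRED_ANY:
        raise ValueError(f"unknown mode: {mode!r}")
    if not any(fields.get(key) for key in REQUIRED_ANY[mode]):
        options = " or ".join(REQUIRED_ANY[mode])
        raise ValueError(f"mode {mode!r} needs {options}")
    # Drop unset (None) filters so only caller-provided values are sent.
    return {"mode": mode, **{k: v for k, v in fields.items() if v is not None}}


if __name__ == "__main__":
    print(build_run_input("search_by_author", author="Zhang F",
                          query="CRISPR", maxResults=50))
```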
Output
Each article record includes:
| Field | Description |
|---|---|
| pmid | PubMed ID |
| doi | Digital Object Identifier |
| pmc_id | PubMed Central ID (if available) |
| title | Article title |
| abstract | Full abstract text |
| authors | List of authors with affiliations |
| journal | Journal name |
| journal_abbreviation | ISO journal abbreviation |
| volume / issue / pages | Publication details |
| publication_date | Publication date |
| pub_year | Publication year |
| keywords | Author-provided keywords |
| mesh_terms | MeSH subject headings |
| publication_types | Article types (Review, Clinical Trial, etc.) |
| language | Article language |
| grant_list | Funding sources |
| reference_count | Number of references |
| pubmed_url | Link to PubMed page |
| doi_url | Link to full text via DOI |
| pmc_url | Link to PMC (free full text) |
Example output
```json
{
  "schema_version": "1.0",
  "type": "article",
  "pmid": "33243215",
  "doi": "10.1038/s41586-020-2649-2",
  "pmc_id": "PMC7901054",
  "title": "CRISPR-Cas9 structures and mechanisms",
  "abstract": "Genome editing with CRISPR-Cas9 has transformed biological research...",
  "authors": [
    {
      "last_name": "Doudna",
      "fore_name": "Jennifer A",
      "initials": "JA",
      "affiliation": "University of California, Berkeley"
    }
  ],
  "journal": "Annual review of biochemistry",
  "publication_date": "2020 Jun",
  "pub_year": "2020",
  "mesh_terms": ["CRISPR-Cas Systems", "Gene Editing", "RNA, Guide, CRISPR-Cas Systems"],
  "publication_types": ["Journal Article", "Review"],
  "pubmed_url": "https://pubmed.ncbi.nlm.nih.gov/33243215/",
  "doi_url": "https://doi.org/10.1038/s41586-020-2649-2"
}
```
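To illustrate how EFetch XML could map onto these fields, here is a sketch using Python's standard `xml.etree.ElementTree`. The element paths (`MedlineCitation/PMID`, `Article/AuthorList/Author`, `MeshHeadingList/MeshHeading/DescriptorName`, `PubmedData/ArticleIdList/ArticleId`) follow NCBI's PubMed XML format. The function `parse_pubmed_xml` is hypothetical, covers only a subset of the output fields, and is not the actor's actual parser.

```python
import xml.etree.ElementTree as ET


def parse_pubmed_xml(xml_text: str) -> list[dict]:
    """Map each <PubmedArticle> in an EFetch response to a subset of output fields."""
    records = []
    for art in ET.fromstring(xml_text).iter("PubmedArticle"):
        cit = art.find("MedlineCitation")
        article = cit.find("Article")
        pmid = cit.findtext("PMID")
        # Structured abstracts split text across several AbstractText elements;
        # itertext() also flattens inline markup such as <i> or <sup>.
        abstract = " ".join(
            "".join(el.itertext()) for el in article.findall("Abstract/AbstractText")
        )
        authors = [
            {
                "last_name": a.findtext("LastName"),
                "fore_name": a.findtext("ForeName"),
                "initials": a.findtext("Initials"),
                "affiliation": a.findtext("AffiliationInfo/Affiliation"),
            }
            for a in article.findall("AuthorList/Author")
        ]
        # Article IDs (pubmed, doi, pmc, ...) live under PubmedData.
        ids = {
            el.get("IdType"): el.text
            for el in art.findall("PubmedData/ArticleIdList/ArticleId")
        }
        records.append({
            "pmid": pmid,
            "doi": ids.get("doi"),
            "pmc_id": ids.get("pmc"),
            "title": "".join(article.find("ArticleTitle").itertext()),
            "abstract": abstract or None,
            "authors": authors,
            "journal": article.findtext("Journal/Title"),
            "journal_abbreviation": article.findtext("Journal/ISOAbbreviation"),
            "pub_year": article.findtext("Journal/JournalIssue/PubDate/Year"),
            "mesh_terms": [
                d.text for d in cit.findall("MeshHeadingList/MeshHeading/DescriptorName")
            ],
            "publication_types": [
                p.text for p in article.findall("PublicationTypeList/PublicationType")
            ],
            "pubmed_url": f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/",
        })
    return records
```

Feeding it the body of an EFetch response with `retmode=xml` yields one dictionary per article.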
Filters
| Filter | Description | Example |
|---|---|---|
| query | General search term | "cancer immunotherapy" |
| author | Author name | "Smith J" |
| journal | Journal name | "Cell" |
| meshTerm | MeSH subject heading | "Neoplasms" |
| publicationType | Article type | "Clinical Trial" |
| dateFrom / dateTo | Date range | "2023/01/01" |
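If the actor combines these filters into a single ESearch term, the result would resemble the sketch below, which uses PubMed's standard field tags ([au] author, [ta] journal, [mh] MeSH term, [pt] publication type, [dp] publication date). The function `build_term` and the open-ended date bounds are illustrative assumptions; the actor's real query construction is not documented here.

```python
# PubMed field tags for each filter. How this actor actually composes its
# ESearch term is not documented, so treat this mapping as a sketch.
FIELD_TAGS = {
    "author": "au",
    "journal": "ta",
    "meshTerm": "mh",
    "publicationType": "pt",
}


def build_term(filters: dict) -> str:
    """Combine filter values into one PubMed query joined with AND."""
    clauses = []
    if filters.get("query"):
        clauses.append(f"({filters['query']})")
    for key, tag in FIELD_TAGS.items():
        value = filters.get(key)
        if value:
            clean = value.replace('"', "")  # inner quotes would break the phrase
            clauses.append(f'"{clean}"[{tag}]')
    if filters.get("dateFrom") or filters.get("dateTo"):
        start = filters.get("dateFrom") or "1800/01/01"  # assumed open lower bound
        end = filters.get("dateTo") or "3000/12/31"      # assumed open upper bound
        clauses.append(f"{start}:{end}[dp]")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_term({"meshTerm": "Alzheimer Disease",
                      "publicationType": "Clinical Trial"}))
```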
Publication types
- Journal Article
- Review
- Clinical Trial
- Meta-Analysis
- Randomized Controlled Trial
- Case Reports
- Systematic Review
- Letter
- Editorial
Sort options
- relevance - Best Match (default)
- pub_date - Publication date (newest first)
- Author - First author name (A-Z)
- JournalName - Journal name (A-Z)
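These four values match the `sort` parameter that ESearch accepts for PubMed. A minimal sketch, assuming the actor forwards the option unchanged; the helper `esearch_params` is hypothetical:

```python
# Sort values documented for ESearch on db=pubmed. Forwarding the actor's
# sort option unchanged is an assumption.
ESEARCH_SORTS = ("relevance", "pub_date", "Author", "JournalName")


def esearch_params(term: str, sort: str = "relevance", retmax: int = 20) -> dict:
    """Build ESearch query parameters, rejecting unknown sort values."""
    if sort not in ESEARCH_SORTS:
        raise ValueError(f"sort must be one of {', '.join(ESEARCH_SORTS)}")
    return {"db": "pubmed", "term": term, "sort": sort,
            "retmax": retmax, "retmode": "json"}
```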
Integrations
Apify API
```bash
curl "https://api.apify.com/v2/acts/labrat011~pubmed-scraper/runs" \
  -X POST \
  -H "Authorization: Bearer <APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{"mode": "search_articles", "query": "machine learning drug discovery", "maxResults": 50}'
```
Python
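```python
from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")

# Start the actor and wait for the run to finish.
run = client.actor("labrat011/pubmed-scraper").call(run_input={
    "mode": "search_articles",
    "meshTerm": "Alzheimer Disease",
    "publicationType": "Clinical Trial",
    "maxResults": 100,
})

# Iterate over the results stored in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    # Skip authors without a last_name (e.g. group authors) instead of raising KeyError.
    authors = ", ".join(
        a["last_name"] for a in item.get("authors", []) if a.get("last_name")
    )
    print(item["title"])
    print(f"  Authors: {authors}")
    print(f"  Journal: {item['journal']} ({item['pub_year']})")
```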
JavaScript
```javascript
import { ApifyClient } from 'apify-client';

// Top-level await requires an ES module (e.g. a .mjs file).
const client = new ApifyClient({ token: '<APIFY_TOKEN>' });

const run = await client.actor('labrat011/pubmed-scraper').call({
    mode: 'search_by_journal',
    journal: 'Nature Medicine',
    query: 'CAR-T',
    maxResults: 25,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.log(`[${item.pmid}] ${item.title}`);
});
```
MCP Integration
This actor works as an MCP tool through Apify's hosted MCP server. No custom server needed.
- Endpoint: https://mcp.apify.com?tools=labrat011/pubmed-scraper
- Auth: Authorization: Bearer <APIFY_TOKEN>
- Transport: Streamable HTTP
- Works with: Claude Desktop, Cursor, VS Code, Windsurf, Warp, Gemini CLI
Example MCP config (Claude Desktop / Cursor):
```json
{
  "mcpServers": {
    "pubmed-scraper": {
      "url": "https://mcp.apify.com?tools=labrat011/pubmed-scraper",
      "headers": {
        "Authorization": "Bearer <APIFY_TOKEN>"
      }
    }
  }
}
```
AI agents can use this actor to search medical literature, find research papers, look up clinical trial publications, and retrieve scientific evidence -- all as a callable MCP tool.
Limits and pricing
- Free tier: 25 results per run
- Paid tier: Up to 1,000 results per run
- Rate limiting: Built in, capped at 3 requests per second per NCBI guidelines
- No API key required: Uses public NCBI E-utilities
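Under these limits, one way a run could gather results is a single ESearch capped at the tier's maximum, followed by EFetch calls in batches. NCBI recommends switching to HTTP POST once a request lists more than about 200 IDs, so batching at 200 keeps each GET small. The helpers and batch size below are assumptions for illustration, not the actor's documented behavior.

```python
# Assumed caps taken from the limits above; the batch size of 200 follows
# NCBI's guidance to use POST beyond roughly 200 IDs per request.
FREE_CAP = 25
PAID_CAP = 1000
EFETCH_BATCH = 200


def capped_max_results(requested: int, paid: bool) -> int:
    """Clamp maxResults to the tier limit (hypothetical enforcement)."""
    cap = PAID_CAP if paid else FREE_CAP
    return max(0, min(requested, cap))


def batch_pmids(pmids: list[str], size: int = EFETCH_BATCH) -> list[list[str]]:
    """Split the ESearch ID list into EFetch-sized batches."""
    return [pmids[i:i + size] for i in range(0, len(pmids), size)]


if __name__ == "__main__":
    n = capped_max_results(5000, paid=True)  # -> 1000
    ids = [str(i) for i in range(n)]          # stand-in IDs, not real PMIDs
    print(n, [len(b) for b in batch_pmids(ids)])
```

At 1,000 results this works out to one ESearch and five EFetch requests, or under two seconds of enforced spacing at 3 requests per second.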
FAQ
What data source does this use?
This actor queries the NCBI E-utilities API, which provides access to PubMed and MEDLINE. PubMed contains over 35 million citations for biomedical literature.
What's the difference between PubMed and MEDLINE?
MEDLINE is the core subset of PubMed containing citations indexed with MeSH terms. PubMed also includes citations from other sources like PMC, NCBI Bookshelf, and publisher-submitted records.
Can I get full text?
This actor returns abstracts and metadata. For full text, check the pmc_url field -- if present, the full text is available free on PubMed Central. Otherwise, use the doi_url to access the publisher's site.
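Choosing a full-text link from an output record follows directly from that rule. A minimal sketch; the function name `full_text_link` is hypothetical:

```python
from typing import Optional


def full_text_link(record: dict) -> Optional[str]:
    """Return the best full-text URL for one output record, if any.

    A pmc_url means PubMed Central hosts a free copy, so it is preferred;
    otherwise fall back to the publisher's page via doi_url.
    """
    return record.get("pmc_url") or record.get("doi_url")
```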
What are MeSH terms?
Medical Subject Headings (MeSH) are controlled vocabulary terms assigned by NLM indexers. They help find articles on specific topics even when authors use different terminology.
How current is the data?
PubMed is updated daily with new citations. This actor retrieves data directly from NCBI, so results reflect the current state of the database.
Is there a rate limit?
NCBI allows 3 requests per second without an API key and up to 10 per second with one. This actor includes built-in rate limiting to comply with NCBI guidelines. For higher throughput, consider registering for an NCBI API key.
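A simple way to honor such a limit is to space requests at least 1/rate seconds apart. The sketch below is an assumption about how this kind of limiting could work, not the actor's code; the `RateLimiter` class is hypothetical, and its clock and sleep functions are injectable so the timing can be checked without real delays.

```python
import time
from typing import Callable


class RateLimiter:
    """Space calls at least 1/rate_per_sec seconds apart."""

    def __init__(self, rate_per_sec: float = 3.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # first call never waits

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `limiter.wait()` before each E-utilities request; with an NCBI API key you would construct it with `rate_per_sec=10`.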
Resources
- PubMed -- Web interface
- NCBI E-utilities -- API documentation
- MeSH Browser -- Search MeSH terms
- PubMed Help -- Search syntax guide
- Apify Documentation
License
This actor is provided under the MIT License. PubMed data comes from the National Library of Medicine (NLM); NLM notes that some abstracts may be protected by copyright held by publishers or authors.