PubMed Article Search
Pricing
Pay per event
Developer
Stas Persiianenko
Search PubMed -- NCBI's free biomedical literature database with 35M+ peer-reviewed articles -- and extract structured data including titles, authors, abstracts, DOIs, MeSH keywords, and publication types. No API key required. Uses the official NCBI E-utilities API for reliable, high-volume data extraction.
What Does PubMed Scraper Do?
This actor searches PubMed using the official NCBI E-utilities API and extracts structured article metadata. Enter one or more search queries using PubMed's full search syntax (Boolean operators, MeSH terms, field tags, date ranges), and get back clean, structured JSON data for every matching article.
- Searches PubMed's 35M+ article database with full query syntax support
- Extracts title, authors with affiliations, abstract, journal, DOI, MeSH keywords, and publication types
- Handles multiple queries with automatic deduplication across results
- Processes up to 10,000 articles per query with batch fetching
- Respects NCBI rate limits (3 requests/second) for reliable operation
- No API key or proxy required -- NCBI E-utilities is a free public API
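The deduplication, batch fetching, and rate limiting listed above can be sketched as follows. This is a minimal illustration, not the actor's actual implementation: the helper names (`dedupe_pmids`, `batched`, `RateLimiter`) and the batch size are hypothetical, and only the 3 requests/second figure comes from the text above.

```python
import time
from typing import Iterable, Iterator


def dedupe_pmids(pmid_lists: Iterable[list[str]]) -> list[str]:
    """Merge PMIDs from several queries, keeping the first occurrence of each."""
    seen: set[str] = set()
    merged: list[str] = []
    for pmids in pmid_lists:
        for pmid in pmids:
            if pmid not in seen:
                seen.add(pmid)
                merged.append(pmid)
    return merged


def batched(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


class RateLimiter:
    """Block so that successive wait() calls are at least 1/rate seconds apart."""

    def __init__(self, requests_per_second: float = 3.0) -> None:
        self.min_interval = 1.0 / requests_per_second
        self._last = float("-inf")  # the first call never waits

    def wait(self) -> None:
        delay = self._last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


# Two queries whose results overlap on PMID "3".
pmids = dedupe_pmids([["1", "2", "3"], ["3", "4"]])
limiter = RateLimiter(requests_per_second=3.0)
for batch in batched(pmids, size=2):  # hypothetical batch size
    limiter.wait()  # a real client would send one fetch request per batch here
    print(batch)
```

Spacing requests by a minimum interval is the simplest way to stay under a per-second limit; a token bucket would allow short bursts but is not needed for a sequential client.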
Who Is PubMed Scraper For?
- Pharmaceutical researchers tracking clinical trials, drug efficacy studies, and safety reports for specific compounds or therapeutic areas
- Academic researchers conducting systematic reviews or meta-analyses who need bulk article metadata for literature mapping
- Biotech and life sciences companies monitoring competitor research output and publication trends in target disease areas
- Healthcare analytics teams analyzing publication patterns, emerging treatments, and research funding trends
- Grant writers gathering supporting references and citation data for proposals
- Science journalists finding primary sources for medical reporting and fact-checking
- Data scientists building datasets of scientific literature for NLP training, citation analysis, or knowledge graphs
- Medical librarians curating research collections and tracking institutional publication output
Why Use PubMed Scraper Instead of Manual Search?
- Bulk extraction -- export hundreds or thousands of articles in minutes, not hours of manual copying
- Structured data -- get clean JSON with consistent field names, ready for analysis or database import
- Multiple queries -- run several searches in one go with automatic deduplication
- Automation-ready -- schedule recurring searches to monitor new publications on any topic
- API integration -- pipe results directly into your data pipeline, spreadsheet, or database
What Data Is Extracted?
Each article record includes these fields:
| Field | Type | Description |
|---|---|---|
| `pmid` | string | PubMed ID -- unique identifier for the article |
| `title` | string | Full article title |
| `authors` | array | Author objects with name and affiliation |
| `abstract` | string | Full abstract text (structured abstracts preserved with section labels) |
| `journal` | string | Journal name |
| `publicationDate` | string | Publication date (YYYY-MM-DD, YYYY-MM, or YYYY) |
| `doi` | string | Digital Object Identifier for cross-referencing |
| `keywords` | array | MeSH terms and author-assigned keywords |
| `publicationTypes` | array | Types: Journal Article, Review, Clinical Trial, Meta-Analysis, etc. |
| `url` | string | Direct link to the article on PubMed |
| `scrapedAt` | string | ISO timestamp when the article was scraped |
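For typed pipelines, the record described in the table could be modeled roughly as below. The field names come from the table; the `Optional` markers on `affiliation` and `doi` are assumptions (the table does not say which fields may be null), and `first_author` is a hypothetical helper added for illustration.

```python
from typing import Optional, TypedDict


class Author(TypedDict):
    name: str
    affiliation: Optional[str]  # assumed nullable; the table does not say


class Article(TypedDict):
    pmid: str
    title: str
    authors: list[Author]
    abstract: Optional[str]  # null when the article has no abstract
    journal: str
    publicationDate: str  # YYYY-MM-DD, YYYY-MM, or YYYY
    doi: Optional[str]  # assumed nullable; not every record carries a DOI
    keywords: list[str]
    publicationTypes: list[str]
    url: str
    scrapedAt: str  # ISO timestamp


def first_author(article: Article) -> Optional[str]:
    """Return the first listed author's name, or None if there are no authors."""
    return article["authors"][0]["name"] if article["authors"] else None
```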
How Much Does It Cost to Scrape PubMed Articles?
This actor uses pay-per-event (PPE) pricing. You only pay for what you use:
| Event | FREE tier | GOLD tier | DIAMOND tier |
|---|---|---|---|
| Run started (one-time) | $0.001 | $0.001 | $0.001 |
| Per article scraped | $0.002 | $0.001 | $0.0005 |
Example costs at FREE tier:
- 50 articles: ~$0.10
- 200 articles: ~$0.40
- 1,000 articles: ~$2.00
- 5,000 articles: ~$10.00
Higher subscription tiers get progressively lower per-article rates. The Apify Free plan includes $5/month of platform credits -- enough to scrape ~2,500 articles per month at no cost.
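The example costs follow directly from the pricing table: one run-start fee plus a per-article fee. A small estimator, with the rates copied from the table and a hypothetical function name, might look like this:

```python
RUN_START_FEE = 0.001  # charged once per run, same for every tier
PER_ARTICLE_FEE = {"FREE": 0.002, "GOLD": 0.001, "DIAMOND": 0.0005}


def estimate_cost(articles: int, tier: str = "FREE") -> float:
    """Estimate the cost in USD of one run that returns `articles` articles."""
    if articles < 0:
        raise ValueError("articles must be non-negative")
    return RUN_START_FEE + articles * PER_ARTICLE_FEE[tier]


for count in (50, 200, 1000, 5000):
    print(f"{count} articles: ~${estimate_cost(count):.2f}")
```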
How to Scrape PubMed Articles
- Go to the PubMed Scraper actor page on Apify Store
- Click Try for free to open the actor in Apify Console
- Enter your search queries in the Search Queries field (one per line)
- Set Max results per query (default: 50, max: 10,000)
- Optionally set date filters to limit results to a specific time range
- Click Start and wait for the run to complete
- Download your results as JSON, CSV, or Excel from the Dataset tab
Input Parameters
Search Queries (required)
Enter one or more PubMed search queries. Each query runs as a separate search, and results are deduplicated across queries. Supports full PubMed search syntax including Boolean operators, field tags, and MeSH terms.
Example queries:
- `CRISPR cancer therapy`
- `COVID-19[Title] AND vaccine efficacy`
- `diabetes AND insulin resistance[MeSH Terms]`
- `"breast cancer"[Title/Abstract] AND 2020:2024[pdat]`
- `Smith J[Author] AND Nature[Journal]`
Max Results Per Query
Default: 50. Maximum: 10,000. PubMed indexes 35M+ articles -- use date filters to narrow results for broad topics.
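For orientation, the result limit corresponds to the `retmax` parameter of the NCBI ESearch endpoint. The sketch below builds such a request URL with the limit clamped to the documented range. How the actor itself constructs its requests is not documented here, so treat the function name and the clamping behavior as assumptions.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
MAX_RESULTS = 10_000  # upper bound stated above


def esearch_url(query: str, max_results: int = 50) -> str:
    """Build an ESearch URL for PubMed, clamping max_results to 1..10,000."""
    retmax = max(1, min(max_results, MAX_RESULTS))
    params = {"db": "pubmed", "term": query, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(esearch_url("CRISPR cancer therapy", max_results=100))
```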
Sort By
- Relevance (default) -- best match articles appear first
- Date -- most recently published articles appear first
Date Filters
Filter articles by publication date range. Format: YYYY/MM/DD.
- `dateFrom: 2020/01/01` -- articles published since January 2020
- `dateTo: 2024/12/31` -- articles published before the end of 2024
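A date filter can also be expressed inside the query itself with a `[pdat]` range. The sketch below shows one way to combine a query with optional bounds. Whether the actor applies its date filters this way is an assumption, and the open-ended sentinel dates are illustrative choices.

```python
from datetime import datetime
from typing import Optional

DATE_FORMAT = "%Y/%m/%d"  # the YYYY/MM/DD format used by dateFrom and dateTo


def with_date_range(
    query: str,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> str:
    """Append a publication-date range to a PubMed query string."""
    if date_from is None and date_to is None:
        return query
    for value in (date_from, date_to):
        if value is not None:
            datetime.strptime(value, DATE_FORMAT)  # raises ValueError if malformed
    start = date_from or "1000/01/01"  # assumed sentinel for "no lower bound"
    end = date_to or "3000/12/31"  # assumed sentinel for "no upper bound"
    return f"({query}) AND {start}:{end}[pdat]"


print(with_date_range("CRISPR cancer therapy", "2020/01/01", "2024/12/31"))
```

Wrapping the original query in parentheses keeps any `OR` inside it from binding to the date clause.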
Output Example

    {
      "pmid": "38123456",
      "title": "CRISPR-Cas9-mediated gene editing in primary T cells for cancer immunotherapy",
      "authors": [
        {
          "name": "Zhang Wei",
          "affiliation": "Department of Immunology, Harvard Medical School, Boston, MA"
        },
        {
          "name": "Johnson Sarah K",
          "affiliation": "Dana-Farber Cancer Institute, Boston, MA"
        }
      ],
      "abstract": "BACKGROUND: Gene editing of T cells has emerged as a promising approach...\n\nMETHODS: We performed CRISPR-Cas9 editing of primary human T cells...\n\nRESULTS: Edited T cells showed enhanced tumor infiltration...",
      "journal": "Nature Medicine",
      "publicationDate": "2023-11-15",
      "doi": "10.1038/s41591-023-02345-6",
      "keywords": ["CRISPR-Cas9 Protein Systems", "T-Lymphocytes", "Neoplasms", "Immunotherapy"],
      "publicationTypes": ["Journal Article", "Research Support, N.I.H., Extramural"],
      "url": "https://pubmed.ncbi.nlm.nih.gov/38123456/",
      "scrapedAt": "2024-03-15T10:23:45.123Z"
    }

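Structured abstracts arrive as one string with upper-case section labels, as in the example record. If downstream analysis needs the sections separately, they can be split with a regular expression. This is a sketch under the assumption that every label is an upper-case phrase at the start of a line followed by a colon; `split_abstract` and the fallback key `ABSTRACT` are hypothetical.

```python
import re

# An upper-case label at the start of a line, e.g. "METHODS: ".
SECTION_RE = re.compile(r"^([A-Z][A-Z /&-]*):\s*", re.MULTILINE)


def split_abstract(abstract: str) -> dict[str, str]:
    """Split a labelled abstract into {label: text}.

    Unlabelled abstracts are returned whole under the key "ABSTRACT".
    Any text before the first label is ignored.
    """
    matches = list(SECTION_RE.finditer(abstract))
    if not matches:
        return {"ABSTRACT": abstract.strip()}
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(abstract)
        sections[match.group(1).strip()] = abstract[match.end():end].strip()
    return sections


example = (
    "BACKGROUND: Gene editing of T cells has emerged as a promising approach..."
    "\n\nMETHODS: We performed CRISPR-Cas9 editing of primary human T cells..."
    "\n\nRESULTS: Edited T cells showed enhanced tumor infiltration..."
)
for label, text in split_abstract(example).items():
    print(label, "->", text)
```

A line inside a section that happens to start with an upper-case word and a colon (for example `DNA: ...`) would be treated as a new label, so check the output on real data before relying on it.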
Tips for Better Results
- Use MeSH terms for precise results -- `Neoplasms[MeSH Terms]` captures all cancer types regardless of terminology
- Combine Boolean operators -- `(diabetes OR "metabolic syndrome") AND treatment[Title]`
- Use field tags to narrow scope -- `[Title]`, `[Author]`, `[Journal]`, `[Affiliation]`, `[MeSH Terms]`
- Quote exact phrases -- `"randomized controlled trial"` matches the exact phrase
- Date ranges in query -- append `AND 2020:2024[pdat]` to limit publication dates
- Start small -- test with 10-20 results first, then scale up once you confirm the query returns relevant articles
- Multiple focused queries beat one broad query -- run 3 specific queries of 100 articles rather than 1 generic query of 300
PubMed Search Syntax Guide
Boolean Operators
- `diabetes AND insulin` -- both terms required
- `cancer OR tumor` -- either term
- `COVID-19 NOT influenza` -- exclude term
Field Tags
- `[Title]` -- title only
- `[Author]` -- author name
- `[Journal]` -- specific journal
- `[MeSH Terms]` -- Medical Subject Headings
- `[Title/Abstract]` -- title or abstract
- `[Affiliation]` -- institution
MeSH Terms
MeSH (Medical Subject Headings) is NLM's controlled vocabulary. Using MeSH gives more precise results:
- `Neoplasms[MeSH Terms]` -- all cancer types
- `Diabetes Mellitus[MeSH Terms]`
- `COVID-19[MeSH Terms]`
Combined Example
("breast cancer"[Title] OR "mammary tumor"[Title]) AND treatment[MeSH Terms] AND 2022:2024[pdat]
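When queries are generated programmatically, small helpers keep quoting and parenthesization consistent. The sketch below rebuilds the combined example from its parts; the helper names `tag`, `any_of`, and `all_of` are hypothetical and not part of the actor.

```python
def tag(term: str, field: str) -> str:
    """Attach a field tag, quoting multi-word terms as exact phrases."""
    quoted = f'"{term}"' if " " in term else term
    return f"{quoted}[{field}]"


def any_of(*clauses: str) -> str:
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(*clauses: str) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


query = all_of(
    any_of(tag("breast cancer", "Title"), tag("mammary tumor", "Title")),
    tag("treatment", "MeSH Terms"),
    "2022:2024[pdat]",
)
print(query)
```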
Integrations
PubMed Scraper connects to 1,500+ apps through the Apify platform:
- Google Sheets -- automatically export new articles to a shared spreadsheet for team literature tracking
- Slack / Microsoft Teams -- get notified when new articles match your monitoring queries via scheduled runs
- Airtable -- build a searchable research database with article metadata, tags, and review status
- Zapier / Make -- trigger workflows when new publications appear (e.g., email digest to research team)
- PostgreSQL / BigQuery -- pipe article data into your data warehouse for bibliometric analysis
- n8n -- build automated literature monitoring pipelines with custom filtering and alerting
Set up integrations in the Apify Console under Settings > Integrations for any actor run.
API Usage
Node.js

    import { ApifyClient } from 'apify-client';

    const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

    const run = await client.actor('automation-lab/pubmed-scraper').call({
        queries: ['CRISPR cancer therapy', 'CAR-T cell therapy'],
        maxResultsPerQuery: 100,
        sortBy: 'date',
        dateFrom: '2023/01/01',
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`Scraped ${items.length} articles`);

Python

    from apify_client import ApifyClient

    client = ApifyClient(token="YOUR_API_TOKEN")

    run = client.actor("automation-lab/pubmed-scraper").call(run_input={
        "queries": ["CRISPR cancer therapy"],
        "maxResultsPerQuery": 200,
        "sortBy": "date",
        "dateFrom": "2022/01/01",
        "dateTo": "2024/12/31",
    })

    items = client.dataset(run["defaultDatasetId"]).list_items().items
    print(f"Scraped {len(items)} articles")

cURL

    # Start the actor run
    curl -X POST "https://api.apify.com/v2/acts/automation-lab~pubmed-scraper/runs?token=YOUR_API_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "queries": ["CRISPR cancer therapy"],
        "maxResultsPerQuery": 50,
        "sortBy": "relevance"
      }'

    # Fetch results (replace DATASET_ID from the run response)
    curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"

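Items fetched through the API are nested JSON, so writing them to a flat file takes a small transformation. The sketch below flattens the author list into a single column and writes CSV using only the standard library. The column selection and the function name `articles_to_csv` are illustrative choices, not part of the actor.

```python
import csv
import io

CSV_FIELDS = ["pmid", "title", "authors", "journal", "publicationDate", "doi"]


def articles_to_csv(items: list[dict]) -> str:
    """Render article records as CSV text with one row per article."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields that are not in CSV_FIELDS.
    writer = csv.DictWriter(buffer, fieldnames=CSV_FIELDS, extrasaction="ignore")
    writer.writeheader()
    for item in items:
        row = dict(item)
        # Collapse the list of author objects into "Name; Name; ...".
        row["authors"] = "; ".join(a["name"] for a in item.get("authors") or [])
        writer.writerow(row)
    return buffer.getvalue()


sample_items = [
    {
        "pmid": "38123456",
        "title": "CRISPR-Cas9-mediated gene editing in primary T cells for cancer immunotherapy",
        "authors": [{"name": "Zhang Wei"}, {"name": "Johnson Sarah K"}],
        "journal": "Nature Medicine",
        "publicationDate": "2023-11-15",
        "doi": "10.1038/s41591-023-02345-6",
        "abstract": "BACKGROUND: ...",
    }
]
print(articles_to_csv(sample_items))
```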
MCP Integration (AI Assistants)
Use PubMed Scraper directly from AI assistants like Claude, ChatGPT, or Cursor through the Model Context Protocol (MCP).
Claude Desktop / Claude Code
Add to your MCP config (claude_desktop_config.json or .mcp.json):

    {
      "mcpServers": {
        "apify": {
          "command": "npx",
          "args": ["-y", "@apify/actors-mcp-server"],
          "env": {
            "APIFY_TOKEN": "your-apify-token"
          }
        }
      }
    }

Example Prompts
- "Search PubMed for recent CRISPR cancer therapy papers from 2024 and summarize the top 20"
- "Find all clinical trials on mRNA vaccines published this year and create a comparison table"
- "Get the latest 50 papers on Alzheimer's disease biomarkers and extract key findings"
Is It Legal to Scrape PubMed?
Yes. This actor uses the official NCBI E-utilities API, which is a free public API provided by the National Library of Medicine specifically for programmatic access to PubMed data. The API is designed for bulk data retrieval and does not require authentication. Article metadata (titles, abstracts, authors, DOIs) is public information. This actor respects NCBI's rate limits (3 requests/second). For NCBI's usage policies, see the E-utilities documentation.
Frequently Asked Questions
Do I need an API key to use this actor?
No. The NCBI E-utilities API is free and public. No registration or API key is required. This actor handles all API communication automatically.
How many articles can I retrieve in one run?
PubMed's API supports up to 10,000 results per query. For broader topics, use multiple focused queries with date range filters. There is no limit on the number of queries per run.
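One way to split a broad topic into focused queries is to slice it by publication year, so that each slice is more likely to stay under the 10,000-result cap. The helper below is a hypothetical sketch; whether a single year is narrow enough depends on the topic.

```python
def yearly_queries(query: str, first_year: int, last_year: int) -> list[str]:
    """Return one query per publication year, from first_year to last_year inclusive."""
    if first_year > last_year:
        raise ValueError("first_year must not be after last_year")
    return [
        f"({query}) AND {year}:{year}[pdat]"
        for year in range(first_year, last_year + 1)
    ]


for q in yearly_queries("cancer immunotherapy", 2022, 2024):
    print(q)
```

Each resulting string can be passed as a separate entry in the Search Queries input.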
What if my search returns 0 results?
Check your query syntax. PubMed search terms and field tags are case-insensitive, but a field tag must be spelled correctly and enclosed in square brackets (e.g., [MeSH Terms], not (MeSH Terms)). Try broadening your search by removing field tags or date filters. The actor handles 0-result queries gracefully without errors.
Why are some articles missing abstracts?
Not all PubMed articles have abstracts. Letters, editorials, and some older articles may lack abstract text. The abstract field will be null for these articles.
Does this actor retrieve full article text?
No -- PubMed abstracts are freely available, but full article text is behind publisher paywalls for most journals. Use the DOI to access the publisher site or check PubMed Central (PMC) for open-access articles.
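To follow a DOI to the publisher site, prefix it with the `https://doi.org/` resolver. The sketch below assumes the `doi` field may be null or empty for some records, which the field table does not state; `doi_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote


def doi_url(doi: Optional[str]) -> Optional[str]:
    """Return the doi.org resolver URL for a DOI, or None if no DOI is present."""
    if not doi:
        return None
    # Percent-encode unusual characters but keep the "/" between prefix and suffix.
    return "https://doi.org/" + quote(doi.strip(), safe="/")


print(doi_url("10.1038/s41591-023-02345-6"))
```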
How current is the data?
PubMed is updated daily. Newly accepted articles typically appear within a few weeks of online publication.
Related Scrapers
- Semantic Scholar Scraper -- search and extract data from Semantic Scholar's academic paper database
- OpenAlex Scraper -- extract scholarly article data from the OpenAlex open research database
- CrossRef Scraper -- search CrossRef for DOI metadata, citations, and publication details