PubMed Article Search

Search PubMed's 35M+ biomedical articles and extract structured data: titles, authors, abstracts, DOIs, MeSH keywords, and publication types. Free NCBI API, no key required.

Developer: Stas Persiianenko (maintained by Community)


Search PubMed -- NCBI's free biomedical literature database with 35M+ peer-reviewed articles -- and extract structured data including titles, authors, abstracts, DOIs, MeSH keywords, and publication types. No API key required. Uses the official NCBI E-utilities API for reliable, high-volume data extraction.

What Does PubMed Scraper Do?

This actor searches PubMed using the official NCBI E-utilities API and extracts structured article metadata. Enter one or more search queries using PubMed's full search syntax (Boolean operators, MeSH terms, field tags, date ranges), and get back clean, structured JSON data for every matching article.

  • Searches PubMed's 35M+ article database with full query syntax support
  • Extracts title, authors with affiliations, abstract, journal, DOI, MeSH keywords, and publication types
  • Handles multiple queries with automatic deduplication across results
  • Processes up to 10,000 articles per query with batch fetching
  • Respects NCBI rate limits (3 requests/second) for reliable operation
  • No API key or proxy required -- NCBI E-utilities is a free public API
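The batch fetching and rate limiting listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the actor's actual code: the batch size of 200 and the helper names `chunked` and `fetch_in_batches` are hypothetical, and only the 3 requests/second limit comes from the description.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(ids: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each at most `size` items long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def fetch_in_batches(
    ids: Sequence[str],
    fetch: Callable[[List[str]], list],
    batch_size: int = 200,              # assumed value, not documented by the actor
    requests_per_second: float = 3.0,   # NCBI limit for requests without an API key
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Call `fetch` once per batch, pausing between calls to stay under the limit."""
    min_interval = 1.0 / requests_per_second
    results: list = []
    for index, batch in enumerate(chunked(ids, batch_size)):
        if index > 0:
            sleep(min_interval)  # fixed pause before every call after the first
        results.extend(fetch(batch))
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without real delays; `fetch` stands in for whatever function performs the actual HTTP request.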

Who Is PubMed Scraper For?

  • Pharmaceutical researchers tracking clinical trials, drug efficacy studies, and safety reports for specific compounds or therapeutic areas
  • Academic researchers conducting systematic reviews or meta-analyses who need bulk article metadata for literature mapping
  • Biotech and life sciences companies monitoring competitor research output and publication trends in target disease areas
  • Healthcare analytics teams analyzing publication patterns, emerging treatments, and research funding trends
  • Grant writers gathering supporting references and citation data for proposals
  • Science journalists finding primary sources for medical reporting and fact-checking
  • Data scientists building datasets of scientific literature for NLP training, citation analysis, or knowledge graphs
  • Medical librarians curating research collections and tracking institutional publication output

Why Use PubMed Scraper?

  • Bulk extraction -- export hundreds or thousands of articles in minutes, not hours of manual copying
  • Structured data -- get clean JSON with consistent field names, ready for analysis or database import
  • Multiple queries -- run several searches in one go with automatic deduplication
  • Automation-ready -- schedule recurring searches to monitor new publications on any topic
  • API integration -- pipe results directly into your data pipeline, spreadsheet, or database

What Data Is Extracted?

Each article record includes these fields:

Field            | Type   | Description
pmid             | string | PubMed ID -- unique identifier for the article
title            | string | Full article title
authors          | array  | Author objects with name and affiliation
abstract         | string | Full abstract text (structured abstracts preserved with section labels)
journal          | string | Journal name
publicationDate  | string | Publication date (YYYY-MM-DD, YYYY-MM, or YYYY)
doi              | string | Digital Object Identifier for cross-referencing
keywords         | array  | MeSH terms and author-assigned keywords
publicationTypes | array  | Types: Journal Article, Review, Clinical Trial, Meta-Analysis, etc.
url              | string | Direct link to the article on PubMed
scrapedAt        | string | ISO timestamp when the article was scraped
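Because publicationDate can arrive in three precisions, downstream code usually needs to normalize it before sorting or filtering. The sketch below is one possible approach; the function name `parse_publication_date` and the choice to default missing month and day to 1 are assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def parse_publication_date(value: Optional[str]) -> Optional[date]:
    """Parse YYYY-MM-DD, YYYY-MM, or YYYY; a missing month or day defaults to 1."""
    if not value:
        return None
    parts = [int(part) for part in value.split("-")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unexpected date format: {value!r}")
    # Pad with 1s so that "2023" becomes 2023-01-01 and "2023-11" becomes 2023-11-01.
    year, month, day = (parts + [1, 1])[:3]
    return date(year, month, day)
```

If the lost precision matters for your analysis, keep the original string alongside the parsed value.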

How Much Does It Cost to Scrape PubMed Articles?

This actor uses pay-per-event (PPE) pricing. You only pay for what you use:

Event                  | FREE tier | GOLD tier | DIAMOND tier
Run started (one-time) | $0.001    | $0.001    | $0.001
Per article scraped    | $0.002    | $0.001    | $0.0005

Example costs at FREE tier:

  • 50 articles: ~$0.10
  • 200 articles: ~$0.40
  • 1,000 articles: ~$2.00
  • 5,000 articles: ~$10.00

Higher subscription tiers get progressively lower per-article rates. The Apify Free plan includes $5/month of platform credits -- enough to scrape ~2,500 articles per month at no cost.
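The pricing arithmetic can be reproduced with a few lines of Python. This is a rough estimator built only from the rates quoted in this section; the function name `estimate_cost` is hypothetical, and it assumes a single run (one run-start fee) with no other platform charges.

```python
from decimal import Decimal

# Rates as listed in the pricing table, in USD.
RUN_START_FEE = Decimal("0.001")
PER_ARTICLE = {
    "FREE": Decimal("0.002"),
    "GOLD": Decimal("0.001"),
    "DIAMOND": Decimal("0.0005"),
}


def estimate_cost(articles: int, tier: str = "FREE") -> Decimal:
    """Estimated cost of one run: the start fee plus the per-article rate."""
    if articles < 0:
        raise ValueError("articles must be non-negative")
    return RUN_START_FEE + PER_ARTICLE[tier] * articles


for count in (50, 200, 1000, 5000):
    print(f"{count} articles at FREE tier: ${estimate_cost(count)}")
```

Decimal is used instead of float so that fractions of a cent add up exactly.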

How to Scrape PubMed Articles

  1. Go to the PubMed Scraper actor page on Apify Store
  2. Click Try for free to open the actor in Apify Console
  3. Enter your search queries in the Search Queries field (one per line)
  4. Set Max results per query (default: 50, max: 10,000)
  5. Optionally set date filters to limit results to a specific time range
  6. Click Start and wait for the run to complete
  7. Download your results as JSON, CSV, or Excel from the Dataset tab

Input Parameters

Search Queries (required)

Enter one or more PubMed search queries. Each query runs as a separate search, and results are deduplicated across queries. Supports full PubMed search syntax including Boolean operators, field tags, and MeSH terms.

Example queries:

CRISPR cancer therapy
COVID-19[Title] AND vaccine efficacy
diabetes AND insulin resistance[MeSH Terms]
"breast cancer"[Title/Abstract] AND 2020:2024[pdat]
Smith J[Author] AND Nature[Journal]
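Deduplication across queries can be pictured as keeping the first record seen for each PubMed ID. The sketch below assumes the pmid field is the deduplication key, which the description does not state explicitly; `dedupe_by_pmid` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List


def dedupe_by_pmid(result_sets: Iterable[Iterable[dict]]) -> List[dict]:
    """Merge per-query result lists, keeping the first article seen for each pmid."""
    seen: Dict[str, dict] = {}
    for results in result_sets:
        for article in results:
            # setdefault leaves an existing entry untouched, so earlier queries win.
            seen.setdefault(article["pmid"], article)
    return list(seen.values())
```

The same helper is useful on your side if you merge datasets from several separate runs.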

Max Results Per Query

Default: 50. Maximum: 10,000. PubMed indexes 35M+ articles -- use date filters to narrow results for broad topics.

Sort By

  • Relevance (default) -- best match articles appear first
  • Date -- most recently published articles appear first

Date Filters

Filter articles by publication date range. Format: YYYY/MM/DD.

  • dateFrom: 2020/01/01 -- articles published on or after January 1, 2020
  • dateTo: 2024/12/31 -- articles published on or before December 31, 2024
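For readers curious how these inputs might translate into an E-utilities request, here is a sketch that builds an ESearch URL. The endpoint and the parameters db, term, retmax, retmode, sort, datetype, mindate, and maxdate are part of the public E-utilities API; the mapping from the actor's input fields to those parameters, the function name `build_esearch_url`, and the open-ended placeholder dates are assumptions.

```python
from typing import Optional
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(
    query: str,
    max_results: int = 50,
    sort_by: str = "relevance",
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> str:
    """Build an ESearch URL for PubMed from actor-style inputs (assumed mapping)."""
    params = {
        "db": "pubmed",
        "term": query,
        "retmax": str(max_results),
        "retmode": "json",
        "sort": "pub_date" if sort_by == "date" else "relevance",
    }
    if date_from or date_to:
        # E-utilities expects mindate and maxdate together, so a missing bound
        # is filled with a wide placeholder (an assumption of this sketch).
        params["datetype"] = "pdat"
        params["mindate"] = date_from or "1800/01/01"
        params["maxdate"] = date_to or "3000/12/31"
    return ESEARCH_URL + "?" + urlencode(params)


print(build_esearch_url("CRISPR cancer therapy", 100, "date", "2023/01/01"))
```

The date filters and a [pdat] range inside the query both restrict publication date, so using one of the two is usually enough.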

Output Example

{
    "pmid": "38123456",
    "title": "CRISPR-Cas9-mediated gene editing in primary T cells for cancer immunotherapy",
    "authors": [
        {
            "name": "Zhang Wei",
            "affiliation": "Department of Immunology, Harvard Medical School, Boston, MA"
        },
        {
            "name": "Johnson Sarah K",
            "affiliation": "Dana-Farber Cancer Institute, Boston, MA"
        }
    ],
    "abstract": "BACKGROUND: Gene editing of T cells has emerged as a promising approach...\n\nMETHODS: We performed CRISPR-Cas9 editing of primary human T cells...\n\nRESULTS: Edited T cells showed enhanced tumor infiltration...",
    "journal": "Nature Medicine",
    "publicationDate": "2023-11-15",
    "doi": "10.1038/s41591-023-02345-6",
    "keywords": ["CRISPR-Cas9 Protein Systems", "T-Lymphocytes", "Neoplasms", "Immunotherapy"],
    "publicationTypes": ["Journal Article", "Research Support, N.I.H., Extramural"],
    "url": "https://pubmed.ncbi.nlm.nih.gov/38123456/",
    "scrapedAt": "2024-03-15T10:23:45.123Z"
}
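Structured abstracts arrive as a single string with labeled sections, as in the example output. A small parser can split them back into sections. This sketch assumes the layout shown in the example (an uppercase label followed by a colon, with sections separated by blank lines); real records may deviate, and `split_structured_abstract` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

# An uppercase label such as "BACKGROUND" or "MATERIALS AND METHODS", then a colon.
_SECTION = re.compile(r"^([A-Z][A-Z /&-]*):\s*(.*)$", re.DOTALL)


def split_structured_abstract(abstract: Optional[str]) -> Dict[str, str]:
    """Split a labeled abstract into {label: text}; unlabeled text goes under "TEXT"."""
    if not abstract:
        return {}
    sections: Dict[str, str] = {}
    for block in abstract.split("\n\n"):
        block = block.strip()
        match = _SECTION.match(block)
        if match:
            sections[match.group(1)] = match.group(2).strip()
        else:
            # Collect any unlabeled paragraphs under a single generic key.
            sections["TEXT"] = (sections.get("TEXT", "") + " " + block).strip()
    return sections
```

Unstructured abstracts come back as a single "TEXT" entry, so callers can treat both kinds uniformly.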

Tips for Better Results

  • Use MeSH terms for precise results -- Neoplasms[MeSH Terms] captures all cancer types regardless of terminology
  • Combine Boolean operators -- (diabetes OR "metabolic syndrome") AND treatment[Title]
  • Use field tags to narrow scope -- [Title], [Author], [Journal], [Affiliation], [MeSH Terms]
  • Quote exact phrases -- "randomized controlled trial" matches the exact phrase
  • Date ranges in query -- append AND 2020:2024[pdat] to limit publication dates
  • Start small -- test with 10-20 results first, then scale up once you confirm the query returns relevant articles
  • Multiple focused queries beat one broad query -- run 3 specific queries of 100 articles rather than 1 generic query of 300

PubMed Search Syntax Guide

Boolean Operators

diabetes AND insulin -- both terms required
cancer OR tumor -- either term
COVID-19 NOT influenza -- exclude term

Field Tags

[Title] -- title only
[Author] -- author name
[Journal] -- specific journal
[MeSH Terms] -- Medical Subject Headings
[Title/Abstract] -- title or abstract
[Affiliation] -- institution

MeSH Terms

MeSH (Medical Subject Headings) is NLM's controlled vocabulary. Using MeSH gives more precise results:

Neoplasms[MeSH Terms] -- all cancer types
Diabetes Mellitus[MeSH Terms]
COVID-19[MeSH Terms]

Combined Example

("breast cancer"[Title] OR "mammary tumor"[Title]) AND treatment[MeSH Terms] AND 2022:2024[pdat]
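When queries are generated from a list of terms rather than typed by hand, small helpers keep the quoting and bracketing consistent. The functions `tagged`, `any_of`, and `all_of` below are hypothetical conveniences, not part of the actor or of PubMed; they only assemble strings in the syntax described in this guide.

```python
from typing import Iterable


def tagged(term: str, field: str) -> str:
    """Append a field tag, quoting the term when it contains a space."""
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]"


def any_of(clauses: Iterable[str]) -> str:
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def all_of(clauses: Iterable[str]) -> str:
    """Join clauses with AND."""
    return " AND ".join(clauses)


query = all_of([
    any_of([tagged("breast cancer", "Title"), tagged("mammary tumor", "Title")]),
    tagged("treatment", "MeSH Terms"),
    "2022:2024[pdat]",
])
print(query)
```

The printed string is the combined example query from this section and can be passed as one entry of the queries input.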

Integrations

PubMed Scraper connects to 1,500+ apps through the Apify platform:

  • Google Sheets -- automatically export new articles to a shared spreadsheet for team literature tracking
  • Slack / Microsoft Teams -- get notified when new articles match your monitoring queries via scheduled runs
  • Airtable -- build a searchable research database with article metadata, tags, and review status
  • Zapier / Make -- trigger workflows when new publications appear (e.g., email digest to research team)
  • PostgreSQL / BigQuery -- pipe article data into your data warehouse for bibliometric analysis
  • n8n -- build automated literature monitoring pipelines with custom filtering and alerting

Set up integrations in the Apify Console under Settings > Integrations for any actor run.

API Usage

Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/pubmed-scraper').call({
    queries: ['CRISPR cancer therapy', 'CAR-T cell therapy'],
    maxResultsPerQuery: 100,
    sortBy: 'date',
    dateFrom: '2023/01/01',
});

// Read the scraped articles from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(`Scraped ${items.length} articles`);

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient(token="YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("automation-lab/pubmed-scraper").call(run_input={
    "queries": ["CRISPR cancer therapy"],
    "maxResultsPerQuery": 200,
    "sortBy": "date",
    "dateFrom": "2022/01/01",
    "dateTo": "2024/12/31",
})

# Read the scraped articles from the run's default dataset.
items = client.dataset(run["defaultDatasetId"]).list_items().items
print(f"Scraped {len(items)} articles")

cURL

# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/automation-lab~pubmed-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": ["CRISPR cancer therapy"],
    "maxResultsPerQuery": 50,
    "sortBy": "relevance"
  }'

# Fetch results (replace DATASET_ID with the defaultDatasetId from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
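Items fetched through the API contain nested fields (authors, keywords), which do not map directly onto spreadsheet columns. The sketch below flattens them into CSV text. The column selection, the "; " separator, and the function name `articles_to_csv` are choices made for this example only.

```python
import csv
import io
from typing import Iterable

SCALAR_COLUMNS = ["pmid", "title", "journal", "publicationDate", "doi"]


def articles_to_csv(items: Iterable[dict]) -> str:
    """Flatten article records into CSV text, joining list fields with '; '."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=SCALAR_COLUMNS + ["authors", "keywords"],
        lineterminator="\n",
    )
    writer.writeheader()
    for item in items:
        # Missing or null scalar values become empty cells.
        row = {column: item.get(column) or "" for column in SCALAR_COLUMNS}
        row["authors"] = "; ".join(
            author.get("name", "") for author in item.get("authors") or []
        )
        row["keywords"] = "; ".join(item.get("keywords") or [])
        writer.writerow(row)
    return buffer.getvalue()
```

Affiliations are dropped here to keep one row per article; add a column for them if you need institution-level analysis.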

MCP Integration (AI Assistants)

Use PubMed Scraper directly from AI assistants like Claude, ChatGPT, or Cursor through the Model Context Protocol (MCP).

Claude Desktop / Claude Code

Add to your MCP config (claude_desktop_config.json or .mcp.json):

{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": ["-y", "@apify/actors-mcp-server"],
            "env": {
                "APIFY_TOKEN": "your-apify-token"
            }
        }
    }
}

Example Prompts

  • "Search PubMed for recent CRISPR cancer therapy papers from 2024 and summarize the top 20"
  • "Find all clinical trials on mRNA vaccines published this year and create a comparison table"
  • "Get the latest 50 papers on Alzheimer's disease biomarkers and extract key findings"

Is It Legal to Scrape PubMed?

Yes. This actor uses the official NCBI E-utilities API, a free public API that the National Library of Medicine provides specifically for programmatic access to PubMed data. The API is designed for bulk data retrieval and does not require authentication. Article metadata (titles, abstracts, authors, DOIs) is public information. This actor respects NCBI's rate limit of 3 requests/second. For NCBI's usage policies, see the E-utilities documentation.

Frequently Asked Questions

Do I need an API key to use this actor?

No. The NCBI E-utilities API is free and public. No registration or API key is required. This actor handles all API communication automatically.

How many articles can I retrieve in one run?

PubMed's API supports up to 10,000 results per query. For broader topics, use multiple focused queries with date range filters. There is no limit on the number of queries per run.
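One way to stay under the 10,000-result cap on a broad topic is to split the search into one query per publication year. The helper below is a hypothetical sketch that relies only on the year:year[pdat] range syntax described in this guide; each yearly slice is still subject to the cap on its own.

```python
from typing import List


def yearly_slices(query: str, first_year: int, last_year: int) -> List[str]:
    """Return one query per year, each restricted with a [pdat] range."""
    if first_year > last_year:
        raise ValueError("first_year must not exceed last_year")
    return [
        f"({query}) AND {year}:{year}[pdat]"
        for year in range(first_year, last_year + 1)
    ]


for sliced in yearly_slices("diabetes OR insulin", 2022, 2024):
    print(sliced)
```

The parentheses around the original query keep its OR clauses from being split apart by the appended AND. The resulting list can be passed directly as the queries input.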

What if my search returns 0 results?

Check your query syntax. PubMed search terms and field tags are case-insensitive, but field tags must be enclosed in square brackets (e.g., [MeSH Terms], not (MeSH Terms)). Try broadening your search by removing field tags or date filters. The actor handles 0-result queries gracefully without errors.

Why are some articles missing abstracts?

Not all PubMed articles have abstracts. Letters, editorials, and some older articles may lack abstract text. The abstract field will be null for these articles.
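If your analysis depends on abstract text, it helps to separate records with and without one before processing. A minimal sketch, where `split_by_abstract` is a hypothetical helper and empty strings are treated the same as null:

```python
from typing import Iterable, List, Tuple


def split_by_abstract(items: Iterable[dict]) -> Tuple[List[dict], List[dict]]:
    """Return (articles with an abstract, articles without one)."""
    with_abstract: List[dict] = []
    without_abstract: List[dict] = []
    for item in items:
        # A missing key, null, and whitespace-only text all count as "no abstract".
        if (item.get("abstract") or "").strip():
            with_abstract.append(item)
        else:
            without_abstract.append(item)
    return with_abstract, without_abstract
```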

Does this actor retrieve full article text?

No -- PubMed abstracts are freely available, but full article text is behind publisher paywalls for most journals. Use the DOI to access the publisher site or check PubMed Central (PMC) for open-access articles.

How current is the data?

PubMed is updated daily. Newly accepted articles typically appear within a few weeks of online publication.

Related Scrapers

  • Semantic Scholar Scraper -- search and extract data from Semantic Scholar's academic paper database
  • OpenAlex Scraper -- extract scholarly article data from the OpenAlex open research database
  • CrossRef Scraper -- search CrossRef for DOI metadata, citations, and publication details