PubMed Scraper

Pricing

from $0.80 / 1,000 results

Search 35M+ medical citations from PubMed/MEDLINE. Extract articles, abstracts, authors, MeSH terms, and citations for research, competitive intelligence, or AI/RAG pipelines. No API key required.

Developer: Mick (maintained by community)

Search and extract medical literature from PubMed/MEDLINE -- articles, abstracts, authors, MeSH terms, and citations. No API key required.

Search the world's largest biomedical literature database with over 35 million citations. Extract structured article data for research, competitive intelligence, or AI/RAG applications.


What does it do?

PubMed Scraper queries the NCBI E-utilities API (ESearch + EFetch) and returns structured data about scientific articles from PubMed and MEDLINE. It extracts article metadata, abstracts, author information, MeSH terms, keywords, and publication details into clean, normalized JSON. The output format is consistent across records, so it is ready for analysis, literature reviews, or consumption by AI agents via MCP.
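As a rough illustration of the two-step ESearch + EFetch flow, the sketch below builds the two request URLs with the Python standard library. It is not the actor's source code: the function names and the chosen `retmode` values are assumptions made for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Step 1: ESearch returns the PMIDs that match a query."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids: list[str]) -> str:
    """Step 2: EFetch returns the full records (as XML) for those PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


print(esearch_url("CRISPR gene editing", retmax=100))
print(efetch_url(["33243215"]))
```

The PMIDs returned by the first URL would be passed to the second; the actor then normalizes the fetched records into the JSON described under Output.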

Use cases:

  • Literature reviews -- find and analyze research on specific topics, drugs, or conditions
  • Competitive intelligence -- track competitor publications, clinical trial results, and research directions
  • Drug development -- monitor publications related to drug targets, mechanisms, or therapeutic areas
  • Academic research -- build datasets for meta-analyses, systematic reviews, or bibliometric studies
  • Medical AI/RAG -- populate knowledge bases with biomedical literature for AI applications
  • AI agent tooling -- expose as an MCP tool so AI agents can query scientific literature in real time

Input

Choose a scraping mode and provide your search filters.

Mode 1: Search Articles

Search for articles by keyword, author, journal, or MeSH term.

{
  "mode": "search_articles",
  "query": "CRISPR gene editing",
  "publicationType": "Review",
  "maxResults": 100
}

Search with date filters:

{
  "mode": "search_articles",
  "query": "COVID-19 vaccine efficacy",
  "dateFrom": "2023/01/01",
  "dateTo": "2024/12/31",
  "maxResults": 200
}
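The date examples above use the YYYY/MM/DD form. A minimal sketch for checking a date range before starting a run follows; whether the actor also accepts other date formats is not stated here, so the strict format and the helper name `validate_date_range` are assumptions.

```python
from datetime import datetime

DATE_FORMAT = "%Y/%m/%d"  # matches the "2023/01/01" style used in the examples


def validate_date_range(date_from: str, date_to: str) -> tuple[str, str]:
    """Return the pair unchanged if both dates parse and are in order."""
    start = datetime.strptime(date_from, DATE_FORMAT)  # raises ValueError if malformed
    end = datetime.strptime(date_to, DATE_FORMAT)
    if start > end:
        raise ValueError(f"dateFrom {date_from} is after dateTo {date_to}")
    return date_from, date_to


run_input = {
    "mode": "search_articles",
    "query": "COVID-19 vaccine efficacy",
    "maxResults": 200,
}
run_input["dateFrom"], run_input["dateTo"] = validate_date_range("2023/01/01", "2024/12/31")
```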

Mode 2: Get Article

Look up a specific article by PMID.

{
  "mode": "get_article",
  "pmid": "33243215"
}

Mode 3: Search by Author

Find all articles by a specific author.

{
  "mode": "search_by_author",
  "author": "Zhang F",
  "query": "CRISPR",
  "maxResults": 50
}

Mode 4: Search by Journal

Find articles from a specific journal.

{
  "mode": "search_by_journal",
  "journal": "Nature",
  "query": "immunotherapy",
  "maxResults": 100
}
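The four modes above can be wrapped in a small helper that catches missing fields before a run is started. This is a sketch only: the required fields per mode are inferred from the examples in this section, not taken from the actor's published input schema, and `build_run_input` is a hypothetical name.

```python
# Required fields per mode, inferred from the examples above (an assumption,
# not the actor's published input schema).
REQUIRED_FIELDS = {
    "search_articles": [],
    "get_article": ["pmid"],
    "search_by_author": ["author"],
    "search_by_journal": ["journal"],
}


def build_run_input(mode: str, **fields) -> dict:
    """Build an input payload for one mode, failing early on missing fields."""
    if mode not in REQUIRED_FIELDS:
        raise ValueError(f"unknown mode: {mode!r}")
    missing = [name for name in REQUIRED_FIELDS[mode] if not fields.get(name)]
    if missing:
        raise ValueError(f"mode {mode!r} requires: {', '.join(missing)}")
    # Drop unset (None) fields so the payload only carries what was provided.
    return {"mode": mode, **{k: v for k, v in fields.items() if v is not None}}


print(build_run_input("search_by_author", author="Zhang F", query="CRISPR", maxResults=50))
```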

Output

Each article record includes:

| Field | Description |
| --- | --- |
| pmid | PubMed ID |
| doi | Digital Object Identifier |
| pmc_id | PubMed Central ID (if available) |
| title | Article title |
| abstract | Full abstract text |
| authors | List of authors with affiliations |
| journal | Journal name |
| journal_abbreviation | ISO journal abbreviation |
| volume / issue / pages | Publication details |
| publication_date | Publication date |
| pub_year | Publication year |
| keywords | Author-provided keywords |
| mesh_terms | MeSH subject headings |
| publication_types | Article types (Review, Clinical Trial, etc.) |
| language | Article language |
| grant_list | Funding sources |
| reference_count | Number of references |
| pubmed_url | Link to PubMed page |
| doi_url | Link to full text via DOI |
| pmc_url | Link to PMC (free full text) |
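Because `authors` and `mesh_terms` are nested values, records need a little flattening before they fit a spreadsheet. The sketch below writes a subset of the fields above to CSV; the column selection, the "; " separator, and the function name `to_csv` are illustrative choices, not part of the actor.

```python
import csv
import io

# A subset of the fields listed above; nested values are flattened to strings.
CSV_COLUMNS = ["pmid", "title", "journal", "pub_year", "authors", "mesh_terms", "doi_url"]


def to_csv(records: list[dict]) -> str:
    """Serialize article records to CSV text with one row per article."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSV_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {col: record.get(col, "") for col in CSV_COLUMNS}
        # authors is a list of objects; keep "LastName Initials" per author.
        row["authors"] = "; ".join(
            f"{a.get('last_name', '')} {a.get('initials', '')}".strip()
            for a in record.get("authors") or []
        )
        row["mesh_terms"] = "; ".join(record.get("mesh_terms") or [])
        writer.writerow(row)
    return buffer.getvalue()
```

The returned string can be written to a file or passed to any tool that reads CSV; missing fields become empty cells.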

Example output

{
  "schema_version": "1.0",
  "type": "article",
  "pmid": "33243215",
  "doi": "10.1038/s41586-020-2649-2",
  "pmc_id": "PMC7901054",
  "title": "CRISPR-Cas9 structures and mechanisms",
  "abstract": "Genome editing with CRISPR-Cas9 has transformed biological research...",
  "authors": [
    {
      "last_name": "Doudna",
      "fore_name": "Jennifer A",
      "initials": "JA",
      "affiliation": "University of California, Berkeley"
    }
  ],
  "journal": "Annual review of biochemistry",
  "publication_date": "2020 Jun",
  "pub_year": "2020",
  "mesh_terms": ["CRISPR-Cas Systems", "Gene Editing", "RNA, Guide, CRISPR-Cas Systems"],
  "publication_types": ["Journal Article", "Review"],
  "pubmed_url": "https://pubmed.ncbi.nlm.nih.gov/33243215/",
  "doi_url": "https://doi.org/10.1038/s41586-020-2649-2"
}
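For the AI/RAG use case, a record shaped like the example above can be turned into a text chunk plus metadata before indexing. This is a minimal sketch: the chunk layout, the `pmid:` id prefix, and the function name `to_rag_document` are assumptions, and real pipelines will likely want their own chunking and metadata.

```python
def to_rag_document(record: dict) -> dict:
    """Turn one article record into a text chunk plus metadata."""
    authors = ", ".join(
        f"{a.get('fore_name', '')} {a.get('last_name', '')}".strip()
        for a in record.get("authors") or []
    )
    header = (
        f"{record.get('title', '')} "
        f"({record.get('journal', '')}, {record.get('pub_year', '')})"
    )
    # Skip empty parts so records without authors or abstract stay compact.
    parts = [header, authors, record.get("abstract") or ""]
    text = "\n".join(part for part in parts if part)
    return {
        "id": f"pmid:{record['pmid']}",
        "text": text,
        "metadata": {
            "mesh_terms": record.get("mesh_terms") or [],
            "source": record.get("pubmed_url", ""),
        },
    }
```

Keeping the PubMed URL in the metadata lets a downstream answer cite the original record.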

Filters

| Filter | Description | Example |
| --- | --- | --- |
| query | General search term | "cancer immunotherapy" |
| author | Author name | "Smith J" |
| journal | Journal name | "Cell" |
| meshTerm | MeSH subject heading | "Neoplasms" |
| publicationType | Article type | "Clinical Trial" |
| dateFrom / dateTo | Date range | "2023/01/01" |
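To give a feel for what these filters correspond to on the PubMed side, the sketch below combines them into a single PubMed search term using PubMed's search field tags ([Author], [Journal], [MeSH Terms], [Publication Type]). How the actor actually combines filters is not documented here; joining them with AND, and the function name `build_pubmed_term`, are assumptions. Date filters are left out of this sketch.

```python
def build_pubmed_term(query=None, author=None, journal=None,
                      mesh_term=None, publication_type=None) -> str:
    """Combine filters into one PubMed search term (assumed AND semantics)."""
    clauses = []
    if query:
        clauses.append(f"({query})")
    if author:
        clauses.append(f"{author}[Author]")
    if journal:
        clauses.append(f'"{journal}"[Journal]')
    if mesh_term:
        clauses.append(f'"{mesh_term}"[MeSH Terms]')
    if publication_type:
        clauses.append(f'"{publication_type}"[Publication Type]')
    if not clauses:
        raise ValueError("at least one filter is required")
    return " AND ".join(clauses)


print(build_pubmed_term(mesh_term="Neoplasms", publication_type="Clinical Trial"))
```

A term built this way can be pasted into the PubMed website's search box to preview roughly which articles a given filter combination targets.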

Publication types

  • Journal Article
  • Review
  • Clinical Trial
  • Meta-Analysis
  • Randomized Controlled Trial
  • Case Reports
  • Systematic Review
  • Letter
  • Editorial

Sort options

  • relevance - Best Match (default)
  • pub_date - Publication Date (newest first)
  • Author - Author name (A-Z)
  • JournalName - Journal name (A-Z)

Integrations

Apify API

curl "https://api.apify.com/v2/acts/labrat011~pubmed-scraper/runs" \
  -X POST \
  -H "Authorization: Bearer <APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "search_articles",
    "query": "machine learning drug discovery",
    "maxResults": 50
  }'

Python

from apify_client import ApifyClient

client = ApifyClient("<APIFY_TOKEN>")

# Start the actor and wait for the run to finish.
run = client.actor("labrat011/pubmed-scraper").call(run_input={
    "mode": "search_articles",
    "meshTerm": "Alzheimer Disease",
    "publicationType": "Clinical Trial",
    "maxResults": 100,
})

# Iterate over the articles stored in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["title"])
    authors = ", ".join(a["last_name"] for a in item.get("authors", []))
    print(f"  Authors: {authors}")
    print(f"  Journal: {item['journal']} ({item['pub_year']})")

JavaScript

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: '<APIFY_TOKEN>' });

const run = await client.actor('labrat011/pubmed-scraper').call({
    mode: 'search_by_journal',
    journal: 'Nature Medicine',
    query: 'CAR-T',
    maxResults: 25,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.log(`[${item.pmid}] ${item.title}`);
});

MCP Integration

This actor works as an MCP tool through Apify's hosted MCP server. No custom server needed.

  • Endpoint: https://mcp.apify.com?tools=labrat011/pubmed-scraper
  • Auth: Authorization: Bearer <APIFY_TOKEN>
  • Transport: Streamable HTTP
  • Works with: Claude Desktop, Cursor, VS Code, Windsurf, Warp, Gemini CLI

Example MCP config (Claude Desktop / Cursor):

{
  "mcpServers": {
    "pubmed-scraper": {
      "url": "https://mcp.apify.com?tools=labrat011/pubmed-scraper",
      "headers": {
        "Authorization": "Bearer <APIFY_TOKEN>"
      }
    }
  }
}

AI agents can use this actor to search medical literature, find research papers, look up clinical trial publications, and retrieve scientific evidence -- all as a callable MCP tool.


Limits and pricing

  • Free tier: 25 results per run
  • Paid tier: Up to 1,000 results per run
  • Rate limiting: Built-in rate limiting (max 3 req/sec per NCBI guidelines)
  • No API key required: Uses public NCBI E-utilities
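As a back-of-the-envelope aid, the sketch below estimates how many paid-tier runs a target result count needs and a lower bound on cost. It assumes the "from $0.80 / 1,000 results" price applies linearly and that the caller splits large jobs into runs of at most 1,000 results (for example by date range); actual billing may differ, and `estimate` is a hypothetical helper.

```python
import math

PRICE_PER_1000_USD = 0.80    # "from" price quoted on this page; treat as a lower bound
PAID_RESULTS_PER_RUN = 1000  # paid-tier cap listed above


def estimate(total_results: int) -> tuple[int, float]:
    """Return (number of runs needed, estimated minimum cost in USD)."""
    runs = math.ceil(total_results / PAID_RESULTS_PER_RUN)
    cost = total_results / 1000 * PRICE_PER_1000_USD
    return runs, round(cost, 2)


runs, cost = estimate(2500)
print(f"{runs} runs, at least ${cost:.2f}")
```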

FAQ

What data source does this use?

This actor queries the NCBI E-utilities API, which provides access to PubMed and MEDLINE. PubMed contains over 35 million citations for biomedical literature.

What's the difference between PubMed and MEDLINE?

MEDLINE is the core subset of PubMed containing citations indexed with MeSH terms. PubMed also includes citations from other sources like PMC, NCBI Bookshelf, and publisher-submitted records.

Can I get full text?

This actor returns abstracts and metadata. For full text, check the pmc_url field -- if present, the full text is available free on PubMed Central. Otherwise, use the doi_url to access the publisher's site.
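The lookup order described above can be written as a small helper. Falling back to `pubmed_url` when neither of the other two links is present is an addition made for this sketch, as is the name `best_full_text_link`.

```python
from typing import Optional


def best_full_text_link(record: dict) -> Optional[str]:
    """Prefer free full text on PMC, then the publisher via DOI, then PubMed."""
    for field in ("pmc_url", "doi_url", "pubmed_url"):
        url = record.get(field)
        if url:
            return url
    return None
```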

What are MeSH terms?

Medical Subject Headings (MeSH) are controlled vocabulary terms assigned by NLM indexers. They help find articles on specific topics even when authors use different terminology.

How current is the data?

PubMed is updated daily with new citations. This actor retrieves data directly from NCBI, so results reflect the current state of the database.

Is there a rate limit?

NCBI allows 3 requests per second without an API key. This actor includes built-in rate limiting to comply with NCBI guidelines. For higher throughput, consider registering for an NCBI API key.
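For readers calling E-utilities directly, a simple way to stay under 3 requests per second is to space calls by a minimum interval. The sketch below shows that idea; it is not the actor's own limiter, and the class name and injectable `clock` and `sleep` parameters are illustrative.

```python
import time


class RateLimiter:
    """Space calls so that at most `rate` of them happen per second."""

    def __init__(self, rate: float = 3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = 0.0  # earliest time the next call may proceed

    def wait(self) -> None:
        """Block until the next call is allowed, then reserve the next slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval
```

Calling `limiter.wait()` immediately before each HTTP request keeps consecutive requests at least one third of a second apart with the default rate.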


License

This actor is provided under the MIT License. PubMed data is from the National Library of Medicine (NLM) and is in the public domain.