PubMed Scraper

Pricing

from $0.80 / 1,000 results

Search 35M+ medical citations from PubMed/MEDLINE. Extract articles, abstracts, authors, MeSH terms, and citations for research, competitive intelligence, or AI/RAG pipelines. No API key required.

Rating: 0.0 (0)

Developer: mick_ (maintained by Community)

Actor stats: 0 bookmarked, 5 total users, 4 monthly active users, last modified 9 days ago

Search 35 million biomedical articles from PubMed/MEDLINE. Batch queries, MeSH filtering, full abstracts, author affiliations, and structured metadata — ready for AI pipelines, systematic reviews, and drug research.

Extract structured article data from PubMed via the NCBI E-utilities API. No API key required. Clean JSON output with consistent fields — ready for downstream analysis, RAG pipelines, or LLM tool use.


🚀 What's New (v1.1.0)

Batch Query Search

Run multiple search queries in a single job. Pass queriesList: ["cancer immunotherapy", "CAR-T therapy", "checkpoint inhibitors"] to get merged, deduplicated results across all queries. One run can replace ten or more separate searches, which suits systematic reviews and broad topic mapping.

Emoji Input Labels

All input fields now have emoji labels for faster navigation in the Apify console.


🎯 Use Cases

🤖 AI Agents & LLM Pipelines

PubMed is the gold standard for biomedical knowledge. Wire this actor into your AI pipeline as a callable tool:

  • RAG knowledge bases: Index 10,000+ abstracts for semantic search over medical literature
  • MCP tool use: AI agents call this actor in real time to look up evidence, find citations, or answer clinical questions
  • Literature-grounded generation: Pull structured abstracts as context before generating summaries, reports, or clinical content
  • Evidence retrieval: Agent receives a medical question → queries PubMed → returns cited sources

One run can return thousands of structured, citable records ready for embedding. The consistent schema (pmid, abstract, mesh_terms, doi_url) means no cleaning required.

Example — AI agent topic scan:

{
  "mode": "search_articles",
  "queriesList": ["GLP-1 receptor agonist", "semaglutide weight loss", "tirzepatide clinical trial"],
  "publicationType": "Clinical Trial",
  "dateFrom": "2022",
  "maxResults": 500
}
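
A minimal sketch of the embedding step mentioned above, assuming records shaped like the output schema (pmid, title, abstract, mesh_terms, doi_url, pubmed_url). The function name to_documents and the {id, text, metadata} layout are illustrative choices, not part of the actor; adapt them to whatever your vector store expects.

```python
from typing import Iterable


def to_documents(records: Iterable[dict]) -> list[dict]:
    """Turn article records into {id, text, metadata} dicts for an embedding step.

    The output layout is a hypothetical convention, not something the actor emits.
    """
    docs = []
    for rec in records:
        abstract = (rec.get("abstract") or "").strip()
        if not abstract:
            continue  # skip records with no abstract: nothing useful to embed
        title = (rec.get("title") or "").strip()
        docs.append({
            "id": rec["pmid"],
            "text": f"{title}\n\n{abstract}" if title else abstract,
            "metadata": {
                "mesh_terms": rec.get("mesh_terms", []),
                # prefer the DOI link as the citable source, fall back to PubMed
                "source": rec.get("doi_url") or rec.get("pubmed_url"),
            },
        })
    return docs
```

Keeping the PMID as the document id makes it easy to cite the source article when a retrieved chunk is used in a generated answer.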

💊 Pharma & Biotech Research

Track the scientific landscape around drug targets, mechanisms, and therapeutic areas — without manual PubMed sessions:

  • Competitive intelligence: Monitor competitor publications by author, institution, or research area
  • Target validation: Pull all publications on a gene or protein target to assess research depth
  • Clinical trial landscape: Filter by Clinical Trial or RCT publication type to see what's been run
  • Drug repurposing: Cross-reference MeSH terms across therapeutic areas to find unexplored connections
  • Pipeline monitoring: Track publications on specific drugs or mechanisms as they emerge

Example — drug target landscape:

{
  "mode": "search_articles",
  "meshTerm": "PCSK9",
  "publicationType": "Review",
  "sort": "pub_date",
  "maxResults": 200
}

📑 Systematic Reviews & Meta-Analyses

Systematic reviews require comprehensive literature searches across multiple query formulations. This is where batch search pays off:

  • Multi-query search: Cover all synonyms and related terms in one job — no manual re-running
  • Deduplication: Results automatically deduplicated by PMID — no overlap in your dataset
  • Date-bounded searches: Filter to a specific review window with dateFrom/dateTo
  • Study design filtering: Filter to RCTs, meta-analyses, or systematic reviews only
  • Export ready: Download as CSV or Excel directly from the Apify dataset UI

Example — systematic review search:

{
  "mode": "search_articles",
  "queriesList": [
    "cognitive behavioral therapy depression",
    "CBT major depressive disorder",
    "psychotherapy depressive symptoms randomized"
  ],
  "publicationType": "Randomized Controlled Trial",
  "dateFrom": "2015",
  "dateTo": "2024",
  "maxResults": 1000
}
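
The actor deduplicates within a single batch run. If you also combine datasets from several separate runs (for example, an updated search a month later), the same idea can be applied downstream. The sketch below is an illustration of deduplication by PMID, not the actor's internal code; merge_by_pmid is a hypothetical helper name.

```python
def merge_by_pmid(result_sets):
    """Merge several lists of article records, keeping the first record seen per PMID."""
    merged = {}
    for results in result_sets:
        for rec in results:
            # setdefault keeps the earliest record and ignores later duplicates
            merged.setdefault(rec["pmid"], rec)
    # dicts preserve insertion order, so the output follows first-seen order
    return list(merged.values())
```

Keeping the first occurrence is one reasonable policy; if later runs may carry updated metadata, replacing the stored record instead would be the alternative.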

📖 Usage Examples

Example 1: Batch Query Search (Systematic Review)

{
  "mode": "search_articles",
  "queriesList": ["cancer immunotherapy", "CAR-T cell therapy", "PD-1 inhibitor"],
  "publicationType": "Clinical Trial",
  "maxResults": 500
}

Example 2: Keyword Search Filtered to Reviews

{
  "mode": "search_articles",
  "query": "CRISPR gene editing",
  "publicationType": "Review",
  "maxResults": 100
}

Example 3: MeSH Term Search

{
  "mode": "search_articles",
  "meshTerm": "Alzheimer Disease",
  "publicationType": "Meta-Analysis",
  "dateFrom": "2020",
  "maxResults": 200
}

Example 4: Author Search

{
  "mode": "search_by_author",
  "author": "Zhang F",
  "query": "CRISPR",
  "maxResults": 50
}

Example 5: Journal Search

{
  "mode": "search_by_journal",
  "journal": "Nature Medicine",
  "query": "immunotherapy",
  "sort": "pub_date",
  "maxResults": 100
}

Example 6: Single Article Lookup

{
  "mode": "get_article",
  "pmid": "33243215"
}

Example 7: Date-Bounded Search Sorted by Publication Date

{
  "mode": "search_articles",
  "query": "COVID-19 long COVID",
  "dateFrom": "2023/01/01",
  "dateTo": "2024/12/31",
  "sort": "pub_date",
  "maxResults": 300
}
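
Any of the inputs above can also be submitted from code. The sketch below assumes the official apify-client Python package and assumes the actor ID is labrat011/pubmed-scraper, as it appears in the MCP endpoint further down; verify both against the Store page before relying on them. fetch_articles is a hypothetical helper that takes an already-constructed client.

```python
# Assumption: actor ID copied from the MCP endpoint shown in this README.
ACTOR_ID = "labrat011/pubmed-scraper"


def fetch_articles(client, run_input: dict) -> list[dict]:
    """Run the actor with the given input and return all dataset items as a list.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    # Each finished run stores its records in a default dataset
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and a valid token):
# from apify_client import ApifyClient
# client = ApifyClient("<APIFY_TOKEN>")
# articles = fetch_articles(client, {"mode": "get_article", "pmid": "33243215"})
```

Passing the client in as an argument keeps the helper easy to test with a stub and leaves token handling to the caller.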

👥 Who Uses This

🔬 Systematic Reviewers & Meta-Analysts

Running a systematic review means covering all synonyms and related terms — one query isn't enough. Batch search handles this in a single run.

  • Cover all query variants (queriesList) and get a deduplicated PMID set across all of them
  • Bound by date range to match your review window
  • Filter to the exact study design you need (RCT, Meta-Analysis, Systematic Review)
  • Download as CSV/Excel directly from the Apify dataset UI — no cleaning needed

{
  "mode": "search_articles",
  "queriesList": [
    "cognitive behavioral therapy depression",
    "CBT major depressive disorder",
    "psychotherapy depressive symptoms"
  ],
  "publicationType": "Randomized Controlled Trial",
  "dateFrom": "2015",
  "dateTo": "2024",
  "maxResults": 1000
}

💊 Pharma & Biotech Teams

Track the publication landscape around a drug target, mechanism, or therapeutic area without manual PubMed sessions.

  • Map all publications on a gene or protein target to assess research depth before investing in it
  • Monitor competitor publications by author, institution, or research area over time
  • Filter to Review or Clinical Trial types to see what stage the evidence is at
  • Track emerging publications with sort: pub_date + date range

{
  "mode": "search_articles",
  "meshTerm": "PCSK9",
  "publicationType": "Clinical Trial",
  "sort": "pub_date",
  "dateFrom": "2022",
  "maxResults": 500
}

🧪 Clinical Researchers

Find evidence for a specific intervention, track a researcher's publication record, or monitor a journal's output.

  • Pull all papers by a specific author to build their publication profile
  • Search a journal combined with a keyword to scope relevant output
  • Use MeSH terms for precise clinical concept matching (avoids terminology variation)
  • Combine author + query to track a researcher's work on a specific topic

{
  "mode": "search_by_author",
  "author": "Bhatt DL",
  "query": "cardiovascular outcomes trial",
  "maxResults": 100
}

🤖 AI/LLM Engineers

PubMed is the gold-standard source for biomedical knowledge. Wire this into your pipeline as a callable data source.

  • Index thousands of abstracts into a vector store for semantic retrieval over medical literature
  • Use via MCP so AI agents can look up live evidence during a conversation
  • Pull literature as structured context before generating summaries, clinical notes, or reports
  • The consistent output schema (pmid, abstract, mesh_terms, doi_url) means no field normalization needed downstream

{
  "mode": "search_articles",
  "queriesList": ["GLP-1 receptor agonist", "semaglutide weight loss", "tirzepatide efficacy"],
  "publicationType": "Clinical Trial",
  "dateFrom": "2020",
  "maxResults": 500
}

📝 Medical Writers & Journalists

Find citations fast, scope coverage on a topic, or pull an author's full bibliography without spending hours on PubMed.

  • Single article lookup by PMID to get structured metadata for citation formatting
  • Author search to build a bibliography or verify publication history
  • Topic search with date range to understand how coverage of a story has evolved
  • pmc_url field points directly to free full text when available

{
  "mode": "get_article",
  "pmid": "36352213"
}

🔗 Related Actors

Other actors from the same portfolio that pair well with PubMed Scraper:

| Actor | What It Does |
| --- | --- |
| Clinical Trial Site Contact Finder | Extracts principal investigator contacts from ClinicalTrials.gov; useful alongside PubMed author searches |
| NPI Provider Contact Finder | Finds doctor emails, practice websites, and LinkedIn profiles from the NPPES NPI Registry |
| FDA Drug Labels Scraper | Pulls structured drug label data from FDA DailyMed; pair with PubMed for full drug research coverage |
| Academic Paper Scraper | Research papers from ArXiv, Semantic Scholar, and CrossRef; broader coverage beyond biomedical |

🔍 Input Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| mode | string | search_articles | One of search_articles, get_article, search_by_author, search_by_journal |
| query | string | "" | Single keyword search. Ignored when queriesList is provided |
| queriesList | array | [] | NEW: batch search across multiple queries; results are merged and deduplicated |
| pmid | string | "" | PubMed ID for get_article mode |
| author | string | "" | Author name (e.g. "Smith J") |
| journal | string | "" | Journal name (e.g. "Nature", "Cell") |
| meshTerm | string | "" | MeSH subject heading (e.g. "Neoplasms") |
| publicationType | string | "" | Filter by study design: Clinical Trial, Review, Meta-Analysis, Randomized Controlled Trial, Systematic Review |
| dateFrom | string | "" | Publication start date: YYYY, YYYY/MM, or YYYY/MM/DD |
| dateTo | string | "" | Publication end date: YYYY, YYYY/MM, or YYYY/MM/DD |
| sort | string | relevance | One of relevance, pub_date, Author, JournalName |
| maxResults | integer | 100 | Max articles to return (1 to 10,000). Free tier: 10 per run |
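
It can help to check an input locally before starting a run. The sketch below encodes only the constraints stated in the table above (allowed modes, the three date formats, the 1 to 10,000 range for maxResults, and a PMID for get_article). validate_input is a hypothetical helper; the actor's own validation may be stricter or differ in detail.

```python
import re

# Accepts YYYY, YYYY/MM, or YYYY/MM/DD as listed in the parameter table
DATE_RE = re.compile(r"\d{4}(/\d{2}){0,2}")
MODES = {"search_articles", "get_article", "search_by_author", "search_by_journal"}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    errors = []
    mode = run_input.get("mode", "search_articles")
    if mode not in MODES:
        errors.append(f"unknown mode: {mode!r}")
    if mode == "get_article" and not run_input.get("pmid"):
        errors.append("get_article mode needs a pmid")
    for key in ("dateFrom", "dateTo"):
        value = run_input.get(key, "")
        if value and not DATE_RE.fullmatch(value):
            errors.append(f"{key} must be YYYY, YYYY/MM, or YYYY/MM/DD")
    max_results = run_input.get("maxResults", 100)
    if not (isinstance(max_results, int) and 1 <= max_results <= 10_000):
        errors.append("maxResults must be an integer from 1 to 10,000")
    return errors
```

The check only looks at the shape of the date string, so a value such as 2023/13/40 would still pass; calendar validation is left out to keep the sketch short.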

📊 Output Format

Each article is pushed as a structured JSON record. Download as JSON, CSV, or Excel from the Apify dataset UI.

{
  "schema_version": "1.0",
  "type": "article",
  "pmid": "33243215",
  "doi": "10.1038/s41586-020-2649-2",
  "pmc_id": "PMC7901054",
  "title": "CRISPR-Cas9 structures and mechanisms",
  "abstract": "Genome editing with CRISPR-Cas9 has transformed biological research...",
  "authors": [
    {
      "last_name": "Doudna",
      "fore_name": "Jennifer A",
      "initials": "JA",
      "affiliation": "University of California, Berkeley"
    }
  ],
  "journal": "Annual review of biochemistry",
  "publication_date": "2020 Jun",
  "pub_year": "2020",
  "mesh_terms": ["CRISPR-Cas Systems", "Gene Editing", "RNA, Guide, CRISPR-Cas Systems"],
  "publication_types": ["Journal Article", "Review"],
  "keywords": ["CRISPR", "Cas9", "genome editing"],
  "grant_list": ["R01 GM081879 (NIH)"],
  "reference_count": 142,
  "pubmed_url": "https://pubmed.ncbi.nlm.nih.gov/33243215/",
  "doi_url": "https://doi.org/10.1038/s41586-020-2649-2",
  "pmc_url": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7901054/"
}

Output Fields

| Field | Description |
| --- | --- |
| pmid | PubMed ID |
| doi / pmc_id | DOI and PubMed Central ID |
| title | Article title |
| abstract | Full abstract text |
| authors | List with last name, first name, initials, affiliation |
| journal / journal_abbreviation | Full and abbreviated journal name |
| volume / issue / pages | Publication details |
| publication_date / pub_year | Date and year |
| keywords | Author-provided keywords |
| mesh_terms | MeSH subject headings |
| publication_types | Article types (Review, Clinical Trial, etc.) |
| grant_list | Funding sources |
| reference_count | Number of references |
| pubmed_url / doi_url / pmc_url | Direct links to PubMed, DOI, and free full text |
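
As one example of working with these fields, the sketch below builds a short citation string from a record. The layout (up to three authors, then title, journal, year, PMID) is an illustrative choice rather than a formal citation style, and format_citation is a hypothetical helper.

```python
def format_citation(rec: dict) -> str:
    """Build a compact citation line from an article record."""
    authors = rec.get("authors", [])
    names = [
        f"{a.get('last_name', '')} {a.get('initials', '')}".strip()
        for a in authors[:3]
    ]
    if len(authors) > 3:
        names.append("et al")  # abbreviate long author lists
    parts = [
        ", ".join(n for n in names if n),
        (rec.get("title") or "").rstrip("."),
        rec.get("journal"),
        rec.get("pub_year"),
    ]
    # drop any missing pieces, then join with sentence-style separators
    citation = ". ".join(p for p in parts if p) + "."
    if rec.get("pmid"):
        citation += f" PMID: {rec['pmid']}."
    return citation
```

Applied to the sample record above, this yields "Doudna JA. CRISPR-Cas9 structures and mechanisms. Annual review of biochemistry. 2020. PMID: 33243215."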

🤖 MCP Integration

Wire this actor as a tool in your AI agent pipeline via the Apify MCP server. No custom server needed.

  • Endpoint: https://mcp.apify.com?tools=labrat011/pubmed-scraper
  • Auth: Authorization: Bearer <APIFY_TOKEN>
  • Works with: Claude Desktop, Cursor, VS Code, Windsurf, Warp, Gemini CLI

MCP config:

{
  "mcpServers": {
    "pubmed-scraper": {
      "url": "https://mcp.apify.com?tools=labrat011/pubmed-scraper",
      "headers": {
        "Authorization": "Bearer <APIFY_TOKEN>"
      }
    }
  }
}

Ask your AI: "Find all clinical trials on semaglutide published in the last 2 years and summarize the findings"


💰 Pricing

Pay-per-result pricing — you only pay for what you scrape.

| Tier | Limit | Use case |
| --- | --- | --- |
| Free | 10 results/run | Testing and integration verification |
| Paid | Up to 10,000 results/run | Systematic reviews, RAG pipelines, bulk research |
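
For rough budgeting, the listed starting price of $0.80 per 1,000 results can be turned into a per-run estimate. The sketch below assumes that starting rate applies; the listing says "from $0.80", so the actual rate on your plan may be higher, and estimate_cost is a hypothetical helper.

```python
def estimate_cost(results: int, usd_per_1000: float = 0.80) -> float:
    """Estimate the cost in USD for a run returning `results` records.

    The default rate is the listed starting price and may not match your plan.
    """
    if results < 0:
        raise ValueError("results must be non-negative")
    return round(results / 1000 * usd_per_1000, 2)
```

At the starting rate, a 500-result run comes to about $0.40 and a full 10,000-result run to about $8.00.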

❓ FAQ

Can I get full text?

This actor returns abstracts and metadata. Check the pmc_url field — if present, the full text is freely available on PubMed Central. Otherwise use doi_url to reach the publisher.
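
The fallback order described above can be written as a small helper. This is a sketch assuming the three URL fields from the output schema; best_link and the label strings are illustrative names, not part of the actor.

```python
def best_link(rec: dict) -> tuple:
    """Pick the most useful link for a record: PMC full text, then publisher, then PubMed."""
    candidates = (
        ("pmc_url", "free full text"),
        ("doi_url", "publisher page"),
        ("pubmed_url", "abstract only"),
    )
    for field, label in candidates:
        url = rec.get(field)
        if url:  # treat missing, None, and empty strings as absent
            return url, label
    return None, "no link"
```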

What are MeSH terms?

Medical Subject Headings (MeSH) are controlled vocabulary terms assigned by NLM indexers. They help find articles on specific topics even when authors use different terminology. Use the MeSH Browser to find the right term.

How current is the data?

PubMed is updated daily. This actor queries NCBI directly, so results reflect the current state of the database.

Is there a rate limit?

NCBI allows 3 requests/second without an API key. Rate limiting is built in — you don't need to configure anything.
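
You do not need to write this yourself when using the actor. For readers who call NCBI directly elsewhere, the sketch below shows one simple way to stay under 3 requests per second by spacing calls evenly. It is an illustration of the idea, not the actor's implementation; RateLimiter and its parameters are hypothetical names.

```python
import time


class RateLimiter:
    """Space calls so that at most `max_per_second` happen per second."""

    def __init__(self, max_per_second: float = 3.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = 0.0

    def wait(self) -> None:
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self.min_interval


# Usage: call limiter.wait() immediately before each HTTP request.
# limiter = RateLimiter(3.0)
# limiter.wait()
```

Even spacing is the most conservative approach; a token bucket would allow short bursts but is more code.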

What's the difference between query and queriesList?

query runs a single search. queriesList runs multiple searches in one job and merges the results, deduplicating by PMID. Use queriesList for systematic reviews where you need to cover multiple synonyms or related terms.
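
The precedence between the two fields can be summarized in a few lines. This sketch follows the rule stated in the parameter table (query is ignored when queriesList is provided); effective_queries is a hypothetical helper, and skipping blank entries is an assumption about reasonable behavior rather than documented actor behavior.

```python
def effective_queries(run_input: dict) -> list[str]:
    """Return the list of searches a given input would run."""
    batch = [q.strip() for q in run_input.get("queriesList", []) if q and q.strip()]
    if batch:
        return batch  # queriesList takes precedence over query
    single = (run_input.get("query") or "").strip()
    return [single] if single else []
```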


Built for researchers, AI engineers, and data scientists who need biomedical literature at scale.