PubMed Scraper — Papers, DOI & MeSH to JSON

Under maintenance

Search PubMed by query and export structured paper rows — title, authors, abstract, journal, DOI, PMID, MeSH terms, publication date — to JSON or CSV. A clean PubMed API wrapper that handles NCBI pagination, rate limits, and retries for research and ML pipelines.

Pricing: Pay per event
Rating: 0.0 (0)
Developer: DevilScrapes
Last modified: 2 months ago

🎯 What this scrapes

NCBI's E-utilities is the canonical gateway to PubMed's 36+ million records, and it punishes naive callers with hard rate limits, chained esearch → efetch calls, XML quirks, and intermittent 500s. This PubMed scraper turns a free-form search query into a typed dataset (PMID, PMCID, title, authors, abstract, journal, DOI, MeSH terms, publication types, author-supplied keywords, publication date, PubMed URL) and absorbs the upstream friction: paged fetches, backoff on 429s, transient-error retries, XML-to-JSON coercion. Provide your NCBI API key to lift throughput from 3 req/s to 10 req/s; the rows come out identical either way.
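The chained esearch → efetch flow can be sketched as two URL builders. This is a minimal illustration that uses documented E-utilities parameters (`db`, `term`, `retmax`, `retstart`, `retmode`, `id`, `api_key`); the function names are hypothetical, and this is not necessarily how the Actor builds its requests.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=20, retstart=0, api_key=None):
    """Step 1: build an esearch URL that returns matching PMIDs as JSON."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,
        "retstart": retstart,
        "retmode": "json",
    }
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids, api_key=None):
    """Step 2: build an efetch URL that returns full records for those PMIDs as XML."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

The PMIDs returned by the first call feed the `id` parameter of the second, which is why every search needs at least two round trips.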

This is a research and metadata tool only — abstracts, titles, identifiers, and controlled vocabulary. It never touches patient records, never claims to be clinical decision support, and deliberately does not fetch full text (full-text licensing lives on the publisher's side, not PubMed's). We scrape what PubMed openly indexes.

🔥 Features

🛡️ Browser fingerprint rotation: curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
🌐 Optional residential proxy rotation via Apify Proxy: off by default, since NCBI does not IP-block under standard use; when enabled, each block triggers a fresh session and exit IP.
🔁 Retries with exponential backoff on 408 / 429 / 5xx: up to 5 attempts per page, Retry-After honoured.
🧱 Rate-limit-aware pacing: when NCBI pushes back, we slow down, surface a status message, and keep going. You never get a silent empty dataset.
🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
💰 Pay-Per-Event pricing: a small actor-start fee per run plus a per-result charge for each row that reaches your dataset. No subscription.
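The retry behaviour described in the list above (exponential backoff on 408 / 429 / 5xx, up to 5 attempts, Retry-After honoured) might look roughly like the sketch below. The `fetch` callable, the 1-second base delay, and the 30-second cap are assumptions made for illustration; the Actor's actual values are not documented here.

```python
import time


def is_retryable(status):
    """408, 429 and any 5xx response are treated as transient."""
    return status in (408, 429) or 500 <= status < 600


def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch
    return min(base * 2 ** (attempt - 1), cap)


def fetch_with_retries(fetch, max_attempts=5, sleep=time.sleep):
    """Call `fetch()` until it returns a non-retryable status or attempts run out.

    `fetch` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = fetch()
        if not is_retryable(status):
            return status, body
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.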

💡 Use cases

Clinical RAG pipelines — pull fresh PubMed metadata on a schedule and embed abstracts into a vector store for a medical-literature chatbot or pharmacovigilance alert.
Literature reviews and meta-analyses — retrieve every paper matching a topic + date range in one run; export to CSV for your review management tool.
Pharma competitive intel — track new mentions of a drug, compound, or trial ID across PubMed as they appear.
Author publication monitoring — daily [Author] diff to feed a personal or departmental RSS-style alert.
MeSH-based corpus assembly — extract every paper tagged with specific MeSH headings to build a training corpus or annotation benchmark.
Bulk PubMed dataset download — run a broad query (e.g. "CRISPR"[MeSH] AND 2020:2025[PDat]) and export thousands of records in a single job.

⚙️ How to use it

Click Try for free at the top of the page.
Fill in the input form — searchQuery is the only required field; everything else has sensible defaults.
Click Start. Output streams into the run's dataset in real time.
Export from Storage → Dataset as JSON, CSV, or Excel — or pull via the Apify API.

Need a repeating feed? Wire a Schedule to the Actor. Each run picks up new results; combine with a named dataset to build an append-only archive.
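If scheduled runs return overlapping results, an append-only archive needs deduplication on your side. A minimal sketch, assuming rows are dicts carrying the `pmid` field from the output schema; the helper name and the in-memory `seen_pmids` set are hypothetical.

```python
def new_rows(rows, seen_pmids):
    """Return rows whose pmid has not been seen yet, and mark them as seen."""
    fresh = []
    for row in rows:
        pmid = row["pmid"]
        if pmid not in seen_pmids:
            seen_pmids.add(pmid)
            fresh.append(row)
    return fresh
```

In practice the set of seen PMIDs would be persisted between runs, for example in a key-value store or a database table.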

📥 Input

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `searchQuery` | `string` | yes | `diabetes mellitus type 2 review` | PubMed-style query. Field tags like `[Author]`, `[Title]`, `[MeSH]`, `[PDat]` are fully supported. |
| `maxResults` | `integer` | no | `30` | Total PubMed records to fetch. No hard cap: set a large number for bulk PubMed downloads. |
| `sortBy` | `string` | no | `most_recent` | Field used to order results. Accepts any value supported by the E-utilities `sort` parameter. |
| `apiKey` | `string` | no | (none) | NCBI API key. Lifts rate limit from 3 req/s to 10 req/s. |
| `proxyConfiguration` | `object` | no | `{"useApifyProxy": false}` | NCBI does not IP-block under standard use, so proxy is optional. Residential proxies are available if your environment requires them. |

Example input

{
  "searchQuery": "crispr review 2024",
  "maxResults": 50,
  "sortBy": "most_recent",
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}

📤 Output

Every row is one PubMed record. All fields are Pydantic-validated before they hit your dataset.

| Field | Type | Notes |
| --- | --- | --- |
| `pmid` | `string` | PubMed ID, the canonical stable identifier. |
| `pmcid` | `string \| null` | PubMed Central ID when the record is in PMC. |
| `doi` | `string \| null` | Digital Object Identifier. |
| `title` | `string` | Full paper title. |
| `abstract` | `string \| null` | Abstract text, including structured headings where present. |
| `authors` | `array` | Author names (Last F format), preserving original order. |
| `journal` | `string \| null` | Journal full name. |
| `journal_iso` | `string \| null` | Journal ISO abbreviation (e.g. `Nat Rev Drug Discov`). |
| `publication_types` | `array` | Publication-type labels e.g. `Review`, `Journal Article`, `Clinical Trial`. |
| `mesh_terms` | `array` | MeSH headings assigned by NCBI indexers. |
| `keywords` | `array` | Author-supplied keywords. |
| `pub_date` | `string \| null` | Best-available publication date: ISO-8601 (`2024-03-01`) or year-only when that is all PubMed records. |
| `pubmed_url` | `string` | Canonical PubMed URL for this record. |
| `scraped_at` | `string` | ISO-8601 timestamp of when this row was recorded. |

Example output

{
  "pmid": "39000123",
  "pmcid": "PMC11234567",
  "doi": "10.1038/s41573-024-00901-3",
  "title": "Advances in CRISPR-Cas12a therapeutics — a 2024 review",
  "abstract": "CRISPR-based gene editing has matured rapidly ...",
  "authors": ["Smith J", "Patel R", "Chen W"],
  "journal": "Nature Reviews Drug Discovery",
  "journal_iso": "Nat Rev Drug Discov",
  "publication_types": ["Review", "Journal Article"],
  "mesh_terms": ["CRISPR-Cas Systems", "Gene Editing", "Therapeutics"],
  "keywords": ["CRISPR", "gene therapy", "Cas12a"],
  "pub_date": "2024-03-01",
  "pubmed_url": "https://pubmed.ncbi.nlm.nih.gov/39000123/",
  "scraped_at": "2026-06-01T09:12:00Z"
}
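Because `pub_date` can be a full ISO-8601 date, a bare year, or null, downstream code usually needs a small normaliser. The sketch below assumes those are the only three shapes, which is what the field notes describe; the helper name is hypothetical.

```python
from datetime import date


def parse_pub_date(value):
    """Return (year, full_date_or_None) for a pub_date value, or None when missing."""
    if value is None:
        return None
    if len(value) == 4 and value.isdigit():
        return int(value), None  # year-only record
    parsed = date.fromisoformat(value)  # raises ValueError on any other shape
    return parsed.year, parsed
```

Returning the year separately makes it easy to bucket records by year even when no full date exists.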

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
| --- | --- | --- |
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.002 | Per dataset item added |

Example: a run that returns 1,000 results costs 1,000 × $0.002 + $0.005 = $2.005, or roughly $2.01. No subscription, no monthly minimum, no card to start; every new Apify account gets $5 of free credit.
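The cost of a run follows directly from the two event prices in the table: one `actor-start` charge plus one `result` charge per row. A small estimator, using `Decimal` to avoid floating-point drift (the function name is illustrative):

```python
from decimal import Decimal

ACTOR_START = Decimal("0.005")  # charged once per run
PER_RESULT = Decimal("0.002")   # charged per dataset item


def run_cost(results):
    """Estimated USD cost of a single run that yields `results` rows."""
    return ACTOR_START + PER_RESULT * results
```

For example, `run_cost(1000)` evaluates to `Decimal("2.005")`, and an empty run still costs the start fee.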

🚧 Limitations

Metadata only: this Actor hits E-utilities (esearch + efetch). Full text lives on publisher sites and is out of scope. The pmcid field gives you a pointer to PubMed Central when the paper is openly available there.
Citation graphs (which papers cite which) are not in scope. Use the iCite API for that.
Older records: some fields (especially abstract, doi, mesh_terms) may be absent for pre-1970 records. The Actor surfaces null rather than fabricating data.
NCBI rate limits: the Actor honours NCBI's stated quota (3 req/s without an API key, 10 req/s with one). We will not race past these limits; NCBI can block clients that do. Provide an apiKey for high-volume jobs.
Patient data and PHI: PubMed indexes abstracts and metadata only. There is no patient data here, and this tool must not be used as clinical decision support.
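Staying inside the 3 req/s and 10 req/s quotas comes down to spacing requests by a minimum interval. The class below is one simple way to do that; it is an illustrative sketch with a hypothetical name, not the Actor's actual pacing code.

```python
import time


class Pacer:
    """Spaces calls at least 1/rate seconds apart (3 req/s, or 10 req/s with an API key)."""

    def __init__(self, has_api_key, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / (10 if has_api_key else 3)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed, then reserve the following slot."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `pacer.wait()` before each HTTP request keeps the average rate at or below the quota; the clock and sleep functions are injectable so the logic can be tested without real delays.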

❓ FAQ

Do I need an NCBI API key to run this PubMed scraper?

No — without one you get ~3 req/s, which is enough for queries returning up to a few thousand records in a reasonable time. With an API key you lift to 10 req/s. Get yours free at the NCBI account portal.

Is this a PubMed API wrapper I can call programmatically?

Yes — once the Actor runs, the output dataset is accessible via the Apify REST API (JSON / NDJSON / CSV / XLSX). You can also trigger runs via API and poll for completion. See the Apify API docs for details.
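As a rough illustration, the dataset items endpoint of the Apify REST API can be addressed like this. The helper name and the dataset ID are placeholders; authentication (for example an `Authorization: Bearer <token>` header) is left to the HTTP client you use.

```python
from urllib.parse import urlencode

ALLOWED_FORMATS = {"json", "jsonl", "csv", "xlsx"}


def dataset_items_url(dataset_id, fmt="json", limit=None):
    """Build the URL for downloading a run's dataset items in the given format."""
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt}")
    params = {"format": fmt}
    if limit is not None:
        params["limit"] = limit
    return f"https://api.apify.com/v2/datasets/{dataset_id}/items?{urlencode(params)}"
```

The dataset ID comes from the run object returned when a run is started or polled.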

Can I do a bulk PubMed download of thousands of records?

Yes. Set maxResults to however many records you need. The Actor pages through E-utilities results and streams rows into your dataset as it goes. For very large jobs, provide an apiKey to get the 10 req/s quota.
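Paging through results amounts to stepping the E-utilities `retstart` offset in fixed-size batches until `maxResults` is covered. A minimal sketch; the batch size of 200 is an assumption made for illustration, not the Actor's documented value.

```python
def page_plan(max_results, page_size=200):
    """Yield (retstart, retmax) pairs that together cover the first max_results hits."""
    for start in range(0, max_results, page_size):
        yield start, min(page_size, max_results - start)
```

The final page is trimmed so the total never exceeds `max_results`.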

Can I filter by date range?

Use the [PDat] qualifier in your searchQuery, e.g. "COVID-19"[MeSH] AND 2020:2024[PDat]. NCBI documents the Entrez query syntax in the PubMed user guide.
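When queries are assembled in code, a tiny helper keeps the date clause consistent. This sketch simply wraps the base query in parentheses and appends a year range with the `[PDat]` tag; the function name is hypothetical.

```python
def with_date_range(query, start_year, end_year):
    """Append a publication-date range to a PubMed query string."""
    if start_year > end_year:
        raise ValueError("start_year must not be after end_year")
    return f"({query}) AND {start_year}:{end_year}[PDat]"
```

The result can be passed directly as `searchQuery`.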

Why are some abstracts empty?

PubMed does not always store abstract text — older records, letters, and some conference papers are abstract-less. The Actor returns null for missing fields rather than inserting placeholder text.

What about full text and the clinical literature search API?

Full text lives on the publisher's site. When a paper is openly available in PubMed Central, the pmcid field gives you the identifier to fetch it directly from PMC. For an integrated clinical literature search API experience, pair this Actor with your own embedding pipeline — the output schema is designed to drop straight into LangChain's Document format.
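To illustrate the mapping, the sketch below turns one output row into a plain dict with the two fields a LangChain `Document` carries, `page_content` and `metadata`. It deliberately avoids importing LangChain; the choice of which fields go into the metadata is an assumption.

```python
METADATA_FIELDS = ("pmid", "doi", "journal", "pub_date", "pubmed_url")


def to_document(row):
    """Map one dataset row to a dict shaped like a LangChain Document."""
    abstract = row.get("abstract") or ""  # abstract may be null
    text = f"{row['title']}\n\n{abstract}".strip()
    metadata = {key: row.get(key) for key in METADATA_FIELDS}
    return {"page_content": text, "metadata": metadata}
```

With LangChain installed, the dict can be unpacked into its `Document` class as keyword arguments.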

Does this handle retracted papers?

PubMed keeps retracted papers in the index and tags them with the "Retracted Publication" publication type; the retraction notice itself is a separate record tagged "Retraction of Publication". The Actor surfaces the publication_types array so you can filter both out downstream.
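A downstream filter might look like the sketch below. It drops both the retracted article (publication type "Retracted Publication") and the retraction notice ("Retraction of Publication"); the helper name is hypothetical, and rows are assumed to be dicts from the output dataset.

```python
RETRACTION_TYPES = {"Retracted Publication", "Retraction of Publication"}


def drop_retractions(rows):
    """Keep only rows whose publication_types contain no retraction label."""
    return [
        row for row in rows
        if not RETRACTION_TYPES & set(row.get("publication_types") or [])
    ]
```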

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.

PubMed Scraper — Abstracts, Authors & MeSH Terms

logiover/pubmed-scraper

Scrape PubMed by keyword query or direct PMIDs. Extract title, abstract, authors, journal, DOI, MeSH terms, keywords, and publication date via NCBI E-utilities. No API key required.

Logiover

PubMed Search Scraper

automation-lab/pubmed-search-scraper

Search PubMed via the official NCBI API and extract article metadata, abstracts, DOI, authors, journals, MeSH terms, and keywords.

Stas Persiianenko

PubMed Search Scraper

fetch_cat/pubmed-search-scraper

Search PubMed and export public article metadata, abstracts, authors, journals, DOI, MeSH terms, and keywords.

Hanna Nosova

PubMed Search Scraper

crawlerbros/pubmed-search-scraper

Search PubMed (NCBI E-utilities) for biomedical articles by keyword, date range, and article type. Returns title, authors, journal, abstract, DOI, MeSH terms, keywords, and citation. Free public API, no proxy, no cookies. Optional NCBI API key for higher rate limits.

Crawler Bros

PubMed Biomedical Literature Scraper

meticulous_sweetwilliam/pubmed-biomedical-literature

Query PubMed via NCBI API for biomedical papers. Extract title, authors, abstract, MeSH terms, DOI, PMID. For pharma R&D, biotech, medical AI pipelines, and systematic reviews.

Leo

Pubmed Research Scraper

fortuitous_pirate/pubmed-research-scraper

Search and extract biomedical research papers from PubMed (NCBI). Filter by keyword, journal, author, or date range. Returns paper title, authors, journal, publication date, DOI, and citation count. Free NCBI API — no authentication required.

Fortuitous Pirate

PubMed Scraper: Biomedical Articles & MeSH

themineworks/pubmed-ncbi-scraper

Scrape 36M+ PubMed/NCBI biomedical articles: title, abstract, authors, journal, PMID, DOI, MeSH terms. No API key needed. Build literature reviews & AI training corpora. Works in Claude, ChatGPT & any MCP agent.

The Mine Works

PubMed Daily Citations Tracker — by Query

v0iddo/pubmed-daily-citations-tracker

Pull daily-new PubMed citations matching any query. One clean row per article — PMID, title, abstract, authors, journal, year, DOI, MeSH terms, full Pubmed URL. Built for cron: pass `sinceDays:1` to get just yesterday's new articles. Source: NCBI E-utilities (free, no auth).

vøiddo

🧬 PubMed Scraper - Biomedical Literature & Citations

benthepythondev/pubmed-scraper

PubMed Scraper for the official NCBI PubMed API. Search 37M+ biomedical citations; extract title, authors, journal, publication date, DOI, PMID, article type and links. Supports PubMed field tags and sorting. For systematic reviews, medical research and bibliometrics. Keyless and fast.

Ben

PubMed Scraper

lulzasaur/pubmed-scraper

Search and scrape PubMed biomedical literature via NCBI E-utilities. Get titles, authors, abstracts, journals, MeSH terms, DOIs. Search by keyword or fetch by PMID.