PubMed Papers Scraper

Pricing

Pay per event

Search PubMed by query and get structured paper metadata — title, authors, abstract, journal, DOI, PubMed ID, publication date, MeSH terms. We handle NCBI's pagination, rate limits, and transient errors. Typed rows ready for a research dashboard.

Rating: 0.0 (0)

Developer: DevilScrapes (maintained by Community)

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user, last modified 4 days ago

🎯 What this scrapes

NCBI's E-utilities is the canonical entry point to PubMed — and it punishes naive scrapers with hard rate limits and intermittent 500s. This Actor turns a free-form search into a typed dataset (title, authors, abstract, journal, DOI, PMID, publication date, MeSH terms, full citation) and absorbs the upstream's friction for you: paged fetches, backoff on 429s, transient-error retries. Attach your NCBI API key to lift throughput from 3 req/s to 10 req/s; either way the dataset comes out the same.

🔥 What we handle for you

  • 🛡️ Browser fingerprint rotation: curl-cffi impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
  • 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block. Off by default (see proxyConfiguration), since NCBI does not IP-block under normal use.
  • 🔁 Retries with exponential backoff on 408 / 429 / 5xx — up to 5 attempts per page, Retry-After honoured.
  • 🧱 Rate-limit-aware pacing — when the target pushes back, we slow down instead of getting banned.
  • 🧊 Clean, typed dataset rows — Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
  • 💰 Pay-Per-Event pricing — you only pay for results that hit your dataset. No data, no charge.
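The retry bullet above can be sketched roughly as follows. This is a minimal illustration and not the Actor's actual code: the retryable statuses and the 5-attempt limit come from the list, while `base`, `cap`, the jitter formula, and both function names are assumptions made for the example.

```python
import random
from typing import Optional

RETRYABLE_STATUSES = {408, 429}  # 5xx is checked as a range below
MAX_ATTEMPTS = 5                 # "up to 5 attempts per page"


def is_retryable(status: int) -> bool:
    """True for the status codes the list above says are retried."""
    return status in RETRYABLE_STATUSES or 500 <= status <= 599


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 1.0, cap: float = 30.0) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    A numeric Retry-After header wins; otherwise the delay doubles per
    attempt up to `cap`, plus random jitter (base/cap are assumed values).
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form of Retry-After is not handled in this sketch
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay + random.uniform(0, delay / 2)
```

A caller would loop up to `MAX_ATTEMPTS` times, sleeping for `backoff_delay(attempt, response_header)` whenever `is_retryable(status)` is true.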

💡 Use cases

  • Literature reviews — pull every paper matching a topic + date range to seed a meta-analysis.
  • Pharma intel — track new mentions of a drug name across PubMed.
  • Author publication monitoring: run a daily diff on an [Author] query to feed personal RSS-style alerts.
  • MeSH-based corpus assembly — extract every paper tagged with a specific MeSH heading.

⚙️ How to use it

  1. Click Try for free at the top of the page.
  2. Fill in the input form — most fields have sensible defaults.
  3. Click Start. Output streams into the run's dataset.
  4. Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
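For step 4, fetching via the API usually means requesting the run's dataset items from the Apify API in the format you want. The sketch below only builds the request URL; the helper name and the dataset ID `abc123` are hypothetical, and you would substitute the dataset ID shown for your own run.

```python
from typing import Optional
from urllib.parse import urlencode

APIFY_API = "https://api.apify.com/v2"


def dataset_items_url(dataset_id: str, fmt: str = "json",
                      token: Optional[str] = None) -> str:
    """Build the URL for downloading a dataset's items.

    `fmt` can be e.g. "json", "csv" or "xlsx"; `token` is your Apify API
    token and is only needed when the dataset is not publicly readable.
    """
    params = {"format": fmt}
    if token:
        params["token"] = token
    return f"{APIFY_API}/datasets/{dataset_id}/items?{urlencode(params)}"


# Hypothetical dataset ID, for illustration only.
print(dataset_items_url("abc123", "csv"))
```

Any HTTP client can then GET that URL to download the rows.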

📥 Input

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| searchQuery | string | yes | 'diabetes mellitus type 2 review' | PubMed-style query. Field tags like [Author], [Title], [MeSH] are supported. Exam |
| apiKey | string | no | '—' | Get one from https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/. Lifts rate limit from 3 |
| maxResults | integer | no | 30 | Total PubMed records to fetch. |
| sortBy | string | no | 'most_recent' | Field used to order results. |
| proxyConfiguration | object | no | {'useApifyProxy': False} | NCBI does not IP-block under normal use. Proxy is optional. |

Example input

{
  "searchQuery": "crispr review 2024",
  "maxResults": 3,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
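If you assemble the input programmatically, a small helper can apply the documented defaults for you. This is a sketch under assumptions: the function name and the validation rules (non-empty query, `maxResults` of at least 1) are illustrative and not taken from the Actor's input schema; only the field names and defaults come from the Input section.

```python
import json
from typing import Any, Dict, Optional


def build_input(search_query: str, max_results: int = 30,
                sort_by: str = "most_recent",
                api_key: Optional[str] = None,
                use_apify_proxy: bool = False) -> Dict[str, Any]:
    """Return a run-input dict using the defaults listed in the Input section."""
    if not search_query.strip():
        raise ValueError("searchQuery is required")
    if max_results < 1:  # assumed lower bound, not documented
        raise ValueError("maxResults must be at least 1")
    run_input: Dict[str, Any] = {
        "searchQuery": search_query,
        "maxResults": max_results,
        "sortBy": sort_by,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
    if api_key:  # leave the key out entirely when you have none
        run_input["apiKey"] = api_key
    return run_input


print(json.dumps(build_input("crispr review 2024", max_results=3), indent=2))
```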

📤 Output

Every row is one dataset item.

| Field | Type | Notes |
| --- | --- | --- |
| pmid | string | PubMed ID (the unique identifier). |
| pmcid | ['string', 'null'] | PubMed Central ID when available. |
| doi | ['string', 'null'] | Digital Object Identifier. |
| title | string | Paper title. |
| abstract | ['string', 'null'] | Abstract text (may include structured headings). |
| authors | array | Author names (Last F format) preserving order. |
| journal | ['string', 'null'] | Journal full name. |
| journal_iso | ['string', 'null'] | Journal ISO abbreviation. |
| publication_types | array | Publication-type labels (e.g. 'Review', 'Journal Article'). |
| mesh_terms | array | MeSH headings assigned by indexers. |
| keywords | array | Author-supplied keywords. |
| pub_date | ['string', 'null'] | Best-available publication date (ISO-8601 or year only). |
| pubmed_url | string | PubMed canonical URL. |
| scraped_at | string | When this row was recorded. |

Example output

{
  "pmid": "39000123",
  "doi": "10.1234/example.doi",
  "title": "Advances in CRISPR-Cas12a therapeutics \u2014 a review",
  "journal_iso": "Nat Rev Drug Discov",
  "pub_date": "2026-03-01"
}
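On the consuming side, the rows map naturally onto a typed record. The sketch below uses a standard-library dataclass; the class name `PaperRow` is hypothetical, and it treats every field except `pmid` and `title` as optional so that partial rows like the example above still load.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class PaperRow:
    """One dataset item, mirroring the Output section's field list."""
    pmid: str
    title: str
    pmcid: Optional[str] = None
    doi: Optional[str] = None
    abstract: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    journal: Optional[str] = None
    journal_iso: Optional[str] = None
    publication_types: List[str] = field(default_factory=list)
    mesh_terms: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    pub_date: Optional[str] = None
    pubmed_url: Optional[str] = None
    scraped_at: Optional[str] = None

    @classmethod
    def from_item(cls, item: Dict[str, Any]) -> "PaperRow":
        """Build a row from a raw dataset item, ignoring unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})
```

Missing nullable fields come back as `None` and missing arrays as empty lists, which matches the Actor's stated policy of surfacing null rather than inventing values.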

💰 Pricing

Pay-Per-Event — you pay only when these events fire:

| Event | USD | What it is |
| --- | --- | --- |
| actor-start | $0.005 | One-off warm-up charge per run |
| result | $0.002 | Per dataset item |

Example: 1 000 results in a single run cost 1 000 × $0.002 + $0.005 (one actor-start) = $2.005, or about $2.00. No subscription, no minimum, no card to start; Apify gives every new account $5 of free credit.
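To budget a larger job, the same arithmetic can be scripted. This sketch assumes the two event prices listed above and one actor-start charge per run; the function name is made up for the example.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.002")       # charged per dataset item


def estimate_cost(results: int, runs: int = 1) -> Decimal:
    """Estimated USD cost for `results` dataset items spread over `runs` runs."""
    return ACTOR_START_USD * runs + RESULT_USD * results


print(estimate_cost(1000))  # one run, 1 000 results
```

Decimal is used instead of float so that fractions of a cent add up exactly.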

🚧 Limitations

We hit only E-utilities (esearch + efetch). Citation graphs (which papers cite which) are not in scope. Some fields may be missing, especially on older records; the Actor surfaces null rather than fabricating data.
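For orientation, the esearch + efetch pair works roughly like this: esearch returns PubMed IDs for a query one page at a time, and efetch returns the full records for a batch of IDs. The sketch below only builds the request URLs. The page size of 100 and the function names are assumptions for illustration, not the Actor's real settings.

```python
from typing import List, Optional
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_urls(query: str, max_results: int, page_size: int = 100,
                 api_key: Optional[str] = None) -> List[str]:
    """One esearch URL per page, using retstart/retmax for paging."""
    urls = []
    for start in range(0, max_results, page_size):
        params = {
            "db": "pubmed",
            "term": query,
            "retmode": "json",
            "retstart": start,
            "retmax": min(page_size, max_results - start),
        }
        if api_key:
            params["api_key"] = api_key
        urls.append(f"{EUTILS}/esearch.fcgi?{urlencode(params)}")
    return urls


def efetch_url(pmids: List[str], api_key: Optional[str] = None) -> str:
    """efetch URL returning XML records for a batch of PubMed IDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"
```

With `max_results=250` this yields three esearch pages (offsets 0, 100 and 200), the last one asking for only 50 IDs.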

❓ FAQ

Is an NCBI API key required?

No. Without one you get ~3 req/s, which is plenty for normal use; with one the limit rises to 10 req/s.
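To see what those limits mean in practice, the sketch below converts them into a minimum gap between requests and a rough lower bound on run time. It assumes strictly evenly spaced requests, which is a simplification; the function names are illustrative.

```python
def min_interval(has_api_key: bool) -> float:
    """Minimum seconds between requests: 10 req/s with a key, 3 req/s without."""
    return 1.0 / (10 if has_api_key else 3)


def min_duration(requests: int, has_api_key: bool) -> float:
    """Lower bound on the seconds needed to send `requests` evenly paced calls."""
    if requests <= 1:
        return 0.0
    return (requests - 1) * min_interval(has_api_key)


# 31 requests: about 10 s without a key, about 3 s with one.
print(round(min_duration(31, False), 1), round(min_duration(31, True), 1))
```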

Why are some abstracts empty?

PubMed doesn't always have abstract text — older records and some letters are abstract-less.

Can I filter by date?

Use the [PDat] qualifier in your query (e.g. 2024[PDat]). NCBI's syntax is well documented.
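A date filter is just part of the query string, so it can be appended before the query goes into searchQuery. The helper below is a hypothetical convenience; the single-year form follows the example above, and the `start:end` range form is the colon syntax described in NCBI's PubMed search help.

```python
from typing import Optional


def with_pub_date(query: str, start_year: int,
                  end_year: Optional[int] = None) -> str:
    """Restrict `query` to one publication year, or to a start:end range."""
    if end_year is None:
        date_part = f"{start_year}[PDat]"
    else:
        if end_year < start_year:
            raise ValueError("end_year must not be before start_year")
        date_part = f"{start_year}:{end_year}[PDat]"
    # Parentheses keep the original query grouped ahead of the AND.
    return f"({query}) AND {date_part}"


print(with_pub_date("crispr review", 2024))
print(with_pub_date("crispr review", 2020, 2024))
```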

What about full text?

Out of scope — full text lives on the publisher's site. We give you PMCID when the paper is in PubMed Central.

💬 Your feedback

Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab in Apify Console. We ship fixes weekly and we read every report.