PubMed Papers Scraper
Pricing
Pay per event
Search PubMed by query and get structured paper metadata: title, authors, abstract, journal, DOI, PubMed ID, publication date, and MeSH terms. We handle NCBI's pagination, rate limits, and transient errors, and return typed rows ready for a research dashboard.
Developer: DevilScrapes (maintained by Community)
🎯 What this scrapes
NCBI's E-utilities is the canonical entry point to PubMed — and it punishes naive scrapers with hard rate limits and intermittent 500s. This Actor turns a free-form search into a typed dataset (title, authors, abstract, journal, DOI, PMID, publication date, MeSH terms, full citation) and absorbs the upstream's friction for you: paged fetches, backoff on 429s, transient-error retries. Attach your NCBI API key to lift throughput from 3 req/s to 10 req/s; either way the dataset comes out the same.
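As a rough illustration of what "paged fetches" means against E-utilities, here is a minimal sketch that builds the esearch and efetch request URLs for a query. It is not the Actor's actual code: the function names and the `page_size` default are assumptions made for the example, and only the standard `db`, `term`, `retstart`, `retmax`, `retmode`, `id`, and `api_key` parameters are used.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_page_urls(query, max_results, page_size=100, api_key=None):
    """Return one esearch URL per page needed to cover `max_results` PMIDs."""
    urls = []
    for start in range(0, max_results, page_size):
        params = {
            "db": "pubmed",
            "term": query,
            "retstart": start,  # offset of the first ID on this page
            "retmax": min(page_size, max_results - start),
            "retmode": "json",
        }
        if api_key:
            params["api_key"] = api_key
        urls.append(f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}")
    return urls


def efetch_url(pmids, api_key=None):
    """Return the efetch URL that pulls full XML records for a batch of PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    if api_key:
        params["api_key"] = api_key
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

esearch returns only PubMed IDs, so each page of IDs is then handed to efetch to retrieve the metadata that ends up in the dataset rows.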
🔥 What we handle for you
- 🛡️ Browser fingerprint rotation: `curl-cffi` impersonates real Chrome / Firefox / Safari TLS handshakes so the target sees a browser, not Python.
- 🌐 Residential proxy rotation via Apify Proxy: fresh session and exit IP on every block.
- 🔁 Retries with exponential backoff on `408 / 429 / 5xx`: up to 5 attempts per page, `Retry-After` honoured.
- 🧱 Rate-limit-aware pacing: when the target pushes back, we slow down instead of getting banned.
- 🧊 Clean, typed dataset rows: Pydantic-validated, ISO-8601 timestamps, stable IDs, JSON / CSV / Excel export straight from the Apify Console.
- 💰 Pay-Per-Event pricing: you only pay for results that hit your dataset. No data, no charge.
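The retry behaviour in the list above (exponential backoff on 408 / 429 / 5xx, up to 5 attempts, `Retry-After` honoured) can be sketched roughly as follows. This is an illustration under stated assumptions, not the Actor's implementation: `send` is a hypothetical callable returning `(status, headers, body)`, and the `base`, `cap`, and `jitter` values are invented for the example.

```python
import random
import time


def is_retryable(status):
    """True for the status codes named above: 408, 429, and any 5xx."""
    return status in (408, 429) or 500 <= status <= 599


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, jitter=0.0):
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    if retry_after is not None:
        try:
            # Only the delta-seconds form is handled here; an HTTP-date value
            # falls through to the exponential schedule below.
            return max(0.0, min(float(retry_after), cap))
        except ValueError:
            pass
    delay = min(base * 2 ** (attempt - 1), cap)
    return delay + random.uniform(0.0, jitter)


def fetch_with_retries(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until the status is not retryable or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if not is_retryable(status):
            break
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, body
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a non-zero `jitter` spreads retries out when several pages fail at once.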
💡 Use cases
- Literature reviews: pull every paper matching a topic + date range to seed a meta-analysis.
- Pharma intel: track new mentions of a drug name across PubMed.
- Author publication monitoring: daily diff for `[Author]` to feed personal RSS-style alerts.
- MeSH-based corpus assembly: extract every paper tagged with a specific MeSH heading.
⚙️ How to use it
- Click Try for free at the top of the page.
- Fill in the input form — most fields have sensible defaults.
- Click Start. Output streams into the run's dataset.
- Export from Storage → Dataset as JSON, CSV, or Excel — or fetch via the API.
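For the "fetch via the API" route, a minimal sketch of building the download URL for a run's dataset is shown below. It assumes Apify's public dataset items endpoint; the helper name is invented for the example and the dataset ID in the usage note is a placeholder.

```python
from urllib.parse import urlencode

APIFY_API_BASE = "https://api.apify.com/v2"


def dataset_items_url(dataset_id, fmt="json", limit=None):
    """Build the URL that returns a dataset's items in the given format."""
    params = {"format": fmt}  # e.g. "json", "csv", "xlsx"
    if limit is not None:
        params["limit"] = limit
    return f"{APIFY_API_BASE}/datasets/{dataset_id}/items?{urlencode(params)}"
```

For example, `dataset_items_url("YOUR_DATASET_ID", "csv")` gives a URL you can request with your Apify API token to download the run's rows as CSV.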
📥 Input
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `searchQuery` | string | yes | `'diabetes mellitus type 2 review'` | PubMed-style query. Field tags like [Author], [Title], [MeSH] are supported. Exam |
| `apiKey` | string | no | none | Get one from https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/. Lifts rate limit from 3 |
| `maxResults` | integer | no | `30` | Total PubMed records to fetch. |
| `sortBy` | string | no | `'most_recent'` | Field used to order results. |
| `proxyConfiguration` | object | no | `{'useApifyProxy': False}` | NCBI does not IP-block under normal use. Proxy is optional. |
Example input
{"searchQuery": "crispr review 2024", "maxResults": 3, "proxyConfiguration": {"useApifyProxy": false}}
📤 Output
Every row is one dataset item.
| Field | Type | Notes |
|---|---|---|
| `pmid` | string | PubMed ID (the unique identifier). |
| `pmcid` | string or null | PubMed Central ID when available. |
| `doi` | string or null | Digital Object Identifier. |
| `title` | string | Paper title. |
| `abstract` | string or null | Abstract text (may include structured headings). |
| `authors` | array | Author names (Last F format) preserving order. |
| `journal` | string or null | Journal full name. |
| `journal_iso` | string or null | Journal ISO abbreviation. |
| `publication_types` | array | Publication-type labels (e.g. 'Review', 'Journal Article'). |
| `mesh_terms` | array | MeSH headings assigned by indexers. |
| `keywords` | array | Author-supplied keywords. |
| `pub_date` | string or null | Best-available publication date (ISO-8601 or year only). |
| `pubmed_url` | string | PubMed canonical URL. |
| `scraped_at` | string | When this row was recorded. |
Example output
{"pmid": "39000123", "doi": "10.1234/example.doi", "title": "Advances in CRISPR-Cas12a therapeutics \u2014 a review", "journal_iso": "Nat Rev Drug Discov", "pub_date": "2026-03-01"}
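On the consuming side, a dataset item can be loaded into a small typed record. The sketch below uses a standard-library dataclass rather than the Pydantic models the Actor itself mentions; the class name `PaperRow`, the subset of fields, and the `pub_year` helper are assumptions made for illustration.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class PaperRow:
    pmid: str
    title: str
    doi: Optional[str] = None
    abstract: Optional[str] = None
    journal_iso: Optional[str] = None
    pub_date: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    mesh_terms: List[str] = field(default_factory=list)

    @classmethod
    def from_item(cls, item):
        """Build a row from a dataset item, ignoring fields not modelled here."""
        known = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in item.items() if k in known})

    @property
    def pub_year(self):
        """Year part of pub_date, which may be a full date or a year only."""
        return int(self.pub_date[:4]) if self.pub_date else None


# The example output above, parsed as one dataset item.
item = json.loads(
    r'{"pmid": "39000123", "doi": "10.1234/example.doi", '
    r'"title": "Advances in CRISPR-Cas12a therapeutics \u2014 a review", '
    r'"journal_iso": "Nat Rev Drug Discov", "pub_date": "2026-03-01"}'
)
row = PaperRow.from_item(item)
```

Nullable fields default to `None` and array fields to empty lists, so rows with missing metadata still load without special-casing.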
💰 Pricing
Pay-Per-Event — you pay only when these events fire:
| Event | USD | What it is |
|---|---|---|
| `actor-start` | $0.005 | One-off warm-up charge per run |
| `result` | $0.002 | Per dataset item |
Example: 1,000 results in one run cost 1,000 × $0.002 + $0.005 (actor-start) = $2.005, or about $2.00. No subscription, no minimum, no card to start: Apify gives every new account $5 of free credit.
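The same arithmetic can be written as a small estimator. The two rates come from the pricing table above; the function name and the `runs` parameter are assumptions made for the example.

```python
from decimal import Decimal

ACTOR_START_USD = Decimal("0.005")  # charged once per run
RESULT_USD = Decimal("0.002")       # charged per dataset item


def estimate_cost(results, runs=1):
    """Estimated USD cost for `results` dataset items spread over `runs` runs."""
    return ACTOR_START_USD * runs + RESULT_USD * results


print(estimate_cost(1000))  # 2.005
```

`Decimal` is used instead of floats so that small per-event charges add up exactly.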
🚧 Limitations
We hit only E-utilities (esearch + efetch). Citation graphs (which papers cite which) are out of scope. Some fields may be missing, especially on older records; the Actor surfaces null rather than fabricating data.
❓ FAQ
Is an NCBI API key required?
No. Without one you get ~3 req/s, which is plenty for normal use. With one, the limit rises to 10 req/s.
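To make the two limits concrete, here is a minimal pacing sketch that spaces requests at least 1/3 s apart without a key and 1/10 s apart with one. The `Pacer` class is hypothetical and is not taken from the Actor's code.

```python
import time


def min_interval(has_api_key):
    """Minimum seconds between requests: 10 req/s with a key, 3 req/s without."""
    return 1.0 / (10 if has_api_key else 3)


class Pacer:
    """Blocks just long enough to keep requests under the allowed rate."""

    def __init__(self, has_api_key=False, clock=time.monotonic, sleep=time.sleep):
        self.interval = min_interval(has_api_key)
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Call before each request; sleeps only if the last one was too recent."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `pacer.wait()` before every E-utilities request keeps a single-threaded client under the limit; the clock and sleep functions are injectable so the pacing can be tested without real delays.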
Why are some abstracts empty?
PubMed doesn't always have abstract text — older records and some letters are abstract-less.
Can I filter by date?
Use the [PDat] qualifier in your query (e.g. 2024[PDat]). NCBI's syntax is well documented.
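A small helper for composing such queries might look like the sketch below. It assumes PubMed's `start:end[PDat]` range form and the `term[Tag]` field-tag form; the function names are invented for the example, and the resulting string is what you would pass as `searchQuery`.

```python
def tagged(term, tag):
    """Attach a PubMed field tag, e.g. tagged("Doudna JA", "Author")."""
    return f"{term}[{tag}]"


def with_date_range(query, start, end):
    """Restrict `query` to publication dates from `start` to `end`.

    Dates may be years ("2023") or full dates ("2023/01/31").
    """
    return f"({query}) AND {start}:{end}[PDat]"


search_query = with_date_range(tagged("CRISPR-Cas Systems", "MeSH"), "2023", "2024")
print(search_query)  # (CRISPR-Cas Systems[MeSH]) AND 2023:2024[PDat]
```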
What about full text?
Out of scope — full text lives on the publisher's site. We give you PMCID when the paper is in PubMed Central.
💬 Your feedback
Spotted a bug, hit a weird edge case, or need a new field? Open an issue on the Actor's Issues tab on Apify Console — we ship fixes weekly and we read every report.