Pricing

from $1.00 / 1,000 results

PubMed Search Scraper

Search PubMed (NCBI E-utilities) for biomedical articles by keyword, date range, and article type. Returns title, authors, journal, abstract, DOI, MeSH terms, keywords, and citation. Free public API, no proxy, no cookies. Optional NCBI API key for higher rate limits.

Developer

Crawler Bros

Last modified: 3 months ago

What it does

You provide one or more search terms (or paste full PubMed search URLs); the actor:

1. Builds a PubMed query for each term, optionally narrowed by date range, article type, and free-full-text flag.
2. Calls esearch to get the matching PMIDs (paginated).
3. Calls esummary in batches of 200 for the metadata (title, authors, journal, dates, DOI, publication types).
4. Calls efetch in batches of 50 for the abstract, MeSH headings, and author keywords.
5. Merges everything into one flat record per article, dedupes across search terms by PMID, and pushes to the dataset.
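
The query-building and batching steps above can be sketched roughly as follows. This is an illustration, not the actor's source code: the names `build_query`, `chunked`, and `ARTICLE_TYPE_TAGS` are hypothetical, the mapping is partial, and the choice of the `[dp]`, `[pt]`, and `free full text[sb]` filters is an assumption based on standard PubMed query syntax.

```python
from typing import Iterator, List, Optional

# Hypothetical, partial mapping from articleType input values to PubMed
# publication-type names. The actor's real mapping is not documented here.
ARTICLE_TYPE_TAGS = {
    "review": "Review",
    "clinical_trial": "Clinical Trial",
    "meta_analysis": "Meta-Analysis",
    "case_report": "Case Reports",
    "systematic_review": "Systematic Review",
}


def build_query(
    term: str,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    article_type: str = "any",
    free_full_text_only: bool = False,
) -> str:
    """Combine a search term with optional filters into one PubMed query."""
    parts = [f"({term})"]
    if date_from or date_to:
        # Assumed open-ended bounds when only one side of the range is given.
        start = date_from or "1800/01/01"
        end = date_to or "3000/12/31"
        parts.append(f"{start}:{end}[dp]")
    if article_type != "any":
        parts.append(f'"{ARTICLE_TYPE_TAGS[article_type]}"[pt]')
    if free_full_text_only:
        parts.append("free full text[sb]")
    return " AND ".join(parts)


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    print(build_query("machine learning oncology", "2023/01/01", "2024/12/31", "review", True))
    pmids = [str(n) for n in range(1, 451)]
    print([len(batch) for batch in chunked(pmids, 200)])  # esummary-sized batches
    print(len(list(chunked(pmids, 50))))                  # efetch-sized batches
```

Each batch would then be sent as a comma-separated `id` parameter to the corresponding E-utilities endpoint.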

Empty fields are omitted (no nulls) — when an article has no abstract or no MeSH terms, those keys are simply absent.

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `searchTerms` | array of strings (required) | `["machine learning oncology"]` | One or more PubMed queries. Supports full PubMed syntax: boolean (`AND`/`OR`/`NOT`), MeSH (`cancer[MeSH]`), field tags, etc. |
| `searchUrls` | array of strings | `[]` | Optional. Paste full PubMed search URLs; the actor extracts the `term=` param and merges with `searchTerms`. |
| `pmidList` | array of strings | `[]` | Optional direct-lookup mode: a list of PubMed IDs to fetch without searching (e.g. `["38123456", "36438426"]`). Bypasses `esearch` and goes straight to `esummary`/`efetch`. Combine with or use instead of `searchTerms`. |
| `maxItemsPerTerm` | integer | `25` (1–500) | Per-term result cap. Total dataset = sum across terms minus duplicates. |
| `dateFrom` | string | – | Earliest publication date, `YYYY/MM/DD` (e.g. `2024/01/01`). Optional. |
| `dateTo` | string | – | Latest publication date, `YYYY/MM/DD`. Optional. |
| `articleType` | enum | `any` | One of `any`, `review`, `clinical_trial`, `meta_analysis`, `case_report`, `randomized_controlled_trial`, `systematic_review`, `editorial`, `letter`, `comment`, `practice_guideline`, `observational_study`, `comparative_study`, `multicenter_study`. |
| `freeFullTextOnly` | boolean | `false` | If true, restrict to articles with free full-text access (PMC). |
| `language` | enum | `any` | Restrict by article language. One of `any`, `english`, `spanish`, `french`, `german`, `chinese`, `japanese`, `italian`, `portuguese`, `russian`, `korean`. |
| `journalFilter` | string | – | Restrict to a specific journal (exact name, e.g. `Nature` or `New England Journal of Medicine`). Optional. |
| `authorFilter` | string | – | Restrict to articles by a specific author in PubMed format (e.g. `Smith J` or `Smith JR`). Optional. |
| `meshFilter` | array of strings | `[]` | Restrict to articles tagged with these MeSH (Medical Subject Headings) terms, AND-joined (e.g. `["Lung Neoplasms", "Machine Learning"]` returns articles tagged with both). |
| `affiliationFilter` | string | – | Restrict to articles where any author's affiliation contains this substring (e.g. `Harvard`, `Mayo Clinic`, `Beijing`). Optional. |
| `includeCitedByCount` | boolean | `false` | Add a `citedByCount` field to each record via NCBI elink. Adds 1 elink call per page of PMIDs. Useful for citation-network analysis. |
| `apiKey` | string (Secret, optional) | – | Free NCBI API key. Raises rate limit from 3 → 10 req/s. Sign up at https://www.ncbi.nlm.nih.gov/account/. Useful for bulk runs. |

Example input

{
  "searchTerms": ["machine learning oncology", "covid vaccine efficacy"],
  "maxItemsPerTerm": 50,
  "dateFrom": "2023/01/01",
  "dateTo": "2024/12/31",
  "articleType": "review",
  "freeFullTextOnly": true
}
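
An input like the one above can be sanity-checked locally before a run. The sketch below covers only the `maxItemsPerTerm` range and the `YYYY/MM/DD` date format from the input table; `validate_input` is a hypothetical helper, and the actor's own validation may be stricter or behave differently.

```python
import re
from datetime import datetime
from typing import Any, Dict, List

# Zero-padded YYYY/MM/DD, as shown in the input table.
DATE_PATTERN = re.compile(r"^\d{4}/\d{2}/\d{2}$")


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []

    cap = run_input.get("maxItemsPerTerm", 25)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(cap, int) or isinstance(cap, bool) or not 1 <= cap <= 500:
        problems.append("maxItemsPerTerm must be an integer from 1 to 500")

    parsed = {}
    for key in ("dateFrom", "dateTo"):
        value = run_input.get(key)
        if value is None:
            continue
        try:
            if not DATE_PATTERN.match(value):
                raise ValueError(value)
            parsed[key] = datetime.strptime(value, "%Y/%m/%d")
        except (TypeError, ValueError):
            problems.append(f"{key} must use the YYYY/MM/DD format")

    if len(parsed) == 2 and parsed["dateFrom"] > parsed["dateTo"]:
        problems.append("dateFrom must not be later than dateTo")
    return problems


if __name__ == "__main__":
    example = {
        "searchTerms": ["machine learning oncology", "covid vaccine efficacy"],
        "maxItemsPerTerm": 50,
        "dateFrom": "2023/01/01",
        "dateTo": "2024/12/31",
        "articleType": "review",
        "freeFullTextOnly": True,
    }
    print(validate_input(example))
```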

Output

One record per unique article. Empty fields are omitted (no nulls).

{
  "pmid": "38123456",
  "title": "Machine learning for early lung-cancer detection: a systematic review",
  "authors": ["Smith J", "Doe JR", "Brown KL"],
  "authorsAbbreviated": ["Smith J", "Doe JR", "Brown KL"],
  "authorCount": 3,
  "journal": "Nature Reviews Oncology",
  "journalAbbrev": "Nat Rev Oncol",
  "publicationDate": "2024-03-15",
  "epubDate": "2024-02-20",
  "volume": "21",
  "issue": "5",
  "pages": "300-315",
  "issn": "1759-4774",
  "elocationId": "doi: 10.1000/foo",
  "language": "eng",
  "doi": "10.1000/foo",
  "pmcId": "PMC9876543",
  "pmcUrl": "https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876543/",
  "abstract": "BACKGROUND: Lung cancer remains... METHODS: We searched MEDLINE...",
  "meshTerms": ["Lung Neoplasms", "Machine Learning", "Early Detection of Cancer"],
  "keywords": ["deep learning", "screening", "CT imaging"],
  "authorAffiliations": ["Harvard Medical School, Boston, USA.", "MIT, Cambridge, USA."],
  "conflictOfInterest": "The authors declare no conflict of interest.",
  "grants": [
    {"grantId": "R01-CA-12345", "agency": "NCI NIH HHS", "country": "United States"}
  ],
  "referenceCount": 86,
  "citedByCount": 8,
  "articleTypes": ["Journal Article", "Systematic Review"],
  "tags": ["Systematic Review", "Free article"],
  "articleUrl": "https://pubmed.ncbi.nlm.nih.gov/38123456/",
  "shareLinks": {
    "twitter": "https://twitter.com/intent/tweet?text=...",
    "facebook": "https://www.facebook.com/sharer/sharer.php?u=...",
    "permalink": "https://pubmed.ncbi.nlm.nih.gov/38123456/"
  },
  "citation": "Smith J, Doe JR, Brown KL. Machine learning for early lung-cancer detection: a systematic review. Nat Rev Oncol. 2024;21(5):300-315. doi:10.1000/foo",
  "inputQuery": "machine learning oncology",
  "scrapedAt": "2024-12-16T14:23:11+00:00"
}

Output fields

pmid — PubMed ID (stable, never changes).
title — article title (trailing period stripped).
authors / authorsAbbreviated / authorCount — full and abbreviated author lists + count.
journal / journalAbbrev — full journal name + abbreviated (NLM-style) name.
publicationDate — ISO date YYYY-MM-DD (best parse from PubMed's free-form pubdate).
epubDate — ISO date of electronic publication when available.
volume / issue / pages / issn / elocationId — bibliographic identifiers when present.
language — ISO 639-2 code (e.g. eng, spa, chi).
doi — Digital Object Identifier when registered.
pmcId — PubMed Central ID (e.g. PMC1234567) when the article is in PMC's open-access archive.
pmcUrl — direct URL to the free full-text version on PubMed Central (when pmcId is set).
abstract — full abstract text. Multi-section abstracts are flattened with section labels (e.g. "BACKGROUND: ... METHODS: ...").
meshTerms — array of MeSH descriptor names (curated medical subject headings).
keywords — author-supplied keywords (when available).
authorAffiliations — deduped list of institutional affiliations parsed from author metadata (e.g. ["Harvard Medical School, Boston, USA.", "MIT, Cambridge, USA."]).
conflictOfInterest — author-declared conflict-of-interest statement (when present).
grants — funding sources as [{grantId, agency, country}] rows (NIH grants, foundation grants, etc.).
referenceCount — number of references cited by this article (parsed from PubMed's reference list).
citedByCount — number of PubMed articles citing this article. Only populated when includeCitedByCount: true.
articleTypes — raw publication types from PubMed.
tags — derived human-readable tags from articleTypes (Review, Systematic Review, Meta-Analysis, Clinical Trial, Case Report, Randomized Controlled Trial) plus Free article when freeFullTextOnly is set.
articleUrl — direct link to the PubMed page.
shareLinks — {twitter, facebook, permalink} pre-filled share URLs.
citation — compact AMA-style citation string assembled from the metadata.
inputQuery — the search term (or extracted-from-URL term) that surfaced this article.
scrapedAt — ISO-8601 UTC timestamp.
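
As an illustration of how the `citation` string relates to the other fields, the sketch below reassembles the sample citation from the example record. `build_citation` is a hypothetical helper written to match that one sample; the actor's actual formatting rules are not documented and may differ for edge cases. It reads every field with `.get()` because empty fields are omitted from records.

```python
from typing import Any, Dict, List


def build_citation(record: Dict[str, Any]) -> str:
    """Assemble a compact citation from whichever fields are present."""
    segments: List[str] = []

    authors = record.get("authorsAbbreviated") or record.get("authors")
    if authors:
        segments.append(", ".join(authors) + ".")
    if record.get("title"):
        segments.append(record["title"] + ".")

    journal = record.get("journalAbbrev") or record.get("journal")
    if journal:
        segments.append(journal + ".")

    # Year;volume(issue):pages, each part added only when available.
    locator = (record.get("publicationDate") or "")[:4]
    if record.get("volume"):
        locator += ";" + record["volume"]
        if record.get("issue"):
            locator += "(" + record["issue"] + ")"
    if record.get("pages"):
        locator += ":" + record["pages"]
    if locator:
        segments.append(locator + ".")

    if record.get("doi"):
        segments.append("doi:" + record["doi"])
    return " ".join(segments)


if __name__ == "__main__":
    sample = {
        "title": "Machine learning for early lung-cancer detection: a systematic review",
        "authorsAbbreviated": ["Smith J", "Doe JR", "Brown KL"],
        "journalAbbrev": "Nat Rev Oncol",
        "publicationDate": "2024-03-15",
        "volume": "21",
        "issue": "5",
        "pages": "300-315",
        "doi": "10.1000/foo",
    }
    print(build_citation(sample))
```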

Use cases

Literature reviews — build a structured corpus of every relevant paper for a systematic review or meta-analysis.
Research-trend tracking — monitor weekly/monthly volume of publications on a topic to spot rising fields.
Bibliometric analysis — pull thousands of records with structured authors/journals for citation-network analysis.
Author / journal mapping — find every paper by a specific researcher or in a target journal across a date range.
Curated newsletters — generate a fresh weekly list of new articles matching your topic + filter combination.

FAQ

Does it need a proxy or cookies? No. PubMed's E-utilities is fully public and works from datacenter IPs. No login or auth required.

Do I need an API key? No — the free public limit is 3 requests/second, plenty for default-sized runs. Supply an optional apiKey (free signup at https://www.ncbi.nlm.nih.gov/account/) to raise the limit to 10 req/s for bulk extraction.
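
For readers calling E-utilities themselves, the two limits above could be respected with a simple client-side throttle. This is a generic minimum-interval sketch, not the actor's implementation; `Throttle` and `throttle_for` are hypothetical names, and the clock and sleep functions are injectable only to make the behavior easy to test.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space calls at least 1 / requests_per_second seconds apart."""

    def __init__(
        self,
        requests_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = self._next_allowed - now
        if delay > 0:
            self._sleep(delay)
            now += delay
        self._next_allowed = now + self.min_interval
        return max(delay, 0.0)


def throttle_for(api_key: Optional[str] = None) -> Throttle:
    # 10 req/s with an NCBI API key, 3 req/s without one.
    return Throttle(10 if api_key else 3)


if __name__ == "__main__":
    throttle = throttle_for()
    for _ in range(3):
        throttle.wait()  # call this before each E-utilities request
```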

Can I search by author, journal, or MeSH term? Yes — PubMed query syntax is fully supported. Examples:

Author: Smith J[Author]
Journal: "Nature Reviews Oncology"[Journal]
MeSH: Cancer[MeSH] AND machine learning
Combine: (cancer OR tumor) AND machine learning AND 2023:2024[dp]

Why is my abstract empty? ~5% of PubMed records ship without an abstract — usually editorials, letters, or older articles. The abstract field is simply omitted in those cases (omit-empty contract).

How does deduplication work? Across search terms within the same run, articles are deduped by PMID. Each article appears at most once in the dataset, with inputQuery set to the first term that surfaced it.
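
The first-term-wins behavior described above can be sketched as below. `dedupe_by_pmid` is a hypothetical helper, and the sketch assumes terms are processed in the order they were supplied.

```python
from typing import Any, Dict, List


def dedupe_by_pmid(results_by_term: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Keep the first record seen for each PMID and tag it with its term."""
    seen = set()
    unique: List[Dict[str, Any]] = []
    # Dicts preserve insertion order, so earlier terms take precedence.
    for term, records in results_by_term.items():
        for record in records:
            pmid = record["pmid"]
            if pmid in seen:
                continue
            seen.add(pmid)
            unique.append({**record, "inputQuery": term})
    return unique


if __name__ == "__main__":
    results = {
        "machine learning oncology": [{"pmid": "1"}, {"pmid": "2"}],
        "covid vaccine efficacy": [{"pmid": "2"}, {"pmid": "3"}],
    }
    for row in dedupe_by_pmid(results):
        print(row)
```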

What if my query has zero results? You get a single sentinel record {type: "pubmed_scraper_error", reason: "no_results", searchTerms: [...]} so the dataset is non-empty. The run completes successfully — empty datasets aren't treated as failures.
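
Downstream code will usually want to separate that sentinel from real article records. A minimal sketch, assuming dataset items have already been loaded as a list of dicts; `split_sentinels` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

SENTINEL_TYPE = "pubmed_scraper_error"


def split_sentinels(
    items: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Return (articles, sentinels) from a list of dataset items."""
    articles: List[Dict[str, Any]] = []
    sentinels: List[Dict[str, Any]] = []
    for item in items:
        # Article records carry no "type" key, so .get() returns None for them.
        if item.get("type") == SENTINEL_TYPE:
            sentinels.append(item)
        else:
            articles.append(item)
    return articles, sentinels


if __name__ == "__main__":
    items = [{"type": "pubmed_scraper_error", "reason": "no_results", "searchTerms": ["xyzzy"]}]
    articles, sentinels = split_sentinels(items)
    print(len(articles), sentinels[0]["reason"])
```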

Can I paste PubMed URLs instead of writing terms? Yes — drop full URLs like https://pubmed.ncbi.nlm.nih.gov/?term=cancer+immunotherapy into searchUrls. The actor extracts the term= param and treats it as a search term.
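
Extracting the `term=` parameter is straightforward with Python's standard library. The sketch below shows one way to do it; `extract_term` is a hypothetical helper and not necessarily how the actor parses URLs.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_term(url: str) -> Optional[str]:
    """Return the decoded `term` query parameter, or None if it is absent."""
    params = parse_qs(urlparse(url).query)
    values = params.get("term")
    return values[0] if values else None


if __name__ == "__main__":
    print(extract_term("https://pubmed.ncbi.nlm.nih.gov/?term=cancer+immunotherapy"))
```

`parse_qs` decodes `+` and percent-escapes, so field tags such as `[Journal]` survive the round trip.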

Is the data fresh? PubMed updates within hours of publication for indexed journals. The actor pulls live every run; results reflect PubMed's current state at fetch time.
