PubMed Biomedical Literature Search
Pricing
from $2.00 / 1,000 papers fetched
Search and extract data from 37M+ PubMed biomedical research articles. Filter by keyword, author, journal, MeSH terms, date range, and article type. Get structured JSON with titles, authors, DOIs, PMC IDs, journals, and PubMed URLs. Free NCBI API.
Developer: ryan clinton
Search and extract structured metadata from PubMed, the world's largest biomedical literature database with over 37 million citations. Query by keyword, author, journal, date range, and article type using the NCBI E-utilities API. Returns clean JSON with titles, authors, DOIs, PMC IDs, journal details, and direct PubMed links -- ready for systematic reviews, bibliometric analysis, and research monitoring.
What does PubMed Biomedical Literature Search do?
PubMed Biomedical Literature Search is an Apify actor that queries the NCBI E-utilities API to find and extract structured article metadata from PubMed. It takes your search criteria -- keywords with boolean operators, author names, journal titles, publication date ranges, and article types -- and returns detailed metadata for every matching article as clean, structured JSON.
For each article, the actor extracts 17 fields: PMID, title, full author list (both as a formatted string and as an array), last author, journal name and abbreviation, publication date, volume, issue, pages, DOI, PubMed Central ID, languages, article types, a direct PubMed URL, and an extraction timestamp.
The actor handles PubMed's two-step API process automatically. First it searches for matching article IDs, then it fetches full metadata in batched requests with built-in rate limiting. You get structured results without needing to understand the NCBI API architecture.
Use cases include:
- Systematic literature reviews -- Collect and screen article metadata across a research topic.
- Researcher profiling -- Compile a complete publication list for a specific author.
- Research monitoring -- Schedule weekly runs to track new publications in your field.
- Bibliometric analysis -- Analyze publication patterns across journals, dates, and article types.
- Grant and funding research -- Identify published work related to funding areas or clinical domains.
Why use PubMed Biomedical Literature Search on Apify?
- No API key required -- PubMed's NCBI E-utilities are completely free and open. The actor handles rate limiting and batching for you.
- Structured, consistent output -- Every article returns the same 17 fields as clean JSON, ready for spreadsheets, databases, or analysis pipelines.
- Scheduled monitoring -- Set up recurring searches on Apify to automatically track new publications daily or weekly.
- Scalable retrieval -- Fetch up to 500 articles per run with automatic batch processing and NCBI rate-limit compliance.
- Integration-ready -- Export results to Google Sheets, Slack, webhooks, Zapier, Make, or any of Apify's built-in integrations.
- Lightweight and fast -- API-only actor using 256 MB of memory. Most runs complete in under 30 seconds.
- Programmatic access -- Call via the Apify API from Python, JavaScript, or cURL to embed PubMed search into your own applications.
Key features
- Search PubMed's 37M+ biomedical article database using keywords, boolean operators, and MeSH terms
- Filter by author name, journal title, publication date range, and article type
- Six article type filters: Review, Clinical Trial, Randomized Controlled Trial, Meta-Analysis, Systematic Review, and Case Reports
- Sort results by relevance or publication date (newest first)
- Automatic extraction of DOI and PubMed Central (PMC) identifiers from article ID arrays
- Full author list with last author highlighted separately
- Direct PubMed URL generated for each article
- Batched retrieval with 200 PMIDs per request for efficient API usage
- Built-in 350ms delay between requests to comply with NCBI rate limits (3 requests/second)
- Run log summary with top journals, article type distribution, and DOI coverage statistics
How to use PubMed Biomedical Literature Search
- Go to the PubMed Biomedical Literature Search actor page on Apify and click Start.
- Enter your search criteria in the input form. Provide at least one of: search query, author name, or journal name.
- Optionally configure filters for article type, date range, sort order, and maximum results.
- Click Run and wait for the actor to complete (typically 5-30 seconds).
- View, download, or export your results from the Dataset tab in JSON, CSV, Excel, or other formats.
Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | "CRISPR gene therapy" | PubMed search query. Supports boolean operators (AND, OR, NOT) and field tags like [Title], [Author], [MeSH Terms]. |
| author | string | -- | Author name filter (e.g., "Doudna JA", "Zhang F"). Uses the [Author] field tag. |
| journal | string | -- | Journal name filter (e.g., "Nature", "NEJM", "Lancet"). Uses the [Journal] field tag. |
| dateFrom | string | -- | Publication date start in YYYY/MM/DD or YYYY format. |
| dateTo | string | -- | Publication date end in YYYY/MM/DD or YYYY format. |
| articleType | select | -- | Filter by publication type: Review, Clinical Trial, Randomized Controlled Trial, Meta-Analysis, Systematic Review, or Case Reports. |
| sortBy | select | relevance | Sort order: relevance (default) or pub_date (newest first). |
| maxResults | integer | 50 | Maximum number of articles to return. Range: 1-500. |
At least one of query, author, or journal must be provided.
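The constraints in the table above can be checked locally before starting a run. The following is a minimal sketch, not the actor's own validation code: `validate_input` is a hypothetical helper, and the exact rules it enforces are only those stated in the parameter table.

```python
import re
from typing import Any

# Allowed values, copied from the parameter table above.
ARTICLE_TYPES = {
    "Review", "Clinical Trial", "Randomized Controlled Trial",
    "Meta-Analysis", "Systematic Review", "Case Reports",
}
SORT_ORDERS = {"relevance", "pub_date"}
DATE_RE = re.compile(r"^\d{4}(/\d{2}/\d{2})?$")  # YYYY or YYYY/MM/DD


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check. Returns a list of problems (empty if none)."""
    problems = []
    if not any(str(run_input.get(k) or "").strip() for k in ("query", "author", "journal")):
        problems.append("provide at least one of query, author, or journal")

    max_results = run_input.get("maxResults", 50)
    if isinstance(max_results, bool) or not isinstance(max_results, int) or not 1 <= max_results <= 500:
        problems.append("maxResults must be an integer from 1 to 500")

    article_type = run_input.get("articleType")
    if article_type is not None and article_type not in ARTICLE_TYPES:
        problems.append(f"unknown articleType: {article_type!r}")

    if run_input.get("sortBy", "relevance") not in SORT_ORDERS:
        problems.append("sortBy must be 'relevance' or 'pub_date'")

    for key in ("dateFrom", "dateTo"):
        value = run_input.get(key)
        if value is not None and not DATE_RE.match(str(value)):
            problems.append(f"{key} must use YYYY or YYYY/MM/DD format")
    return problems
```

Calling `validate_input(run_input)` before `client.actor(...).call(...)` lets a script fail fast on malformed input rather than spending a run on it.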
Input examples
Basic keyword search:

```json
{
  "query": "CRISPR gene therapy",
  "maxResults": 50
}
```

Author-specific search with date range:

```json
{
  "query": "gene editing",
  "author": "Doudna JA",
  "dateFrom": "2020",
  "dateTo": "2025",
  "sortBy": "pub_date",
  "maxResults": 100
}
```

Systematic review search filtered by article type:

```json
{
  "query": "COVID-19 vaccine efficacy",
  "articleType": "Meta-Analysis",
  "dateFrom": "2023/01/01",
  "sortBy": "pub_date",
  "maxResults": 200
}
```

Journal-specific search with complex boolean query:

```json
{
  "query": "machine learning AND (radiology OR imaging) NOT review[Publication Type]",
  "journal": "Nature Medicine",
  "dateFrom": "2024",
  "sortBy": "relevance",
  "maxResults": 50
}
```
Tips for best results
- Use MeSH terms for precise medical subject searches. For example, `"diabetes mellitus, type 2"[MeSH Terms]` returns more accurate results than just "type 2 diabetes". MeSH is PubMed's controlled vocabulary thesaurus.
- Use `[Title/Abstract]` field tags to restrict keyword matches. For example, `CRISPR[Title/Abstract]` only matches articles with "CRISPR" in the title or abstract, not in MeSH terms or other metadata.
- Combine boolean operators in the query field: `CRISPR AND (cancer OR oncology) NOT review[Publication Type]` builds precise searches without needing separate filter fields.
- Use `[MeSH Major Topic]` to find articles where a concept is the primary subject: `"neoplasms"[MeSH Major Topic]` returns only articles focused on cancer, not those that mention it tangentially.
- Set date ranges with `YYYY` or `YYYY/MM/DD` format. Using just a year (e.g., `"2024"`) covers the entire year. Using `"2024/06/01"` to `"2024/12/31"` narrows to a specific window.
- Start with a small `maxResults` (25-50) to preview result quality before scaling up to 500.
- Sort by `pub_date` when monitoring new research or building a chronological bibliography. Sort by `relevance` when exploring a topic for the first time.
- Schedule recurring runs on Apify with a date range filter set to recent weeks to automatically track new publications in your area.
- Author names follow the PubMed format: last name followed by initials with no periods (e.g., `"Doudna JA"`, not `"Jennifer A. Doudna"`).
Programmatic access
You can call this actor programmatically using the Apify API. Here are examples in Python, JavaScript, and cURL.
Python:

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/pubmed-research-search").call(run_input={
    "query": "CRISPR gene therapy",
    "articleType": "Meta-Analysis",
    "dateFrom": "2023",
    "sortBy": "pub_date",
    "maxResults": 100,
})

for article in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{article['pmid']} | {article['title'][:80]} | {article['doi']}")
```

JavaScript:

```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/pubmed-research-search").call({
    query: "CRISPR gene therapy",
    articleType: "Meta-Analysis",
    dateFrom: "2023",
    sortBy: "pub_date",
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((article) => {
    console.log(`${article.pmid} | ${article.title.slice(0, 80)} | ${article.doi}`);
});
```

cURL:

```bash
curl "https://api.apify.com/v2/acts/ryanclinton~pubmed-research-search/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "query": "CRISPR gene therapy",
    "articleType": "Meta-Analysis",
    "dateFrom": "2023",
    "sortBy": "pub_date",
    "maxResults": 100
  }'
```
Output example
Each article in the output dataset contains 17 fields:

```json
{
  "pmid": "39127584",
  "title": "CRISPR-Cas9 gene editing for sickle cell disease and beta-thalassemia: a systematic review and meta-analysis of clinical outcomes.",
  "authors": "Martinez RL, Chen W, Patel S, Johnson KA, Williams DR",
  "authorList": ["Martinez RL", "Chen W", "Patel S", "Johnson KA", "Williams DR"],
  "lastAuthor": "Williams DR",
  "journal": "Blood advances",
  "journalAbbrev": "Blood Adv",
  "pubDate": "2024 Sep 15",
  "volume": "8",
  "issue": "18",
  "pages": "4821-4835",
  "doi": "10.1182/bloodadvances.2024013847",
  "pmc": "PMC11432987",
  "languages": ["eng"],
  "articleType": ["Journal Article", "Meta-Analysis", "Systematic Review"],
  "pubmedUrl": "https://pubmed.ncbi.nlm.nih.gov/39127584/",
  "extractedAt": "2025-01-15T10:30:45.123Z"
}
```
Output fields reference
| Field | Type | Description |
|---|---|---|
| pmid | string | PubMed unique identifier for the article. |
| title | string | Full article title. |
| authors | string | Comma-separated author names (e.g., "Martinez RL, Chen W, Patel S"). |
| authorList | string[] | Array of individual author names for programmatic access. |
| lastAuthor | string | Last (senior) author name. |
| journal | string | Full journal name (e.g., "Blood advances"). |
| journalAbbrev | string | Abbreviated journal name (e.g., "Blood Adv"). |
| pubDate | string | Publication date as formatted by PubMed (e.g., "2024 Sep 15"). |
| volume | string or null | Journal volume number. Null if not available. |
| issue | string or null | Journal issue number. Null if not available. |
| pages | string or null | Page range (e.g., "4821-4835"). Null if not available. |
| doi | string or null | Digital Object Identifier. Extracted from the articleids array. Null if not assigned. |
| pmc | string or null | PubMed Central ID (e.g., "PMC11432987"). Indicates free full-text availability. Null if not in PMC. |
| languages | string[] | Array of language codes (e.g., ["eng"]). |
| articleType | string[] | Array of publication types (e.g., ["Journal Article", "Meta-Analysis"]). |
| pubmedUrl | string | Direct URL to the article on PubMed (e.g., "https://pubmed.ncbi.nlm.nih.gov/39127584/"). |
| extractedAt | string | ISO 8601 timestamp of when the data was extracted. |
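Because `volume`, `issue`, `pages`, `doi`, and `pmc` can be null, downstream code should guard against missing values. The sketch below shows one way to do that when turning a record into a short reference string. `format_citation` is a hypothetical helper, the citation layout is an arbitrary choice for illustration, and the sample record uses placeholder values.

```python
def format_citation(article: dict) -> str:
    """Build a compact reference string, skipping any null fields."""
    title = article["title"].rstrip(".")
    ref = f"{article['authors']}. {title}. {article['journalAbbrev']}. {article['pubDate']}"

    volume = article.get("volume")
    issue = article.get("issue")
    pages = article.get("pages")
    if volume:
        ref += f";{volume}"
        if issue:
            ref += f"({issue})"
    if pages:
        ref += f":{pages}"
    ref += "."

    if article.get("doi"):
        ref += f" doi:{article['doi']}"
    return ref


# Placeholder record with the nullable fields left empty.
sample = {
    "authors": "Chen W, Patel S",
    "title": "An example title.",
    "journalAbbrev": "Blood Adv",
    "pubDate": "2024 Sep 15",
    "volume": None, "issue": None, "pages": None, "doi": None,
}
print(format_citation(sample))
```

The same guard pattern (`if article.get("doi")`) applies when writing rows to a spreadsheet or database that does not accept nulls.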
How it works
The actor follows a two-step process that mirrors how the NCBI E-utilities API is designed. PubMed does not offer a single endpoint that returns full metadata -- you must first search for article IDs, then fetch their details separately.
```
PubMed Biomedical Literature Search
===================================

INPUT                      STEP 1: ESearch               STEP 2: ESummary
-----                      ----------------              ----------------
query ──┐                                                Batch 1 (PMIDs 1-200)
author ─┤  buildQuery()    ┌──────────────┐   PMIDs      ┌──────────────┐
journal ┼─────────────────>│ esearch.fcgi │─────────────>│ esummary.fcgi│──┐
dates ──┤  join(" AND ")   │              │  (up to      └──────────────┘  │
type ───┘                  │ Returns:     │   500)                         │
                           │  - count     │              Batch 2 (201-400) │
                           │  - PMID list │              ┌──────────────┐  │
                           └──────────────┘   ─────────> │ esummary.fcgi│──┤
                                              350ms      └──────────────┘  │
                                              delay                        │
                                              between    Batch 3 (401-500) │
                                              batches    ┌──────────────┐  │
                                              ─────────> │ esummary.fcgi│──┤
                                                         └──────────────┘  │
                                                                           │
TRANSFORM                                                                  │
---------                                                                  │
For each article: <────────────────────────────────────────────────────────┘
- Extract DOI from articleids array
- Extract PMC from articleids array
- Flatten authors to string + array
- Build pubmedUrl from PMID
- Add extractedAt timestamp
         │
         v
OUTPUT
------
┌──────────────────┐
│ Apify Dataset    │
│ (JSON/CSV/Excel) │
└──────────────────┘
```
Step 1: ESearch -- finding article IDs
The actor constructs a PubMed query by combining your input parameters with AND operators and field tags:
- Keywords go directly into the query string
- Author names get the `[Author]` field tag
- Journal names get `"journal"[Journal]` with quotes
- Article types get `"type"[Publication Type]`
- Date ranges use `"YYYY/MM/DD"[Date - Publication] : "YYYY/MM/DD"[Date - Publication]` syntax
This query is sent to esearch.fcgi with retmode=json and retmax set to your maxResults value. ESearch returns only a list of PMIDs (PubMed identifiers) and a total count of matching articles. It does not return any article metadata.
The sort parameter is encoded as relevance or pub+date (with a + character for publication date sorting).
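The query construction described above can be sketched roughly as follows. This is an illustration, not the actor's source: `build_query` and `esearch_url` are hypothetical names, wrapping the date range in parentheses and the open-ended defaults (`1800` and `3000`) are assumptions, and the `sort` parameter is left out. The ESearch endpoint URL and the `db`, `term`, `retmode`, and `retmax` parameters are standard NCBI E-utilities usage.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_query(query=None, author=None, journal=None,
                article_type=None, date_from=None, date_to=None) -> str:
    """Combine the input fields into one PubMed term joined with AND."""
    parts = []
    if query:
        parts.append(query)                              # keywords go in as-is
    if author:
        parts.append(f"{author}[Author]")
    if journal:
        parts.append(f'"{journal}"[Journal]')
    if article_type:
        parts.append(f'"{article_type}"[Publication Type]')
    if date_from or date_to:
        start = date_from or "1800"                      # assumed open lower bound
        end = date_to or "3000"                          # assumed open upper bound
        parts.append(f'("{start}"[Date - Publication] : "{end}"[Date - Publication])')
    return " AND ".join(parts)


def esearch_url(term: str, max_results: int = 50) -> str:
    """Build the ESearch request URL; the response lists PMIDs and a total count."""
    params = {"db": "pubmed", "term": term, "retmode": "json", "retmax": max_results}
    return f"{ESEARCH_URL}?{urlencode(params)}"


term = build_query(query="gene editing", author="Doudna JA",
                   date_from="2020", date_to="2025")
print(term)
print(esearch_url(term, max_results=100))
```

Printing the term before running a search is a convenient way to paste it into PubMed's own search box and confirm it matches what you expect.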
Step 2: ESummary -- fetching article metadata
The PMIDs from Step 1 are split into batches of 200 (the NCBI recommended batch size). Each batch is sent as a comma-separated list to esummary.fcgi, which returns full metadata for each article.
The actor waits 350ms between batch requests to stay within NCBI's rate limit of 3 requests per second. For 500 articles, this means 3 batches separated by two 350ms delays (0.7 seconds of deliberate waiting), so total wait time is typically under 5 seconds.
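The batching and delay logic can be sketched as below. This is an assumption-laden illustration rather than the actor's code: `chunk` and `fetch_in_batches` are hypothetical names, and `fetch_batch` stands in for whatever function sends one comma-separated PMID list to `esummary.fcgi`.

```python
import time
from typing import Callable, Iterator

BATCH_SIZE = 200       # PMIDs per ESummary request, as described above
DELAY_SECONDS = 0.35   # pause between batches to stay under 3 requests/second


def chunk(pmids: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` PMIDs."""
    for start in range(0, len(pmids), size):
        yield pmids[start:start + size]


def fetch_in_batches(pmids: list[str],
                     fetch_batch: Callable[[str], list],
                     delay: float = DELAY_SECONDS,
                     sleep: Callable[[float], None] = time.sleep) -> list:
    """Call fetch_batch once per chunk, sleeping between (not before) requests."""
    results = []
    for index, batch in enumerate(chunk(pmids)):
        if index > 0:
            sleep(delay)
        results.extend(fetch_batch(",".join(batch)))
    return results


# Dry run with a stand-in fetcher and a recorded (not real) sleep.
pmids = [str(n) for n in range(500)]
recorded_sleeps: list[float] = []
fetched = fetch_in_batches(pmids, lambda ids: ids.split(","), sleep=recorded_sleeps.append)
print([len(b) for b in chunk(pmids)], recorded_sleeps)
```

Passing `sleep` as a parameter keeps the function testable without actually waiting; in real use the default `time.sleep` applies.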
DOI and PMC extraction
PubMed stores article identifiers in an articleids array with typed entries. The actor searches this array for entries with idtype === 'doi' and idtype === 'pmc' to extract these values. Not all articles have DOIs or PMC IDs -- the fields are set to null when unavailable.
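A minimal sketch of that lookup follows. `extract_id` is a hypothetical helper, and the `value` key for each entry is an assumption about the ESummary JSON shape; only `idtype` is named in the text above. The sample entries are placeholders.

```python
from typing import Optional


def extract_id(articleids: Optional[list[dict]], idtype: str) -> Optional[str]:
    """Return the first identifier of the given type, or None if absent."""
    for entry in articleids or []:
        # Assumed entry shape: {"idtype": "doi", "value": "10.1000/example"}
        if entry.get("idtype") == idtype and entry.get("value"):
            return entry["value"]
    return None


sample_ids = [
    {"idtype": "pubmed", "value": "00000000"},
    {"idtype": "doi", "value": "10.1000/example"},
]
doi = extract_id(sample_ids, "doi")   # found
pmc = extract_id(sample_ids, "pmc")   # not present, so None
print(doi, pmc)
```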
Author flattening
ESummary returns authors as an array of objects with name and authtype fields. The actor extracts the name from each entry and provides both a comma-separated string (authors) for display and an array (authorList) for programmatic access. The lastAuthor field is taken directly from the ESummary response.
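The flattening step might look like the sketch below. `flatten_authors` is a hypothetical name, and the `authors` and `lastauthor` keys are assumptions about the ESummary record; the sample names are placeholders.

```python
def flatten_authors(summary: dict) -> dict:
    """Produce the display string, the array, and the last author."""
    names = [a["name"] for a in summary.get("authors", []) if a.get("name")]
    return {
        "authors": ", ".join(names),                # comma-separated, for display
        "authorList": names,                        # array, for programmatic access
        "lastAuthor": summary.get("lastauthor", ""),  # taken from the response as-is
    }


record = {
    "authors": [
        {"name": "Chen W", "authtype": "Author"},
        {"name": "Patel S", "authtype": "Author"},
    ],
    "lastauthor": "Patel S",
}
print(flatten_authors(record))
```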
Run summary
After processing all articles, the actor logs summary statistics: total articles output, top 5 journals by article count, article type distribution, and the number of articles with DOIs.
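The same statistics can be recomputed from an exported dataset. The sketch below is one way to do it with `collections.Counter`; `summarize` and the keys of its return value are hypothetical, while the article fields it reads (`journal`, `articleType`, `doi`) are the documented output fields.

```python
from collections import Counter


def summarize(articles: list[dict]) -> dict:
    """Compute total count, top 5 journals, type distribution, and DOI coverage."""
    journals = Counter(a["journal"] for a in articles if a.get("journal"))
    types = Counter(t for a in articles for t in a.get("articleType", []))
    with_doi = sum(1 for a in articles if a.get("doi"))
    return {
        "total": len(articles),
        "topJournals": journals.most_common(5),
        "articleTypes": dict(types),
        "withDoi": with_doi,
    }


# Placeholder records for a dry run.
items = [
    {"journal": "Blood advances", "articleType": ["Journal Article", "Review"], "doi": "10.1000/a"},
    {"journal": "Blood advances", "articleType": ["Journal Article"], "doi": None},
    {"journal": "Nature medicine", "articleType": ["Journal Article"], "doi": "10.1000/b"},
]
print(summarize(items))
```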
How much does it cost to run?
PubMed Biomedical Literature Search is an API-only actor that uses minimal compute resources. There is no browser rendering or heavy crawling involved.
| Scenario | Articles | Est. Time | Est. Cost (Apify) |
|---|---|---|---|
| Quick search | 50 | 5-10 sec | ~$0.001 |
| Medium batch | 200 | 15-30 sec | ~$0.003 |
| Full extraction | 500 | 30-90 sec | ~$0.005 |
Notes:
- The NCBI PubMed API is completely free with no API key required.
- The actor runs on 256 MB of memory (the minimum Apify allocation).
- Cost estimates are based on Apify platform compute pricing. Actual costs depend on your Apify plan.
- Run time depends mainly on the number of ESummary batches (one per 200 PMIDs) and the 350ms delay between them.
Limitations and responsible use
- No abstracts in output. The NCBI ESummary endpoint does not return article abstracts. The actor outputs article metadata only (title, authors, journal, DOI, etc.). To get abstracts, you would need to use the EFetch endpoint with XML parsing, which is not included in this actor. For most bibliometric and screening workflows, the metadata fields are sufficient.
- Maximum 500 articles per run. This is a practical limit to keep runs fast and within reasonable API usage. If your search matches more articles, narrow your filters or run multiple targeted searches.
- NCBI rate limit: 3 requests per second. Without a registered NCBI API key, NCBI allows 3 requests per second. The actor enforces a 350ms delay between batch requests to stay within this limit.
- No full-text content. The actor returns metadata and links, not the full text of articles. Use the `pubmedUrl` or `doi` fields to access full-text content through your institution or open-access sources.
- Date format matters. Use `YYYY/MM/DD` or `YYYY` format for date parameters. Other formats may produce unexpected results.
- PubMed indexing lag. While PubMed is updated daily, very recent articles may take a few days to appear in search results after publication.
- Respect NCBI Terms of Service. Do not run this actor at extremely high frequency. NCBI reserves the right to block IP addresses that make excessive requests. For high-volume use cases, consider registering for an NCBI API key at https://www.ncbi.nlm.nih.gov/account/, keeping in mind that this actor does not currently accept a key as an input parameter.
FAQ
Why does the output not include article abstracts?
The actor uses NCBI's ESummary endpoint, which returns article metadata but not abstracts. This is a limitation of the ESummary API response format. Retrieving abstracts requires the EFetch endpoint with XML response parsing, which adds significant complexity and processing time. For most use cases -- systematic review screening, bibliometric analysis, publication tracking -- the 17 metadata fields provided are sufficient. You can use the pubmedUrl or doi fields to access full abstracts on the PubMed website.
Do I need an NCBI API key?
No. The NCBI E-utilities API is free and does not require authentication. The actor operates within NCBI's default rate limit of 3 requests per second. If you need higher throughput for very frequent scheduled runs, you can register for a free NCBI API key at https://www.ncbi.nlm.nih.gov/account/, but the actor does not currently accept a key as an input parameter.
How current is the PubMed data?
PubMed is updated daily by NCBI. When you run this actor, it queries the live PubMed database in real time. New articles typically appear in PubMed within 1-5 days of publication, depending on the journal and indexing speed.
Can I search using MeSH terms?
Yes. Include MeSH terms directly in the query field using PubMed syntax: "breast neoplasms"[MeSH Terms]. You can also use [MeSH Major Topic] to find articles where the term is the primary subject. MeSH terms provide more precise results than keyword searches because they use NCBI's controlled vocabulary.
What happens if my search matches more than 500 articles?
The actor returns up to maxResults articles (maximum 500). The run log shows the total number of matching articles in PubMed. If your search matches thousands of results, consider narrowing your query with additional filters (date range, journal, article type) or running multiple focused searches.
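One way to run multiple focused searches is to split a broad query into one run per calendar year and merge the results by PMID. The sketch below only prepares the inputs and merges the outputs; `yearly_inputs` and `merge_unique` are hypothetical helpers, and each yearly slice is still capped at 500 articles.

```python
def yearly_inputs(base_input: dict, first_year: int, last_year: int) -> list[dict]:
    """Create one actor input per calendar year, each capped at 500 results."""
    runs = []
    for year in range(first_year, last_year + 1):
        run_input = dict(base_input)
        run_input["dateFrom"] = f"{year}/01/01"
        run_input["dateTo"] = f"{year}/12/31"
        run_input["maxResults"] = 500
        runs.append(run_input)
    return runs


def merge_unique(datasets: list[list[dict]]) -> list[dict]:
    """Merge result lists, keeping the first record seen for each PMID."""
    seen: dict[str, dict] = {}
    for items in datasets:
        for article in items:
            seen.setdefault(article["pmid"], article)
    return list(seen.values())


inputs = yearly_inputs({"query": "CRISPR gene therapy"}, 2022, 2024)
print([(i["dateFrom"], i["dateTo"]) for i in inputs])
```

Each input can then be passed to `client.actor(...).call(run_input=...)` as in the Programmatic access examples, and the resulting item lists handed to `merge_unique`. If a single year still matches more than 500 articles, the same idea applies with shorter windows.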
Can I search for articles in languages other than English?
Yes. PubMed indexes articles in many languages, and the actor returns all of them unless your query says otherwise. Add a language tag to the query field to restrict results, for example english[Language] for English only or french[Language] for French only. Each output record includes a languages field indicating the article's language(s).
Related actors
| Actor | Description | Best for |
|---|---|---|
| OpenAlex Research Search | Search OpenAlex for open-access research papers with institutional affiliation data. | Open-access papers, citation counts, institutional analysis |
| Crossref Academic Paper Search | Search Crossref for academic papers with DOI resolution and citation metadata. | DOI lookups, citation metadata, publisher data |
| Semantic Scholar Paper Search | Search Semantic Scholar for AI-enriched academic papers with citation graphs. | Citation graphs, influential citations, AI-ranked papers |
| ArXiv Preprint Paper Search | Search ArXiv for preprint papers in physics, math, CS, and related fields. | Preprints, physics/math/CS research, open access |
| Europe PMC Literature Search | Search Europe PMC for European biomedical literature with full-text access links. | European biomedical research, full-text links, grant data |
| CORE Open Access Papers | Search CORE for open-access academic papers aggregated from repositories worldwide. | Open-access full text, repository aggregation, global coverage |