Pricing

from $2.00 per 1,000 papers fetched

Europe PMC Biomedical Literature Search

Developer

Ryan Clinton

Last modified

12 days ago

Europe PMC Literature Search

Search and extract biomedical and life science publications from Europe PMC -- a free, comprehensive repository of over 40 million articles aggregated from PubMed, PubMed Central (PMC), and preprint servers including bioRxiv and medRxiv. Filter by author, journal, date range, open access status, and source database. Returns structured citation data with abstracts, MeSH terms, full-text URLs, and citation counts. No API key required.

Why use Europe PMC Literature Search?

Access 40M+ publications in one search -- Europe PMC unifies PubMed (MED), PMC full-text (PMC), and preprints (PPR) into a single searchable index, eliminating the need to query multiple databases separately.
No API key or authentication needed -- the Europe PMC REST API is completely free and open, so you can start extracting data immediately without registration or credentials.
Rich structured metadata -- every result includes PMID, PMCID, DOI, full author lists with affiliations, abstract text, MeSH subject headings, citation counts, publication types, and direct full-text URLs.
Automated pagination and data transformation -- the actor handles cursor-based pagination, nested API response parsing, and output normalization so you get clean, consistently structured JSON records ready for analysis.
Schedule recurring literature monitoring -- run the actor daily or weekly with date range filters to automatically track new publications on any biomedical topic.
Export anywhere -- results are stored in standard Apify datasets that export to JSON, CSV, Excel, Google Sheets, or feed directly into downstream workflows via webhooks and the Apify API.

Key features

Lucene-style query syntax -- supports free-text search plus field-specific operators like TITLE:"term", AUTH:"name", DOI:10.xxx, and Boolean combinations with AND/OR/NOT
Author filtering -- narrow results to a specific researcher by name using the dedicated author filter field
Journal filtering -- restrict searches to publications from a specific journal title
Date range filtering -- specify start and end dates in YYYY-MM-DD format to target a publication window
Open access filtering -- toggle a single checkbox to return only freely available open access publications
Source database selection -- choose between All sources, PubMed (MED) for MEDLINE citations, PMC for full-text articles, or Preprints (PPR) for bioRxiv/medRxiv content
Flexible sort options -- sort results by relevance, citation count (most cited first), or publication date (most recent first)
Full-text URL extraction -- automatically finds the best available full-text link for each article, preferring HTML over PDF over any other format
MeSH term extraction -- returns Medical Subject Headings for each article, enabling standardized topic classification and filtering
Up to 500 results per run -- cursor-based pagination collects large result sets efficiently with page sizes up to 1,000 per API call
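The full-text URL preference (HTML, then PDF, then anything else) can be sketched as follows. This is a minimal illustration, not the actor's code. It assumes each raw Europe PMC core-format record carries a `fullTextUrlList.fullTextUrl` array whose entries have `documentStyle` (for example "html" or "pdf") and `url` keys. `pick_full_text_url` is a hypothetical name.

```python
from typing import Any, Optional

# Rank documentStyle values: HTML first, then PDF, then anything else.
# Assumption: entries look like Europe PMC core-format fullTextUrlList items.
STYLE_RANK = {"html": 0, "pdf": 1}


def pick_full_text_url(article: dict[str, Any]) -> Optional[str]:
    """Return the preferred full-text URL for one raw search result, or None."""
    entries = article.get("fullTextUrlList", {}).get("fullTextUrl", [])
    candidates = [entry for entry in entries if entry.get("url")]
    if not candidates:
        return None
    # min() returns the first of several equally ranked entries,
    # so the API's own ordering breaks ties.
    best = min(candidates, key=lambda entry: STYLE_RANK.get(entry.get("documentStyle"), 2))
    return best["url"]
```

Using `min()` with a rank table keeps the preference order in one place, so adding another format later only means adding a key to `STYLE_RANK`.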

How to use Europe PMC Literature Search

Using the Apify Console

Go to the Europe PMC Literature Search actor page on Apify.
Click Start to open the input configuration form.
Enter your search query in the Search Query field (e.g., CRISPR gene editing).
Optionally fill in Author Name, Journal Name, Date From, Date To, Open Access Only, and Source Database filters.
Select your preferred Sort By option -- Relevance, Most Cited, or Most Recent.
Set the Max Results value (1 to 500, default is 50).
Click Start to run the actor.
When the run finishes, open the Dataset tab to view, download, or export results in JSON, CSV, or Excel format.

Using the Apify API or CLI

apify call ryanclinton/europe-pmc-search \
  --input='{"query":"CRISPR gene editing","openAccessOnly":true,"sortBy":"CITED desc","maxResults":100}'

Input parameters

Parameter	Type	Required	Default	Description
`query`	String	Yes	--	Search query. Supports free text and field syntax like `TITLE:"term"`, `AUTH:"name"`, `DOI:10.xxx`
`author`	String	No	--	Filter by author name (e.g., `"Smith J"`)
`journal`	String	No	--	Filter by journal name (e.g., `"Nature"`)
`dateFrom`	String	No	--	Start date in YYYY-MM-DD format
`dateTo`	String	No	--	End date in YYYY-MM-DD format
`openAccessOnly`	Boolean	No	`false`	Only return open access publications
`source`	String	No	All	Source database: All, PubMed (`MED`), PMC Full Text (`PMC`), or Preprints (`PPR`)
`sortBy`	String	No	`RELEVANCE`	Sort order: `RELEVANCE`, `CITED desc` (most cited), or `P_PDATE_D desc` (most recent)
`maxResults`	Integer	No	`50`	Maximum number of results to return (1--500)
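A client-side check of the constraints in the table can catch mistakes before a run is started. `validate_input` is a hypothetical helper, not part of the actor. It leaves `source` unchecked because the table does not give the exact value used for "All".

```python
from datetime import datetime
from typing import Any

ALLOWED_SORTS = {"RELEVANCE", "CITED desc", "P_PDATE_D desc"}


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems with run_input; an empty list means it looks valid."""
    errors: list[str] = []

    query = run_input.get("query")
    if not isinstance(query, str) or not query.strip():
        errors.append("query is required")

    dates = {}
    for key in ("dateFrom", "dateTo"):
        value = run_input.get(key)
        if value is None:
            continue
        try:
            dates[key] = datetime.strptime(value, "%Y-%m-%d").date()
        except (TypeError, ValueError):
            errors.append(f"{key} must be a YYYY-MM-DD string")
    if len(dates) == 2 and dates["dateFrom"] > dates["dateTo"]:
        errors.append("dateFrom is after dateTo")

    if run_input.get("sortBy", "RELEVANCE") not in ALLOWED_SORTS:
        errors.append("sortBy must be RELEVANCE, CITED desc, or P_PDATE_D desc")

    max_results = run_input.get("maxResults", 50)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int) or not 1 <= max_results <= 500:
        errors.append("maxResults must be an integer from 1 to 500")

    return errors
```

The example input in the next section passes this check unchanged.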

Example input

{
    "query": "machine learning drug discovery",
    "author": "Zhang",
    "journal": "Nature",
    "dateFrom": "2023-01-01",
    "dateTo": "2025-12-31",
    "openAccessOnly": true,
    "source": "MED",
    "sortBy": "CITED desc",
    "maxResults": 100
}

Tips for effective queries

Combine free text with field operators for precision: TITLE:"deep learning" AND AUTH:"Chen".
Use the dedicated author and journal filter fields instead of embedding them in the query string -- the actor builds the correct Lucene syntax for you.
Set dateFrom to a recent date and schedule recurring runs to build an automated new-publication alert pipeline.
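Filter by source PMC to focus on PubMed Central full-text records. Many PMC articles are also indexed in PubMed and are returned under source MED with inPMC set to true (as in the example output), so check inPMC as well when full-text availability is what matters.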
Filter by source PPR to find preprints from bioRxiv and medRxiv before they are formally published.
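A fixed dateFrom does not move between scheduled runs. One option is to compute a rolling window each time. This sketch assumes a small script of your own builds the input before calling the actor (for example with the Apify client shown under API & integrations). `rolling_input` is a hypothetical helper.

```python
import json
from datetime import date, timedelta
from typing import Any, Optional


def rolling_input(query: str, days_back: int = 1, today: Optional[date] = None) -> dict[str, Any]:
    """Build a run input whose date window ends today and starts `days_back` days earlier."""
    today = today or date.today()
    return {
        "query": query,
        "dateFrom": (today - timedelta(days=days_back)).isoformat(),
        "dateTo": today.isoformat(),
        "sortBy": "P_PDATE_D desc",  # most recent first
        "maxResults": 100,
    }


if __name__ == "__main__":
    # Print a 7-day window for a weekly schedule, ready to pass as run_input.
    print(json.dumps(rolling_input("CRISPR gene editing", days_back=7), indent=2))
```

Consecutive windows share their boundary day, so expect a few repeats between runs. Deduplicating on pmid or doi handles that, and the overlap reduces the chance of missing records that were indexed late.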

Output

Each item in the output dataset contains 23 fields with full publication metadata.

Example output

{
    "pmid": "37648796",
    "pmcid": "PMC10564893",
    "doi": "10.1038/s41586-023-06468-x",
    "title": "Base editing of haematopoietic stem cells rescues sickle cell disease in mice",
    "authorString": "Newby GA, Yen JS, Woodard KJ, Mayuranathan T, Lazzarotto CR, Li Y...",
    "authors": [
        {
            "fullName": "Newby GA",
            "firstName": "Gregory A",
            "lastName": "Newby",
            "affiliation": "Merkin Institute, Broad Institute of Harvard and MIT, Cambridge, MA"
        },
        {
            "fullName": "Yen JS",
            "firstName": "Jonathan S",
            "lastName": "Yen"
        }
    ],
    "journalTitle": "Nature",
    "journalVolume": "623",
    "journalIssue": "7985",
    "pageInfo": "295-302",
    "pubYear": "2023",
    "firstPublicationDate": "2023-08-30",
    "abstractText": "Sickle cell disease (SCD) is caused by a point mutation in the beta-globin gene...",
    "citedByCount": 147,
    "isOpenAccess": true,
    "inPMC": true,
    "inEPMC": true,
    "source": "MED",
    "pubType": ["research-article", "Journal Article"],
    "meshTerms": ["CRISPR-Cas Systems", "Sickle Cell Disease", "Hematopoietic Stem Cells"],
    "fullTextUrl": "https://europepmc.org/articles/PMC10564893",
    "europePmcUrl": "https://europepmc.org/article/MED/37648796",
    "extractedAt": "2026-02-19T14:30:00.000Z"
}

Output fields reference

Field	Type	Description
`pmid`	String	PubMed identifier
`pmcid`	String	PubMed Central identifier
`doi`	String	Digital Object Identifier
`title`	String	Publication title
`authorString`	String	Comma-separated author names
`authors`	Array	Structured author objects with fullName, firstName, lastName, affiliation
`journalTitle`	String	Name of the journal
`journalVolume`	String	Journal volume number
`journalIssue`	String	Journal issue number
`pageInfo`	String	Page range (e.g., "295-302")
`pubYear`	String	Publication year
`firstPublicationDate`	String	Date of first publication (YYYY-MM-DD)
`abstractText`	String	Full abstract text
`citedByCount`	Number	Number of citations in Europe PMC
`isOpenAccess`	Boolean	Whether the article is open access
`inPMC`	Boolean	Whether the article is in PubMed Central
`inEPMC`	Boolean	Whether the article is in Europe PMC
`source`	String	Source database (MED, PMC, PPR, etc.)
`pubType`	Array	Publication types (e.g., "research-article", "Review")
`meshTerms`	Array	Medical Subject Heading terms
`fullTextUrl`	String	Best available full-text URL (HTML preferred over PDF)
`europePmcUrl`	String	Direct link to the article on Europe PMC
`extractedAt`	String	ISO 8601 timestamp of when the data was extracted
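Because the bibliographic fields are already split out, a reference string is easy to assemble. Below is a minimal sketch of a roughly Vancouver-style (NLM-like) formatter. `format_citation` is a hypothetical helper, and its punctuation is an approximation, not a full implementation of any citation standard.

```python
def format_citation(item: dict) -> str:
    """Format one output record as a roughly Vancouver-style reference string.

    Missing fields (common for preprints) are skipped rather than left as blanks.
    """
    pieces = []
    for key in ("authorString", "title", "journalTitle"):
        value = (item.get(key) or "").strip().rstrip(".")
        if value:
            pieces.append(value + ".")

    # Locator such as "2023;623(7985):295-302".
    locator = item.get("pubYear") or ""
    volume = item.get("journalVolume") or ""
    issue = item.get("journalIssue") or ""
    pages = item.get("pageInfo") or ""
    if volume:
        locator += (";" if locator else "") + volume + (f"({issue})" if issue else "")
    if pages:
        locator += (":" if locator else "") + pages
    if locator:
        pieces.append(locator + ".")

    if item.get("doi"):
        pieces.append(f"doi:{item['doi']}")
    return " ".join(pieces)
```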

Use cases

Systematic literature reviews -- collect all publications matching specific criteria for structured evidence synthesis in medical or scientific research
Research trend analysis -- track publication volume, citation patterns, and emerging topics across biomedical fields over time
Competitor intelligence for pharma -- monitor publications from competing research groups or pharmaceutical companies working on similar drug targets
Grant application preparation -- quickly survey existing literature on a topic to establish research gaps and justify funding proposals
Clinical evidence gathering -- find clinical trial publications and reviews relevant to specific treatments, diseases, or medical devices
Preprint monitoring -- filter by source PPR to track bioRxiv and medRxiv preprints before they appear in peer-reviewed journals
Author publication tracking -- follow a specific researcher's output by combining author name filters with date ranges
Knowledge graph construction -- extract structured metadata including MeSH terms, authors, and citations to build biomedical knowledge graphs and network analyses
Open access content mining -- filter for open access articles to build text mining datasets for NLP, machine learning, or AI training
Journal benchmarking -- compare publication volume and citation impact across journals in a specific research area
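For trend analysis and knowledge-graph work, a common first step is to count records per year and tally MeSH terms. Here is a minimal sketch over dataset items (for example, those returned by `iterate_items()` in the Python example under API & integrations). `summarize` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable


def summarize(items: Iterable[dict], top_n: int = 5) -> dict:
    """Count records per publication year and find the most frequent MeSH terms."""
    per_year: Counter = Counter()
    mesh: Counter = Counter()
    for item in items:
        if item.get("pubYear"):
            per_year[item["pubYear"]] += 1
        # meshTerms may be empty or missing, e.g. for many preprints.
        mesh.update(item.get("meshTerms") or [])
    return {
        "per_year": dict(sorted(per_year.items())),
        "top_mesh": mesh.most_common(top_n),
    }
```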

API & integrations

Python

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("RMqjhlGfzi7ScjOGH").call(run_input={
    "query": "CRISPR gene editing",
    "openAccessOnly": True,
    "sortBy": "CITED desc",
    "maxResults": 100,
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{item['title']} -- {item['citedByCount']} citations")

JavaScript

import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("RMqjhlGfzi7ScjOGH").call({
    query: "CRISPR gene editing",
    openAccessOnly: true,
    sortBy: "CITED desc",
    maxResults: 100,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.log(`${item.title} -- ${item.citedByCount} citations`);
});

cURL

curl "https://api.apify.com/v2/acts/RMqjhlGfzi7ScjOGH/runs" \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -d '{
    "query": "CRISPR gene editing",
    "openAccessOnly": true,
    "sortBy": "CITED desc",
    "maxResults": 100
  }'

Platform integrations

Apify Schedules -- run daily or weekly to monitor new publications on a topic
Webhooks -- trigger downstream processing when a run completes (e.g., email alerts, Slack notifications)
Zapier / Make -- connect Apify to thousands of apps for automated literature monitoring workflows
Google Sheets -- export results directly to a spreadsheet for collaborative review
Amazon S3 / Google Cloud Storage -- push datasets to cloud storage for archival or further processing

How it works

Parse input -- reads the search query and optional filters (author, journal, dates, open access, source database)
Build Lucene query -- constructs a Europe PMC query string combining free text with field-specific operators like AUTH:"name", JOURNAL:"title", FIRST_PDATE:[from TO to], OPEN_ACCESS:y, and SRC:source
Query the API -- sends the request to the Europe PMC REST API at https://www.ebi.ac.uk/europepmc/webservices/rest/search with resultType=core for full metadata
Paginate with cursors -- uses cursor-based pagination (cursorMark) with page sizes up to 1,000 to efficiently collect large result sets
Transform results -- normalizes nested API responses into consistent output records with 23 structured fields
Extract full-text URLs -- finds the best available full-text link for each article, preferring HTML over PDF over any other format
Push to dataset -- stores each batch of transformed records in the Apify dataset as they are collected
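Steps 1 and 2 can be sketched as a small query builder. This is an illustration under stated assumptions, not the actor's source code:

- Clauses are joined with AND.
- The user's query is wrapped in parentheses.
- Filter values are not escaped.
- An open-ended date bound is written as "*". This is standard Lucene range syntax, but you should confirm it against Europe PMC's search syntax reference.

`build_query` is a hypothetical name.

```python
from typing import Optional


def build_query(
    query: str,
    author: Optional[str] = None,
    journal: Optional[str] = None,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    open_access_only: bool = False,
    source: Optional[str] = None,
) -> str:
    """Combine the free-text query and optional filters into one Europe PMC query string."""
    clauses = [f"({query})"]  # parentheses keep the user's own OR/NOT logic grouped
    if author:
        clauses.append(f'AUTH:"{author}"')  # values are not escaped in this sketch
    if journal:
        clauses.append(f'JOURNAL:"{journal}"')
    if date_from or date_to:
        # Assumption: "*" is accepted as an open-ended bound.
        clauses.append(f"FIRST_PDATE:[{date_from or '*'} TO {date_to or '*'}]")
    if open_access_only:
        clauses.append("OPEN_ACCESS:y")
    if source:
        clauses.append(f"SRC:{source}")
    return " AND ".join(clauses)
```

Called with the example input from the Input parameters section, this returns `(machine learning drug discovery) AND AUTH:"Zhang" AND JOURNAL:"Nature" AND FIRST_PDATE:[2023-01-01 TO 2025-12-31] AND OPEN_ACCESS:y AND SRC:MED`.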

Input Query + Filters
        |
        v
  [Build Lucene Query]
        |
        v
  [Europe PMC REST API] <--cursorMark-- [Next Page?]
        |                                    ^
        v                                    |
  [Transform Results] --> [Push to Dataset] -+
        |
        v
  Clean JSON Output (up to 500 records)
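The pagination loop in the diagram can be sketched with the HTTP call injected as a function, so the control flow can be exercised without network access. This sketch assumes:

- Each response is parsed JSON with `resultList.result` and `nextCursorMark` keys.
- The first request uses `cursorMark="*"`.
- The loop stops when a page comes back empty or the cursor stops advancing.

`collect` and `make_http_fetcher` are hypothetical names, not the actor's functions.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"

# A page fetcher takes (cursor_mark, page_size) and returns one parsed JSON response.
Fetcher = Callable[[str, int], dict[str, Any]]


def make_http_fetcher(query: str) -> Fetcher:
    """Fetcher that calls the Europe PMC search endpoint (requires network access)."""
    def fetch(cursor_mark: str, page_size: int) -> dict[str, Any]:
        params = urllib.parse.urlencode({
            "query": query,
            "resultType": "core",
            "format": "json",
            "cursorMark": cursor_mark,
            "pageSize": page_size,
        })
        with urllib.request.urlopen(f"{SEARCH_URL}?{params}", timeout=30) as response:
            return json.load(response)
    return fetch


def collect(fetch_page: Fetcher, max_results: int = 50, page_size: int = 1000) -> list[dict]:
    """Follow nextCursorMark until max_results records are collected or results run out."""
    results: list[dict] = []
    cursor = "*"  # starting cursor for the first page
    while len(results) < max_results:
        batch_size = min(page_size, max_results - len(results))
        data = fetch_page(cursor, batch_size)
        batch = data.get("resultList", {}).get("result", [])
        results.extend(batch)
        next_cursor = data.get("nextCursorMark")
        # An empty page or a cursor that no longer advances means there is nothing left.
        if not batch or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return results[:max_results]
```

A live call would look like `collect(make_http_fetcher('TITLE:"base editing"'), max_results=200)`. In tests, a stub function returning canned pages can stand in for the HTTP fetcher.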

Performance & cost

Scenario	Results	Approx. Duration	Apify Platform Cost
Quick search	50	5--10 seconds	< $0.01
Medium batch	200	15--30 seconds	< $0.01
Maximum batch	500	30--60 seconds	~$0.01
Scheduled daily run	50/day	5--10 seconds/run	< $0.30/month

Memory requirement: 256 MB
The Europe PMC API is free to use; very high request rates may be throttled (see Limitations)
The figures above reflect Apify compute time only, which is minimal for this API-only actor; the per-result price listed under Pricing (from $2.00 per 1,000 papers fetched) is not included in them
No browser rendering or proxy infrastructure required
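Below is a minimal sketch for estimating the per-result charge from the Pricing line. It assumes a flat rate of $2.00 per 1,000 papers fetched (the listed "from" price; your actual rate may differ). `per_result_cost` and `monthly_cost` are hypothetical helpers.

```python
def per_result_cost(papers: int, usd_per_1000: float = 2.00) -> float:
    """Estimated per-result charge in USD, assuming a flat rate per 1,000 papers fetched."""
    return round(papers * usd_per_1000 / 1000, 2)


def monthly_cost(papers_per_run: int, runs_per_month: int, usd_per_1000: float = 2.00) -> float:
    """Estimated monthly per-result charge for a recurring schedule."""
    return per_result_cost(papers_per_run * runs_per_month, usd_per_1000)
```

Under that assumption, a maximum batch of 500 papers comes to `per_result_cost(500) == 1.0` (USD). A daily run of 50 papers over 30 days comes to `monthly_cost(50, 30) == 3.0`.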

Limitations

Maximum 500 results per run -- the actor caps output at 500 records to keep runs fast and manageable. For larger datasets, run multiple queries with narrower filters.
Abstract only for paywalled articles -- the actor provides full metadata for all articles, plus abstracts where Europe PMC has them, but full-text content behind paywalls requires separate institutional access.
Citation counts may lag -- the citedByCount field reflects Europe PMC's citation index, which may not be as current as Google Scholar or other citation databases.
Preprint metadata may be sparse -- preprints from bioRxiv and medRxiv may lack MeSH terms, full author affiliations, or other metadata that is added during peer review and indexing.
API rate limits -- while the Europe PMC API has no formal authentication requirement, extremely high-frequency requests may be throttled. The actor uses reasonable page sizes and sequential requests to avoid this.
Date filtering uses first publication date -- the FIRST_PDATE field may differ from the journal publication date for articles that appeared as early releases or preprints first.
No full-text download -- the actor extracts metadata and links but does not download or parse the full text of articles.
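The workaround for the 500-result cap can be automated by splitting a date range into contiguous, non-overlapping windows and running one query per window. This is a minimal sketch; `split_date_range` is a hypothetical helper. Even splits do not guarantee that each window stays under 500 matches, so choose the number of parts based on how many matches the query has (for example, total matches divided by 500, rounded up).

```python
from datetime import date, timedelta


def split_date_range(start: str, end: str, parts: int) -> list[tuple[str, str]]:
    """Split the inclusive range [start, end] into contiguous, non-overlapping windows."""
    d0, d1 = date.fromisoformat(start), date.fromisoformat(end)
    total_days = (d1 - d0).days + 1
    if parts < 1 or total_days < 1:
        raise ValueError("need parts >= 1 and start <= end")
    parts = min(parts, total_days)  # never produce empty windows
    windows = []
    for i in range(parts):
        # Integer boundaries spread any remainder days across the windows.
        lo = d0 + timedelta(days=i * total_days // parts)
        hi = d0 + timedelta(days=(i + 1) * total_days // parts - 1)
        windows.append((lo.isoformat(), hi.isoformat()))
    return windows
```

Each `(dateFrom, dateTo)` pair can be passed to a separate run. Both ends are inclusive, which matches the square-bracket FIRST_PDATE range, so no day is counted twice.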

Responsible use

Respect publisher terms -- while Europe PMC metadata is freely available, full-text articles may be subject to publisher copyright. Always check the license before redistributing or text mining full-text content.
Cite your sources -- if you use data from this actor in research publications or reports, cite the original articles and acknowledge Europe PMC as the data source.
Use reasonable query volumes -- avoid scheduling unnecessarily frequent runs or requesting maximum results when fewer would suffice. The Europe PMC API is a shared public resource.
Comply with institutional policies -- if you are accessing this actor through an institutional Apify account, ensure your usage complies with your organization's data handling and research ethics policies.
Do not use for spam or harassment -- do not use extracted author details (names and affiliations, which sometimes include email addresses) for unsolicited bulk communications.

FAQ

Q: What databases does Europe PMC cover? A: Europe PMC indexes over 40 million records from three primary sources: PubMed (MED) for MEDLINE biomedical citations, PubMed Central (PMC) for full-text open access articles, and preprint servers (PPR) including bioRxiv and medRxiv. It also includes content from patents, agricultural research, and European life science repositories.

Q: Do I need an API key to use this actor? A: No. The Europe PMC REST API is completely free and open. This actor requires no API keys, tokens, or registration to run.

Q: How is this different from the PubMed Research Search actor? A: Europe PMC includes everything in PubMed plus additional content from PubMed Central full-text articles, preprints from bioRxiv/medRxiv, and European life science sources. It also provides MeSH terms, richer author metadata with affiliations, and direct full-text URLs in a single query.

Q: Can I get the full text of articles? A: The actor provides a fullTextUrl field with the best available link to the full text (HTML or PDF) for open access articles. For paywalled articles, you receive the abstract and all metadata but need institutional access for full text.

Q: What query syntax is supported? A: The actor supports Europe PMC's Lucene-based query syntax. You can use free text, field-specific operators (TITLE:"term", AUTH:"name", DOI:10.xxx, ABSTRACT:"keyword"), Boolean operators (AND, OR, NOT), and wildcards (*). The dedicated filter fields for author, journal, date, and source are combined automatically.

Q: Can I search for preprints specifically? A: Yes. Set the Source Database parameter to PPR (Preprints) to restrict results to bioRxiv, medRxiv, and other preprint servers indexed by Europe PMC.

Q: How does pagination work? A: The actor uses cursor-based pagination with the Europe PMC API's cursorMark parameter. Each API call retrieves up to 1,000 results, and the actor continues fetching pages until it reaches your maxResults limit or exhausts available results.

Q: Can I schedule automatic searches for new publications? A: Yes. Set up an Apify schedule to run the actor daily or weekly. Use the dateFrom parameter set to a recent date to capture only newly published articles. Combine with webhooks to send email or Slack alerts when new papers match your criteria.

Q: What are MeSH terms and why are they useful? A: MeSH (Medical Subject Headings) is a standardized vocabulary maintained by the National Library of Medicine. MeSH terms enable consistent topic classification across articles, making them valuable for systematic reviews, meta-analyses, and building structured topic taxonomies.

Q: How current is the data? A: Europe PMC updates its index daily. PubMed records typically appear within 1--2 days of being indexed by the National Library of Medicine. Preprints are indexed shortly after they are posted to bioRxiv or medRxiv.

Q: Can I sort results by citation count? A: Yes. Set the Sort By parameter to CITED desc to return the most highly cited articles first. This is useful for identifying seminal papers and high-impact research on any topic.

Q: What happens if my query returns more than 500 results? A: The actor returns up to 500 results (or your configured maxResults limit, whichever is lower). If the total hit count exceeds this, you can narrow your search with additional filters or run multiple queries with non-overlapping date ranges to cover the full result set.

Related actors

Actor	Description
PubMed Biomedical Literature Search	Search PubMed for MEDLINE-indexed biomedical citations with abstracts and metadata
Semantic Scholar Paper Search	Search Semantic Scholar for academic papers with AI-generated TLDRs and citation data
OpenAlex Research Paper Search	Search OpenAlex for open scholarly metadata across all academic disciplines
Crossref Academic Paper Search	Search Crossref for DOI-registered publications with reference metadata
ORCID Researcher Search	Look up researchers by ORCID ID to find their publication history and affiliations
ArXiv Preprint Paper Search	Search ArXiv for preprints in physics, mathematics, computer science, and related fields

Pmc Profile Scraper

getdataforme/pmc-profile-scraper

This Apify actor efficiently scrapes PubMed Central articles to extract titles, full content, authors, and keywords, providing structured JSON data for researchers and analysts....

GetDataForMe

PubMed Search — 37M+ Articles by Author & MeSH

ryanclinton/pubmed-research-search

Search and extract structured metadata from PubMed, the world's largest biomedical literature database with over 37 million citations. Query by keyword, author, journal, date range, and article type using the NCBI E-utilities API.

Ryan Clinton

Unified Preprint Search

logical_vivacity/unified-preprint-search

One Apify Actor, five sources: PubMed, arXiv, bioRxiv, medRxiv, chemRxiv.

Logical Vivacity

PubMed MCP Server

agentify/pubmed-mcp-server

A server implementing the Model Context Protocol (MCP) for accessing and processing PubMed biomedical literature data

agentify

PubMed Article Search

automation-lab/pubmed-scraper

Search PubMed's 35M+ biomedical articles and extract structured data: titles, authors, abstracts, DOIs, MeSH keywords, and publication types. Free NCBI API, no key required.

Stas Persiianenko

PubMed Search Scraper

easyapi/pubmed-search-scraper

Scrape research papers and academic articles from PubMed based on search terms. Extract comprehensive article metadata including titles, authors, citations, abstracts, and more. Perfect for medical research and literature reviews.

EasyApi

PubMed Research Intelligence

funnyvalentine69/pubmed-research-intelligence

AI-powered natural language queries against PubMed. Search biomedical literature by topic, author, journal, or MeSH term. Returns structured article data with AI-synthesized literature analysis. Runs as actor or MCP server for AI agents.

Samson Southafeng

PubMed & NCBI Databases API

alizarin_refrigerator-owner/pubmed-ncbi-databases-api

Access PubMed and NCBI databases for biomedical literature. Search 36+ million citations, get article abstracts, citation metrics, author profiles, and journal data. Essential for scientific research and pharma market intelligence.

The Howlers

🔬 PubMed Research Search — Medical Papers & Data

nexgendata/pubmed-research-search

Search PubMed's 35M+ biomedical research papers. Extract abstracts, authors, citations, MeSH terms, and publication data. Ideal for literature reviews, meta-analyses, and medical research.