Europe PMC Biomedical Literature Search
Pricing
from $2.00 / 1,000 papers fetched
Search 40M+ biomedical & life science papers via Europe PMC API. Filter by author, journal, date range, open access status & source database. Returns abstracts, citation counts, MeSH terms, DOIs, author affiliations & full-text links.
Developer: ryan clinton
Europe PMC Literature Search
Search and extract biomedical and life science publications from Europe PMC -- a free, comprehensive repository of over 40 million articles aggregated from PubMed, PubMed Central (PMC), and preprint servers including bioRxiv and medRxiv. Filter by author, journal, date range, open access status, and source database. Returns structured citation data with abstracts, MeSH terms, full-text URLs, and citation counts. No API key required.
Why use Europe PMC Literature Search?
- Access 40M+ publications in one search -- Europe PMC unifies PubMed (MED), PMC full-text (PMC), and preprints (PPR) into a single searchable index, eliminating the need to query multiple databases separately.
- No API key or authentication needed -- the Europe PMC REST API is completely free and open, so you can start extracting data immediately without registration or credentials.
- Rich structured metadata -- every result includes PMID, PMCID, DOI, full author lists with affiliations, abstract text, MeSH subject headings, citation counts, publication types, and direct full-text URLs.
- Automated pagination and data transformation -- the actor handles cursor-based pagination, nested API response parsing, and output normalization so you get clean, flat JSON records ready for analysis.
- Schedule recurring literature monitoring -- run the actor daily or weekly with date range filters to automatically track new publications on any biomedical topic.
- Export anywhere -- results are stored in standard Apify datasets that export to JSON, CSV, Excel, Google Sheets, or feed directly into downstream workflows via webhooks and the Apify API.
Key features
- Lucene-style query syntax -- supports free-text search plus field-specific operators like TITLE:"term", AUTH:"name", DOI:10.xxx, and Boolean combinations with AND/OR/NOT
- Author filtering -- narrow results to a specific researcher by name using the dedicated author filter field
- Journal filtering -- restrict searches to publications from a specific journal title
- Date range filtering -- specify start and end dates in YYYY-MM-DD format to target a publication window
- Open access filtering -- toggle a single checkbox to return only freely available open access publications
- Source database selection -- choose between All sources, PubMed (MED) for MEDLINE citations, PMC for full-text articles, or Preprints (PPR) for bioRxiv/medRxiv content
- Flexible sort options -- sort results by relevance, citation count (most cited first), or publication date (most recent first)
- Full-text URL extraction -- automatically finds the best available full-text link for each article, preferring HTML over PDF over any other format
- MeSH term extraction -- returns Medical Subject Headings for each article, enabling standardized topic classification and filtering
- Up to 500 results per run -- cursor-based pagination collects large result sets efficiently with page sizes up to 1,000 per API call
How to use Europe PMC Literature Search
Using the Apify Console
- Go to the Europe PMC Literature Search actor page on Apify.
- Click Start to open the input configuration form.
- Enter your search query in the Search Query field (e.g., CRISPR gene editing).
- Optionally fill in Author Name, Journal Name, Date From, Date To, Open Access Only, and Source Database filters.
- Select your preferred Sort By option -- Relevance, Most Cited, or Most Recent.
- Set the Max Results value (1 to 500, default is 50).
- Click Start to run the actor.
- When the run finishes, open the Dataset tab to view, download, or export results in JSON, CSV, or Excel format.
Using the Apify API or CLI

    apify call ryanclinton/europe-pmc-search \
      --input='{"query":"CRISPR gene editing","openAccessOnly":true,"sortBy":"CITED desc","maxResults":100}'

Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | String | Yes | -- | Search query. Supports free text and field syntax like TITLE:"term", AUTH:"name", DOI:10.xxx |
| author | String | No | -- | Filter by author name (e.g., "Smith J") |
| journal | String | No | -- | Filter by journal name (e.g., "Nature") |
| dateFrom | String | No | -- | Start date in YYYY-MM-DD format |
| dateTo | String | No | -- | End date in YYYY-MM-DD format |
| openAccessOnly | Boolean | No | false | Only return open access publications |
| source | String | No | All | Source database: All, PubMed (MED), PMC Full Text (PMC), or Preprints (PPR) |
| sortBy | String | No | RELEVANCE | Sort order: RELEVANCE, CITED desc (most cited), or P_PDATE_D desc (most recent) |
| maxResults | Integer | No | 50 | Maximum number of results to return (1--500) |
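
The constraints in the table above can be checked locally before a run is started. The following is a minimal sketch and not part of the actor: `validate_input` is a hypothetical helper, and it only covers rules the table states explicitly (a required query, YYYY-MM-DD dates, the three sort values, and a maxResults value between 1 and 500). The check that dateFrom does not come after dateTo is an added assumption.

```python
from datetime import datetime

# Values taken from the input parameter table; the helper itself is hypothetical.
ALLOWED_SORT_VALUES = ("RELEVANCE", "CITED desc", "P_PDATE_D desc")
MAX_RESULTS_CAP = 500


def _parse_date(value):
    """Return a date for a YYYY-MM-DD string, or None if it does not parse."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except (TypeError, ValueError):
        return None


def validate_input(run_input):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []

    query = run_input.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("query is required and must be a non-empty string")

    parsed = {}
    for key in ("dateFrom", "dateTo"):
        if run_input.get(key) is not None:
            parsed[key] = _parse_date(run_input[key])
            if parsed[key] is None:
                problems.append(f"{key} must use YYYY-MM-DD format")

    # Assumption of this sketch: an inverted date window is treated as an error.
    if parsed.get("dateFrom") and parsed.get("dateTo"):
        if parsed["dateFrom"] > parsed["dateTo"]:
            problems.append("dateFrom must not be later than dateTo")

    if run_input.get("sortBy", "RELEVANCE") not in ALLOWED_SORT_VALUES:
        problems.append("sortBy must be one of: " + ", ".join(ALLOWED_SORT_VALUES))

    max_results = run_input.get("maxResults", 50)
    is_int = isinstance(max_results, int) and not isinstance(max_results, bool)
    if not is_int or not 1 <= max_results <= MAX_RESULTS_CAP:
        problems.append(f"maxResults must be an integer from 1 to {MAX_RESULTS_CAP}")

    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a single pass.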
Example input

    {
        "query": "machine learning drug discovery",
        "author": "Zhang",
        "journal": "Nature",
        "dateFrom": "2023-01-01",
        "dateTo": "2025-12-31",
        "openAccessOnly": true,
        "source": "MED",
        "sortBy": "CITED desc",
        "maxResults": 100
    }

Tips for effective queries
- Combine free text with field operators for precision: TITLE:"deep learning" AND AUTH:"Chen".
- Use the dedicated author and journal filter fields instead of embedding them in the query string -- the actor builds the correct Lucene syntax for you.
- Set dateFrom to a recent date and schedule recurring runs to build an automated new-publication alert pipeline.
- Filter by source PMC when you need articles with guaranteed full-text availability.
- Filter by source PPR to find preprints from bioRxiv and medRxiv before they are formally published.
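
For the recurring-alert tip above, the input for each scheduled run needs a dateFrom that moves with the calendar. The sketch below shows one way to compute it. `build_monitoring_input` is a hypothetical helper, and the seven-day default window is an assumption, not a recommendation from the actor.

```python
from datetime import date, timedelta


def build_monitoring_input(query, days_back=7, today=None, max_results=50):
    """Build an actor input covering the last `days_back` days up to `today`."""
    today = today or date.today()
    return {
        "query": query,
        "dateFrom": (today - timedelta(days=days_back)).isoformat(),
        "dateTo": today.isoformat(),
        "sortBy": "P_PDATE_D desc",  # most recent first, per the input table
        "maxResults": max_results,
    }


if __name__ == "__main__":
    print(build_monitoring_input("CRISPR gene editing"))
```

The resulting dictionary can be passed as `run_input` to the Python client call shown in the API & integrations section. Making the window slightly longer than the schedule interval leaves some overlap between runs, at the cost of duplicate records that need deduplicating by pmid or doi.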
Output
Each item in the output dataset contains 23 fields with full publication metadata.
Example output

    {
        "pmid": "37648796",
        "pmcid": "PMC10564893",
        "doi": "10.1038/s41586-023-06468-x",
        "title": "Base editing of haematopoietic stem cells rescues sickle cell disease in mice",
        "authorString": "Newby GA, Yen JS, Woodard KJ, Mayuranathan T, Lazzarotto CR, Li Y...",
        "authors": [
            {
                "fullName": "Newby GA",
                "firstName": "Gregory A",
                "lastName": "Newby",
                "affiliation": "Merkin Institute, Broad Institute of Harvard and MIT, Cambridge, MA"
            },
            {
                "fullName": "Yen JS",
                "firstName": "Jonathan S",
                "lastName": "Yen"
            }
        ],
        "journalTitle": "Nature",
        "journalVolume": "623",
        "journalIssue": "7985",
        "pageInfo": "295-302",
        "pubYear": "2023",
        "firstPublicationDate": "2023-08-30",
        "abstractText": "Sickle cell disease (SCD) is caused by a point mutation in the beta-globin gene...",
        "citedByCount": 147,
        "isOpenAccess": true,
        "inPMC": true,
        "inEPMC": true,
        "source": "MED",
        "pubType": ["research-article", "Journal Article"],
        "meshTerms": ["CRISPR-Cas Systems", "Sickle Cell Disease", "Hematopoietic Stem Cells"],
        "fullTextUrl": "https://europepmc.org/articles/PMC10564893",
        "europePmcUrl": "https://europepmc.org/article/MED/37648796",
        "extractedAt": "2026-02-19T14:30:00.000Z"
    }

Output fields reference
| Field | Type | Description |
|---|---|---|
| pmid | String | PubMed identifier |
| pmcid | String | PubMed Central identifier |
| doi | String | Digital Object Identifier |
| title | String | Publication title |
| authorString | String | Comma-separated author names |
| authors | Array | Structured author objects with fullName, firstName, lastName, affiliation |
| journalTitle | String | Name of the journal |
| journalVolume | String | Journal volume number |
| journalIssue | String | Journal issue number |
| pageInfo | String | Page range (e.g., "295-302") |
| pubYear | String | Publication year |
| firstPublicationDate | String | Date of first publication (YYYY-MM-DD) |
| abstractText | String | Full abstract text |
| citedByCount | Number | Number of citations in Europe PMC |
| isOpenAccess | Boolean | Whether the article is open access |
| inPMC | Boolean | Whether the article is in PubMed Central |
| inEPMC | Boolean | Whether the article is in Europe PMC |
| source | String | Source database (MED, PMC, PPR, etc.) |
| pubType | Array | Publication types (e.g., "research-article", "Review") |
| meshTerms | Array | Medical Subject Heading terms |
| fullTextUrl | String | Best available full-text URL (HTML preferred over PDF) |
| europePmcUrl | String | Direct link to the article on Europe PMC |
| extractedAt | String | ISO 8601 timestamp of when the data was extracted |
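
Because every record uses the same flat field names, simple aggregations need very little code. As an illustration, the sketch below counts how many records carry each MeSH term, using only the meshTerms field from the table above. `top_mesh_terms` is a hypothetical helper, and treating a missing or null meshTerms value as an empty list is an assumption (the Limitations section notes that preprints may lack MeSH terms).

```python
from collections import Counter


def top_mesh_terms(items, n=10):
    """Return the n most frequent MeSH terms as (term, record_count) pairs."""
    counts = Counter()
    for item in items:
        # A set ensures each term is counted at most once per record.
        for term in set(item.get("meshTerms") or []):
            counts[term] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        {"title": "Paper A", "meshTerms": ["CRISPR-Cas Systems", "Gene Editing"]},
        {"title": "Paper B", "meshTerms": ["CRISPR-Cas Systems"]},
        {"title": "Preprint C", "meshTerms": None},
    ]
    print(top_mesh_terms(sample, n=1))
```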
Use cases
- Systematic literature reviews -- collect all publications matching specific criteria for structured evidence synthesis in medical or scientific research
- Research trend analysis -- track publication volume, citation patterns, and emerging topics across biomedical fields over time
- Competitor intelligence for pharma -- monitor publications from competing research groups or pharmaceutical companies working on similar drug targets
- Grant application preparation -- quickly survey existing literature on a topic to establish research gaps and justify funding proposals
- Clinical evidence gathering -- find clinical trial publications and reviews relevant to specific treatments, diseases, or medical devices
- Preprint monitoring -- filter by source PPR to track bioRxiv and medRxiv preprints before they appear in peer-reviewed journals
- Author publication tracking -- follow a specific researcher's output by combining author name filters with date ranges
- Knowledge graph construction -- extract structured metadata including MeSH terms, authors, and citations to build biomedical knowledge graphs and network analyses
- Open access content mining -- filter for open access articles to build text mining datasets for NLP, machine learning, or AI training
- Journal benchmarking -- compare publication volume and citation impact across journals in a specific research area
API & integrations
Python

    from apify_client import ApifyClient

    client = ApifyClient("YOUR_API_TOKEN")

    run = client.actor("RMqjhlGfzi7ScjOGH").call(run_input={
        "query": "CRISPR gene editing",
        "openAccessOnly": True,
        "sortBy": "CITED desc",
        "maxResults": 100,
    })

    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        print(f"{item['title']} -- {item['citedByCount']} citations")

JavaScript

    import { ApifyClient } from "apify-client";

    const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

    const run = await client.actor("RMqjhlGfzi7ScjOGH").call({
        query: "CRISPR gene editing",
        openAccessOnly: true,
        sortBy: "CITED desc",
        maxResults: 100,
    });

    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    items.forEach((item) => {
        console.log(`${item.title} -- ${item.citedByCount} citations`);
    });

cURL

    curl "https://api.apify.com/v2/acts/RMqjhlGfzi7ScjOGH/runs" \
      -X POST \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer YOUR_API_TOKEN" \
      -d '{"query": "CRISPR gene editing", "openAccessOnly": true, "sortBy": "CITED desc", "maxResults": 100}'

Platform integrations
- Apify Schedules -- run daily or weekly to monitor new publications on a topic
- Webhooks -- trigger downstream processing when a run completes (e.g., email alerts, Slack notifications)
- Zapier / Make -- connect Apify to thousands of apps for automated literature monitoring workflows
- Google Sheets -- export results directly to a spreadsheet for collaborative review
- Amazon S3 / Google Cloud Storage -- push datasets to cloud storage for archival or further processing
How it works
- Parse input -- reads the search query and optional filters (author, journal, dates, open access, source database)
- Build Lucene query -- constructs a Europe PMC query string combining free text with field-specific operators like AUTH:"name", JOURNAL:"title", FIRST_PDATE:[from TO to], OPEN_ACCESS:y, and SRC:source
- Query the API -- sends the request to the Europe PMC REST API at https://www.ebi.ac.uk/europepmc/webservices/rest/search with resultType=core for full metadata
- Paginate with cursors -- uses cursor-based pagination (cursorMark) with page sizes up to 1,000 to efficiently collect large result sets
- Transform results -- normalizes nested API responses into flat, consistent output records with 23 structured fields
- Extract full-text URLs -- finds the best available full-text link for each article, preferring HTML over PDF over any other format
- Push to dataset -- stores each batch of transformed records in the Apify dataset as they are collected

    Input Query + Filters
            |
            v
    [Build Lucene Query]
            |
            v
    [Europe PMC REST API] <--cursorMark-- [Next Page?]
            |                                  ^
            v                                  |
    [Transform Results] --> [Push to Dataset] -+
            |
            v
    Clean JSON Output (up to 500 records)

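
The "Build Lucene query" step can be pictured with a short sketch. This is an illustration of the idea, not the actor's source code: `build_lucene_query` is a hypothetical function that uses only the field operators listed above. Wrapping the free-text part in parentheses and filling a missing date bound with a wide placeholder date are assumptions of this sketch.

```python
# Placeholder bounds for open-ended date ranges (an assumption of this sketch).
EARLIEST_DATE = "1900-01-01"
LATEST_DATE = "2100-12-31"


def build_lucene_query(query, author=None, journal=None, date_from=None,
                       date_to=None, open_access_only=False, source=None):
    """Combine free text and optional filters into one Europe PMC query string."""
    # Parentheses keep an OR inside the user's query from capturing the filters.
    parts = [f"({query})"]
    if author:
        parts.append(f'AUTH:"{author}"')
    if journal:
        parts.append(f'JOURNAL:"{journal}"')
    if date_from or date_to:
        start = date_from or EARLIEST_DATE
        end = date_to or LATEST_DATE
        parts.append(f"FIRST_PDATE:[{start} TO {end}]")
    if open_access_only:
        parts.append("OPEN_ACCESS:y")
    if source:
        parts.append(f"SRC:{source}")
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_lucene_query(
        "machine learning drug discovery",
        author="Zhang",
        journal="Nature",
        date_from="2023-01-01",
        date_to="2025-12-31",
        open_access_only=True,
        source="MED",
    ))
```

The sketch does not escape double quotes inside author or journal names; a production version would need to handle that.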
Performance & cost
| Scenario | Results | Approx. Duration | Apify Platform Cost |
|---|---|---|---|
| Quick search | 50 | 5--10 seconds | < $0.01 |
| Medium batch | 200 | 15--30 seconds | < $0.01 |
| Maximum batch | 500 | 30--60 seconds | ~$0.01 |
| Scheduled daily run | 50/day | 5--10 seconds/run | < $0.30/month |
- Memory requirement: 256 MB (minimum Apify tier)
- The Europe PMC API is free and imposes no usage limits at reasonable query volumes
- Cost is driven entirely by Apify compute time, which is minimal for this API-only actor
- No browser rendering or proxy infrastructure required
Limitations
- Maximum 500 results per run -- the actor caps output at 500 records to keep runs fast and manageable. For larger datasets, run multiple queries with narrower filters.
- Abstract only for paywalled articles -- the actor provides full metadata and abstracts for all articles, but full-text content behind paywalls requires separate institutional access.
- Citation counts may lag -- the citedByCount field reflects Europe PMC's citation index, which may not be as current as Google Scholar or other citation databases.
- Preprint metadata may be sparse -- preprints from bioRxiv and medRxiv may lack MeSH terms, full author affiliations, or other metadata that is added during peer review and indexing.
- API rate limits -- while the Europe PMC API has no formal authentication requirement, extremely high-frequency requests may be throttled. The actor uses reasonable page sizes and sequential requests to avoid this.
- Date filtering uses first publication date -- the FIRST_PDATE field may differ from the journal publication date for articles that appeared as early releases or preprints first.
- No full-text download -- the actor extracts metadata and links but does not download or parse the full text of articles.
Responsible use
- Respect publisher terms -- while Europe PMC metadata is freely available, full-text articles may be subject to publisher copyright. Always check the license before redistributing or text mining full-text content.
- Cite your sources -- if you use data from this actor in research publications or reports, cite the original articles and acknowledge Europe PMC as the data source.
- Use reasonable query volumes -- avoid scheduling unnecessarily frequent runs or requesting maximum results when fewer would suffice. The Europe PMC API is a shared public resource.
- Comply with institutional policies -- if you are accessing this actor through an institutional Apify account, ensure your usage complies with your organization's data handling and research ethics policies.
- Do not use for spam or harassment -- do not use extracted author contact information (affiliations) for unsolicited bulk communications.
FAQ
Q: What databases does Europe PMC cover?
A: Europe PMC indexes over 40 million records from three primary sources: PubMed (MED) for MEDLINE biomedical citations, PubMed Central (PMC) for full-text open access articles, and preprint servers (PPR) including bioRxiv and medRxiv. It also includes content from patents, agricultural research, and European life science repositories.
Q: Do I need an API key to use this actor?
A: No. The Europe PMC REST API is completely free and open. This actor requires no API keys, tokens, or registration to run.
Q: How is this different from the PubMed Research Search actor?
A: Europe PMC includes everything in PubMed plus additional content from PubMed Central full-text articles, preprints from bioRxiv/medRxiv, and European life science sources. It also provides MeSH terms, richer author metadata with affiliations, and direct full-text URLs in a single query.
Q: Can I get the full text of articles?
A: The actor provides a fullTextUrl field with the best available link to the full text (HTML or PDF) for open access articles. For paywalled articles, you receive the abstract and all metadata but need institutional access for full text.
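
The "HTML over PDF over any other format" preference can be sketched as a ranking over candidate links. The snippet below is an illustration, not the actor's code. It assumes the raw Europe PMC result nests its links under `fullTextUrlList.fullTextUrl`, each with a `documentStyle` and a `url`. That response shape and the `pick_full_text_url` name are assumptions to verify against the Europe PMC API documentation.

```python
# Lower rank wins; any style not listed here ranks after HTML and PDF.
STYLE_RANK = {"html": 0, "pdf": 1}
OTHER_RANK = 2


def pick_full_text_url(result):
    """Return the preferred full-text URL from a raw result, or None if absent."""
    entries = (result.get("fullTextUrlList") or {}).get("fullTextUrl") or []
    candidates = [entry for entry in entries if entry.get("url")]
    if not candidates:
        return None

    def rank(entry):
        style = str(entry.get("documentStyle") or "").lower()
        return STYLE_RANK.get(style, OTHER_RANK)

    # min() keeps the first entry among equally ranked candidates.
    return min(candidates, key=rank)["url"]


if __name__ == "__main__":
    raw = {"fullTextUrlList": {"fullTextUrl": [
        {"documentStyle": "pdf", "url": "https://example.org/paper.pdf"},
        {"documentStyle": "html", "url": "https://example.org/paper"},
    ]}}
    print(pick_full_text_url(raw))
```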
Q: What query syntax is supported?
A: The actor supports Europe PMC's Lucene-based query syntax. You can use free text, field-specific operators (TITLE:"term", AUTH:"name", DOI:10.xxx, ABSTRACT:"keyword"), Boolean operators (AND, OR, NOT), and wildcards (*). The dedicated filter fields for author, journal, date, and source are combined automatically.
Q: Can I search for preprints specifically?
A: Yes. Set the Source Database parameter to PPR (Preprints) to restrict results to bioRxiv, medRxiv, and other preprint servers indexed by Europe PMC.
Q: How does pagination work?
A: The actor uses cursor-based pagination with the Europe PMC API's cursorMark parameter. Each API call retrieves up to 1,000 results, and the actor continues fetching pages until it reaches your maxResults limit or exhausts available results.
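
A cursor loop of this kind can be sketched in a few lines. The code below is a hedged illustration rather than the actor's implementation. `collect_results` and `make_fetcher` are hypothetical names, and the response keys `nextCursorMark` and `resultList.result`, the `format=json` and `pageSize` parameters, and the initial cursor value `*` reflect this sketch's understanding of the Europe PMC search API and should be checked against its documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def make_fetcher(query):
    """Return a fetch_page(cursor, size) function bound to one query."""
    def fetch_page(cursor, size):
        params = urlencode({
            "query": query,
            "format": "json",
            "resultType": "core",
            "pageSize": size,
            "cursorMark": cursor,
        })
        with urlopen(f"{SEARCH_URL}?{params}", timeout=30) as response:
            return json.load(response)
    return fetch_page


def collect_results(fetch_page, max_results=500, page_size=1000):
    """Follow cursors until max_results is reached or results run out."""
    results = []
    cursor = "*"  # assumed starting cursor for the first page
    while len(results) < max_results:
        remaining = max_results - len(results)
        page = fetch_page(cursor, min(page_size, remaining))
        batch = (page.get("resultList") or {}).get("result") or []
        results.extend(batch[:remaining])
        next_cursor = page.get("nextCursorMark")
        # Stop on an empty page or when the cursor no longer advances.
        if not batch or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return results


if __name__ == "__main__":
    records = collect_results(make_fetcher("CRISPR gene editing"), max_results=25)
    print(len(records))
```

Passing the page fetcher in as a function keeps the loop testable without network access.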
Q: Can I schedule automatic searches for new publications?
A: Yes. Set up an Apify schedule to run the actor daily or weekly. Use the dateFrom parameter set to a recent date to capture only newly published articles. Combine with webhooks to send email or Slack alerts when new papers match your criteria.
Q: What are MeSH terms and why are they useful?
A: MeSH (Medical Subject Headings) is a standardized vocabulary maintained by the National Library of Medicine. MeSH terms enable consistent topic classification across articles, making them valuable for systematic reviews, meta-analyses, and building structured topic taxonomies.
Q: How current is the data?
A: Europe PMC updates its index daily. PubMed records typically appear within 1--2 days of being indexed by the National Library of Medicine. Preprints are indexed shortly after they are posted to bioRxiv or medRxiv.
Q: Can I sort results by citation count?
A: Yes. Set the Sort By parameter to CITED desc to return the most highly cited articles first. This is useful for identifying seminal papers and high-impact research on any topic.
Q: What happens if my query returns more than 500 results?
A: The actor returns up to 500 results (or your configured maxResults limit, whichever is lower). If the total hit count exceeds this, you can narrow your search with additional filters or run multiple queries with non-overlapping date ranges to cover the full result set.
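
The non-overlapping date ranges mentioned above can be generated mechanically. The sketch below splits one window into roughly equal sub-windows whose dateFrom and dateTo values can each be used for a separate run. `split_date_range` is a hypothetical helper, and it assumes both ends of a date filter are inclusive, so adjacent windows are made to meet without sharing a day.

```python
from datetime import date, timedelta


def split_date_range(start, end, windows):
    """Split [start, end] into consecutive, non-overlapping (from, to) pairs."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    if windows < 1:
        raise ValueError("windows must be at least 1")

    total_days = (end - start).days + 1
    windows = min(windows, total_days)  # never create empty windows
    base, extra = divmod(total_days, windows)

    ranges = []
    window_start = start
    for index in range(windows):
        # The first `extra` windows absorb the remainder, one day each.
        length = base + (1 if index < extra else 0)
        window_end = window_start + timedelta(days=length - 1)
        ranges.append((window_start.isoformat(), window_end.isoformat()))
        window_start = window_end + timedelta(days=1)
    return ranges


if __name__ == "__main__":
    for date_from, date_to in split_date_range(date(2024, 1, 1), date(2024, 12, 31), 4):
        print({"dateFrom": date_from, "dateTo": date_to, "maxResults": 500})
```

Equal-length windows do not guarantee equal result counts. If one window still hits the 500-record cap, it can be split again with the same function.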
Related actors
| Actor | Description |
|---|---|
| PubMed Biomedical Literature Search | Search PubMed for MEDLINE-indexed biomedical citations with abstracts and metadata |
| Semantic Scholar Paper Search | Search Semantic Scholar for academic papers with AI-generated TLDRs and citation data |
| OpenAlex Research Paper Search | Search OpenAlex for open scholarly metadata across all academic disciplines |
| Crossref Academic Paper Search | Search Crossref for DOI-registered publications with reference metadata |
| ORCID Researcher Search | Look up researchers by ORCID ID to find their publication history and affiliations |
| ArXiv Preprint Paper Search | Search ArXiv for preprints in physics, mathematics, computer science, and related fields |