# Europe PMC Scraper (`crawlerbros/europe-pmc-scraper`) Actor

Scrape Europe PMC, a repository of 42M+ biomedical literature records including PubMed, PubMed Central, patents, and preprints. Search publications, get article details by PMID or DOI, and retrieve citation and reference lists.

- **URL**: https://apify.com/crawlerbros/europe-pmc-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Developer tools, Automation, Other
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 7 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage. You are charged a fixed price for specific events, plus Apify platform usage.
Since this Actor supports Apify Store discounts, the price decreases on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

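As a rough worked example of the per-result charge, the sketch below multiplies a result count by the listed starting rate of $3.00 per 1,000 results. It assumes that starting rate applies and ignores platform usage and plan discounts, so the output is a ballpark figure rather than a quote. `estimate_result_cost` is a hypothetical helper, not part of any Apify library.

```python
from decimal import Decimal

# Assumption: the listed starting rate of $3.00 per 1,000 results applies.
RATE_PER_1000_RESULTS = Decimal("3.00")


def estimate_result_cost(result_count: int) -> Decimal:
    """Estimate the per-result charge in USD, excluding platform usage."""
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = RATE_PER_1000_RESULTS * result_count / 1000
    return cost.quantize(Decimal("0.01"))


print(estimate_result_cost(50))    # a run returning 50 results
print(estimate_result_cost(1000))  # a run returning 1,000 results
```
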
## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Europe PMC Scraper

Extract biomedical literature from **Europe PMC** — one of the world's most comprehensive repositories of life science publications, covering **42 million+ records** including PubMed/MEDLINE, PubMed Central full-text articles, patents, preprints, theses, and more.

### What Is Europe PMC?

Europe PMC is a free, open-access repository of biomedical and life sciences literature maintained by the European Bioinformatics Institute (EMBL-EBI). It aggregates content from multiple sources, including PubMed, PubMed Central (PMC), clinical trial records, patents, preprints, and theses, which makes it one of the most comprehensive freely available biomedical literature databases.

### What This Actor Does

This Actor queries the [Europe PMC REST API](https://europepmc.org/RestfulWebService) to:

- **Search publications** across all sources by keyword, year range, source database, and open access status
- **Retrieve a specific article** by PubMed ID (PMID) or DOI
- **Get citation lists** — all articles citing a given PMID
- **Get reference lists** — all references of a given PMID

No authentication or API key is required.

### Modes

| Mode | Description | Key Parameters |
|------|-------------|----------------|
| `search` | Full-text search across 42M+ publications | `query`, `source`, `isOpenAccess`, `sortBy`, `fromYear`, `toYear` |
| `byPMID` | Get a specific article by PubMed ID | `pmid` |
| `byDOI` | Get a specific article by DOI | `doi` |
| `citations` | Get articles that cite a PMID | `pmid` |
| `references` | Get references of a PMID | `pmid` |

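The mode-to-parameter mapping in the table can be checked before a run is started. The sketch below is a minimal client-side check, assuming each mode needs its key parameter to be non-empty. The Actor's own schema only marks `mode` as required, so this is stricter than the Actor itself, and `check_mode_input` is a hypothetical helper name.

```python
# Key input field for each mode, taken from the Modes table.
KEY_FIELD_BY_MODE = {
    "search": "query",
    "byPMID": "pmid",
    "byDOI": "doi",
    "citations": "pmid",
    "references": "pmid",
}


def check_mode_input(run_input: dict) -> None:
    """Raise ValueError if the input lacks the key parameter for its mode."""
    mode = run_input.get("mode")
    if mode not in KEY_FIELD_BY_MODE:
        raise ValueError(f"Unknown mode: {mode!r}")
    field = KEY_FIELD_BY_MODE[mode]
    if not run_input.get(field):
        raise ValueError(f"Mode {mode!r} needs a non-empty {field!r}")


# Passes silently: byDOI mode with a DOI supplied.
check_mode_input({"mode": "byDOI", "doi": "10.1186/s12936-015-0579-4"})
```
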
### Input Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `mode` | Select | Operating mode (required) |
| `query` | String | Search query. Supports field operators like `TITLE:malaria AND OPEN_ACCESS:y` |
| `pmid` | String | PubMed ID (for byPMID, citations, references modes) |
| `doi` | String | Article DOI (for byDOI mode) |
| `source` | Select | Filter by source: MED, PMC, PAT, ETH, HIR, CTX, AGR, CBA, PPR |
| `isOpenAccess` | Boolean | Return only open access articles |
| `sortBy` | Select | Sort: `relevance` (default), `cited` (most cited), `date` (most recent) |
| `fromYear` | Integer | Filter from this publication year |
| `toYear` | Integer | Filter up to this publication year |
| `maxItems` | Integer | Maximum records (1–1000, default 50) |

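To illustrate how these parameters fit together, the sketch below assembles a `search` input and rejects out-of-range values before anything is sent to Apify. The limits come from the table (`maxItems` 1 to 1000) and from the year bounds (1800 to 2100) in the OpenAPI specification on this page. `build_search_input` is a hypothetical helper, and leaving out unset optional filters is an assumption about how the Actor treats missing fields.

```python
from typing import Optional

SOURCES = {"MED", "PMC", "PAT", "ETH", "HIR", "CTX", "AGR", "CBA", "PPR"}
SORT_ORDERS = {"relevance", "cited", "date"}


def build_search_input(
    query: str,
    source: Optional[str] = None,
    is_open_access: bool = False,
    sort_by: str = "relevance",
    from_year: Optional[int] = None,
    to_year: Optional[int] = None,
    max_items: int = 50,
) -> dict:
    """Build a search-mode input dict, rejecting invalid values early."""
    if source is not None and source not in SOURCES:
        raise ValueError(f"Unknown source code: {source!r}")
    if sort_by not in SORT_ORDERS:
        raise ValueError(f"Unknown sort order: {sort_by!r}")
    if not 1 <= max_items <= 1000:
        raise ValueError("max_items must be between 1 and 1000")
    for year in (from_year, to_year):
        if year is not None and not 1800 <= year <= 2100:
            raise ValueError("years must be between 1800 and 2100")
    if from_year is not None and to_year is not None and from_year > to_year:
        raise ValueError("from_year must not exceed to_year")

    run_input = {
        "mode": "search",
        "query": query,
        "isOpenAccess": is_open_access,
        "sortBy": sort_by,
        "maxItems": max_items,
    }
    # Optional filters are only included when they are set.
    if source is not None:
        run_input["source"] = source
    if from_year is not None:
        run_input["fromYear"] = from_year
    if to_year is not None:
        run_input["toYear"] = to_year
    return run_input


print(build_search_input("CRISPR gene editing", source="MED", sort_by="cited", from_year=2018))
```
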
### Source Database Codes

| Code | Description |
|------|-------------|
| `MED` | PubMed/MEDLINE |
| `PMC` | Europe PMC full-text articles |
| `PAT` | Patents |
| `ETH` | EThOS (British Library theses) |
| `HIR` | NHS Evidence |
| `CTX` | CiteXplore |
| `AGR` | Agricola |
| `CBA` | Chinese Biological Abstracts |
| `PPR` | Preprints |

### Output Fields

| Field | Type | Description |
|-------|------|-------------|
| `pmid` | String | PubMed ID |
| `pmcid` | String | PubMed Central ID (e.g. `PMC4371661`) |
| `doi` | String | Digital Object Identifier |
| `title` | String | Publication title |
| `authors` | Array | Author names |
| `journalName` | String | Journal full name |
| `journalIssn` | String | Journal ISSN |
| `pubYear` | Integer | Publication year |
| `abstract` | String | Full abstract text |
| `language` | String | Publication language (e.g. `eng`) |
| `isOpenAccess` | Boolean | Whether freely available |
| `hasPDF` | Boolean | Whether PDF is available |
| `citedByCount` | Integer | Number of citing articles |
| `pubType` | Array | Publication types (e.g. `research-article`) |
| `fullTextUrls` | Array | Full-text access URLs |
| `europePmcUrl` | String | Europe PMC article page URL |
| `scrapedAt` | String | ISO 8601 scrape timestamp |

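Once a run has finished, the output fields can be filtered and ranked locally. The sketch below keeps open access records that have a PDF and orders them by `citedByCount`. It assumes items arrive as plain dicts with the field names from the table. The sample records are placeholders for illustration, not real Europe PMC data, and `top_cited_open_access` is a hypothetical helper.

```python
from typing import Iterable


def top_cited_open_access(items: Iterable[dict], limit: int = 5) -> list:
    """Return up to `limit` open access records with a PDF, most cited first."""
    eligible = [
        item for item in items
        if item.get("isOpenAccess") and item.get("hasPDF")
    ]
    # Missing or null citation counts are treated as zero.
    eligible.sort(key=lambda item: item.get("citedByCount") or 0, reverse=True)
    return eligible[:limit]


# Placeholder records for illustration only; not real Europe PMC data.
sample_items = [
    {"pmid": "0000001", "title": "Placeholder A", "isOpenAccess": True, "hasPDF": True, "citedByCount": 3},
    {"pmid": "0000002", "title": "Placeholder B", "isOpenAccess": False, "hasPDF": True, "citedByCount": 90},
    {"pmid": "0000003", "title": "Placeholder C", "isOpenAccess": True, "hasPDF": True, "citedByCount": 12},
]

for record in top_cited_open_access(sample_items):
    print(record["pmid"], record["citedByCount"])
```
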
### Example Input

```json
{
  "mode": "search",
  "query": "malaria",
  "maxItems": 10
}
````

```json
{
  "mode": "search",
  "query": "CRISPR gene editing",
  "source": "MED",
  "isOpenAccess": true,
  "sortBy": "cited",
  "fromYear": 2018,
  "maxItems": 50
}
```

```json
{
  "mode": "byPMID",
  "pmid": "25781006"
}
```

```json
{
  "mode": "citations",
  "pmid": "25781006",
  "maxItems": 100
}
```

### Advanced Search Queries

Europe PMC supports field-specific searches:

- `TITLE:malaria` — search in title only
- `ABSTRACT:vaccine` — search in abstract only
- `AUTH:Smith` — filter by author name
- `JOURNAL:"Nature"` — filter by journal name
- `OPEN_ACCESS:y` — open access only (same as `isOpenAccess: true`)
- `HAS_PDF:y` — filter to records with PDF available
- `SRC:PPR` — preprints only

Combine with `AND`, `OR`, `NOT`:

```
TITLE:malaria AND OPEN_ACCESS:y AND PUB_YEAR:[2020 TO 2024]
```

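Query strings like the one above can be assembled programmatically. The sketch below joins the field operators listed in this section with `AND`. `build_query` is a hypothetical helper that passes values through unchanged except for `JOURNAL`, which is quoted as in the list above, so callers are assumed to quote any other multi-word values themselves.

```python
from typing import Optional, Tuple


def build_query(
    title: Optional[str] = None,
    abstract: Optional[str] = None,
    author: Optional[str] = None,
    journal: Optional[str] = None,
    open_access: bool = False,
    has_pdf: bool = False,
    source: Optional[str] = None,
    pub_years: Optional[Tuple[int, int]] = None,
) -> str:
    """Combine field-specific clauses into a single AND query string."""
    clauses = []
    if title:
        clauses.append(f"TITLE:{title}")
    if abstract:
        clauses.append(f"ABSTRACT:{abstract}")
    if author:
        clauses.append(f"AUTH:{author}")
    if journal:
        clauses.append(f'JOURNAL:"{journal}"')
    if open_access:
        clauses.append("OPEN_ACCESS:y")
    if has_pdf:
        clauses.append("HAS_PDF:y")
    if source:
        clauses.append(f"SRC:{source}")
    if pub_years:
        clauses.append(f"PUB_YEAR:[{pub_years[0]} TO {pub_years[1]}]")
    if not clauses:
        raise ValueError("at least one clause is required")
    return " AND ".join(clauses)


print(build_query(title="malaria", open_access=True, pub_years=(2020, 2024)))
```

The resulting string can be passed as the `query` input in `search` mode.
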
### FAQs

**Is an API key required?**
No. The Europe PMC API is fully public and requires no authentication.

**What is the rate limit?**
The Actor applies a polite 1-second delay between requests to avoid overloading the server.

**Can I get the full text of articles?**
The `fullTextUrls` field contains links to full-text versions where available. Open access articles typically provide free HTML and PDF access.

**How do I get citations for an article?**
Use `mode=citations` with the article's PMID. The Actor returns the articles that cite the specified paper, up to the `maxItems` limit.

**What is the difference between PMC and MED sources?**
`MED` (MEDLINE/PubMed) contains abstracts and metadata. `PMC` (PubMed Central) contains full-text articles deposited in Europe PMC.

**Are preprints included?**
Yes — use `source=PPR` to filter specifically for preprints (bioRxiv, medRxiv, etc.) or include them in general searches.

**How far back does the data go?**
Coverage varies by source. MEDLINE includes articles from the 1940s onward; some journals have data from the early 1800s.

**How many records can I retrieve?**
Set `maxItems` to at most 1000 per run. Broad search terms can match millions of records, so use the `fromYear`/`toYear` and `source` filters to narrow the results.

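One way to work within the 1000-record cap is to split a long year range into smaller windows and start one run per window. The sketch below only prepares the inputs. It assumes `fromYear` and `toYear` are both inclusive, which this README does not state explicitly, and `year_windows` is a hypothetical helper.

```python
from typing import Iterator, Tuple


def year_windows(from_year: int, to_year: int, width: int = 1) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (from_year, to_year) pairs covering the range without overlap."""
    if width < 1:
        raise ValueError("width must be at least 1")
    if from_year > to_year:
        raise ValueError("from_year must not exceed to_year")
    start = from_year
    while start <= to_year:
        end = min(start + width - 1, to_year)
        yield start, end
        start = end + 1


# One search input per three-year window, each capped at 1000 records.
run_inputs = [
    {"mode": "search", "query": "malaria", "fromYear": first, "toYear": last, "maxItems": 1000}
    for first, last in year_windows(2018, 2024, width=3)
]

for run_input in run_inputs:
    print(run_input["fromYear"], run_input["toYear"])
```

A window that still matches more than 1000 records would need to be narrowed further, for example with a `source` filter.
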
# Actor input Schema

## `mode` (type: `string`):

What to fetch from Europe PMC.

## `query` (type: `string`):

Full-text search query (mode=search). Supports field queries like `TITLE:malaria AND OPEN_ACCESS:y`.

## `pmid` (type: `string`):

PubMed ID of the article (modes: byPMID, citations, references). E.g. 25781006

## `doi` (type: `string`):

Article DOI (mode=byDOI). E.g. 10.1186/s12936-015-0579-4

## `source` (type: `string`):

Filter results by source database (mode=search).

## `isOpenAccess` (type: `boolean`):

When enabled, only return open access articles.

## `sortBy` (type: `string`):

Sort order for search results.

## `fromYear` (type: `integer`):

Filter publications from this year onward (mode=search).

## `toYear` (type: `integer`):

Filter publications up to this year (mode=search).

## `maxItems` (type: `integer`):

Maximum number of records to return.

## Actor input object example

```json
{
  "mode": "search",
  "query": "malaria",
  "source": "",
  "isOpenAccess": false,
  "sortBy": "relevance",
  "maxItems": 50
}
```

# Actor output Schema

## `results` (type: `string`):

Dataset containing all scraped Europe PMC publication records.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "search",
    "query": "malaria",
    "source": "",
    "isOpenAccess": false,
    "sortBy": "relevance",
    "maxItems": 50
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/europe-pmc-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "search",
    "query": "malaria",
    "source": "",
    "isOpenAccess": False,
    "sortBy": "relevance",
    "maxItems": 50,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/europe-pmc-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "search",
  "query": "malaria",
  "source": "",
  "isOpenAccess": false,
  "sortBy": "relevance",
  "maxItems": 50
}' |
apify call crawlerbros/europe-pmc-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/europe-pmc-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Europe PMC Scraper",
        "description": "Scrape Europe PMC, 42M+ biomedical literature records including PubMed, PubMed Central, patents, and preprints. Search publications, get article details by PMID or DOI, and retrieve citation/reference lists.",
        "version": "1.0",
        "x-build-id": "TJUKD0ZWLse4gJhMj"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~europe-pmc-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-europe-pmc-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~europe-pmc-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-europe-pmc-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~europe-pmc-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-europe-pmc-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "search",
                            "byPMID",
                            "byDOI",
                            "citations",
                            "references"
                        ],
                        "type": "string",
                        "description": "What to fetch from Europe PMC.",
                        "default": "search"
                    },
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Full-text search query (mode=search). Supports field queries like 'TITLE:malaria AND OPEN_ACCESS:y'.",
                        "default": "malaria"
                    },
                    "pmid": {
                        "title": "PubMed ID (PMID)",
                        "type": "string",
                        "description": "PubMed ID of the article (modes: byPMID, citations, references). E.g. 25781006"
                    },
                    "doi": {
                        "title": "DOI",
                        "type": "string",
                        "description": "Article DOI (mode=byDOI). E.g. 10.1186/s12936-015-0579-4"
                    },
                    "source": {
                        "title": "Source database",
                        "enum": [
                            "",
                            "MED",
                            "PMC",
                            "PAT",
                            "ETH",
                            "HIR",
                            "CTX",
                            "AGR",
                            "CBA",
                            "PPR"
                        ],
                        "type": "string",
                        "description": "Filter results by source database (mode=search).",
                        "default": ""
                    },
                    "isOpenAccess": {
                        "title": "Open access only",
                        "type": "boolean",
                        "description": "When enabled, only return open access articles.",
                        "default": false
                    },
                    "sortBy": {
                        "title": "Sort by",
                        "enum": [
                            "relevance",
                            "cited",
                            "date"
                        ],
                        "type": "string",
                        "description": "Sort order for search results.",
                        "default": "relevance"
                    },
                    "fromYear": {
                        "title": "From year",
                        "minimum": 1800,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Filter publications from this year onward (mode=search)."
                    },
                    "toYear": {
                        "title": "To year",
                        "minimum": 1800,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Filter publications up to this year (mode=search)."
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of records to return.",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```

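For projects that cannot use the client libraries, the `run-sync-get-dataset-items` path from the specification can be called directly. The sketch below uses only the Python standard library. The server URL, path, and `token` query parameter come from the specification above. The function names are hypothetical, and the code assumes the response body is a JSON array of dataset items, which the specification does not spell out.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/crawlerbros~europe-pmc-scraper/run-sync-get-dataset-items"


def build_run_sync_url(token: str) -> str:
    """Build the endpoint URL with the token as a query parameter."""
    query = urllib.parse.urlencode({"token": token})
    return f"{API_BASE}{ACTOR_PATH}?{query}"


def run_sync_get_items(token: str, run_input: dict, timeout: float = 300.0) -> list:
    """POST the Actor input as JSON and return the decoded dataset items."""
    request = urllib.request.Request(
        build_run_sync_url(token),
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Assumption: the response body is a JSON array of dataset items.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Printing the URL makes no network request; "TOKEN" is a placeholder.
print(build_run_sync_url("TOKEN"))
```

With a real token, `run_sync_get_items(token, {"mode": "byPMID", "pmid": "25781006"})` would start a run and block until it finishes, so the timeout should be generous for larger `maxItems` values.
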