# CORE Open Research Scraper (`crawlerbros/core-open-research-scraper`) Actor

Search millions of open-access research papers from CORE - the world's largest aggregator of open access research. Search by topic, author, or institution, or browse recent papers. Returns title, abstract, authors, DOI, download URL, and more. No API key required.

- **URL**: https://apify.com/crawlerbros/core-open-research-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** AI, Automation
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
This Actor supports Apify Store discounts, so the higher your subscription plan, the lower the price.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
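
As a rough illustration of the per-result pricing above, the sketch below estimates the event charge for a run. It assumes the listed starting rate of $3.00 per 1,000 results applies to every result. The helper name `estimate_result_cost` is hypothetical, and the figure excludes Apify platform usage and any plan discount.

```python
from decimal import Decimal

# Listed starting rate from this page: $3.00 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("3.00")


def estimate_result_cost(num_results: int) -> Decimal:
    """Estimate the per-result event charge in USD (hypothetical helper).

    Excludes Apify platform usage and subscription discounts.
    """
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    cost = PRICE_PER_1000_RESULTS * num_results / 1000
    return cost.quantize(Decimal("0.001"))


# The default maxItems (50) and the documented maximum (200):
print(estimate_result_cost(50))   # 0.150
print(estimate_result_cost(200))  # 0.600
```

Treat the result as a lower bound on what a run costs, since platform usage is billed on top of the event price.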

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## CORE Open Research Scraper

Search millions of **open-access research papers** from [CORE](https://core.ac.uk/) — the world's largest aggregator of open access scholarly research, covering 220M+ research outputs from thousands of repositories worldwide. Search by topic, author name, or institution, or browse recently published papers. No API key required for basic use.

### What this Actor does

- **Four modes:** `searchPapers`, `searchByAuthor`, `searchByInstitution`, `getRecentPapers`
- **Full-text search:** across 220M+ open access research papers
- **Author search:** find all papers by a specific researcher
- **Institution search:** browse research output from universities and research centers
- **Recent papers:** browse the latest open access publications
- **Year filters:** optional `yearFrom` / `yearTo` to narrow results
- **Empty fields are omitted** — no nulls in output

### Modes

| Mode | Description |
|---|---|
| `searchPapers` | Full-text keyword search across all papers (default) |
| `searchByAuthor` | Search papers by author name |
| `searchByInstitution` | Search papers from a specific university or research center |
| `getRecentPapers` | Browse recently published open access papers |

### Input

| Field | Type | Description |
|---|---|---|
| `mode` | select | Which mode to use (default: `searchPapers`) |
| `searchQuery` | string | Keyword or topic to search (default prefill: machine learning) |
| `authorName` | string | Author name for `searchByAuthor` mode (e.g. Alan Turing) |
| `institution` | string | Institution name for `searchByInstitution` mode (e.g. MIT, Oxford) |
| `yearFrom` | integer | Filter papers published from this year (optional) |
| `yearTo` | integer | Filter papers published up to this year (optional) |
| `apiKey` | string | Optional CORE API key to increase rate limits |
| `maxItems` | integer | Maximum papers to return, 1–200 (default: 50) |
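
The table above pairs each mode with the field it reads. The following is a minimal sketch of a helper that assembles an input object and rejects out-of-range values before a run is started. `build_input` is a hypothetical name. The `maxItems` limit comes from this page, while requiring the mode-specific field and checking that `yearFrom` is not later than `yearTo` are assumptions of this sketch, not the Actor's documented validation.

```python
from typing import Any, Optional

# Field each mode reads, per the Input table. getRecentPapers is left out
# because this sketch treats its query as optional (an assumption).
MODE_FIELD = {
    "searchPapers": "searchQuery",
    "searchByAuthor": "authorName",
    "searchByInstitution": "institution",
}
MODES = set(MODE_FIELD) | {"getRecentPapers"}


def build_input(mode: str = "searchPapers", *, max_items: int = 50,
                year_from: Optional[int] = None,
                year_to: Optional[int] = None,
                **fields: Any) -> dict[str, Any]:
    """Assemble an Actor input dict (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not 1 <= max_items <= 200:
        raise ValueError("maxItems must be between 1 and 200")
    required = MODE_FIELD.get(mode)
    if required and not fields.get(required):
        raise ValueError(f"{mode} needs {required}")
    if year_from is not None and year_to is not None and year_from > year_to:
        raise ValueError("yearFrom must not be later than yearTo")

    run_input: dict[str, Any] = {"mode": mode, "maxItems": max_items, **fields}
    # Only include the year filters that were actually requested.
    if year_from is not None:
        run_input["yearFrom"] = year_from
    if year_to is not None:
        run_input["yearTo"] = year_to
    return run_input


print(build_input("searchByAuthor", authorName="Alan Turing",
                  year_from=1936, year_to=1954))
```

The resulting dict can be passed as the run input to whichever client is used to start the Actor.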

### Output per paper

| Field | Type | Description |
|---|---|---|
| `coreId` | integer | CORE unique paper identifier |
| `title` | string | Paper title |
| `abstract` | string | Paper abstract (up to 5000 chars) |
| `authors` | array | List of author names |
| `year` | integer | Publication year |
| `downloadUrl` | string | Direct URL to open access PDF |
| `oaiPmhId` | string | OAI-PMH repository identifier |
| `doi` | string | Digital Object Identifier |
| `publisher` | string | Publisher name |
| `journals` | array | Journal titles where published |
| `repositoryDocument` | object | Source repository name and URL |
| `scrapedAt` | string | ISO 8601 timestamp of when the record was scraped |

### Data source

[CORE](https://core.ac.uk/) is operated by The Open University (UK) and is the world's largest aggregator of open access research, covering 220M+ research outputs from 10,000+ repositories. The CORE API v3 is freely accessible without registration for basic queries (limited to 10 items/page). Register for a free API key at [core.ac.uk/services/api](https://core.ac.uk/services/api) to increase limits.

### Example output

```json
{
  "coreId": 8848131,
  "title": "Deep Learning for Natural Language Processing",
  "abstract": "This paper surveys deep learning methods for NLP tasks including text classification, named entity recognition, and machine translation...",
  "authors": ["Smith, John", "Doe, Jane"],
  "year": 2021,
  "downloadUrl": "https://core.ac.uk/download/8848131.pdf",
  "oaiPmhId": "oai:example.org:8848131",
  "doi": "10.1234/nlp.2021",
  "publisher": "Springer",
  "journals": ["Journal of AI Research"],
  "repositoryDocument": {
    "repositoryName": "arXiv",
    "repositoryUrl": "https://api.core.ac.uk/v3/data-providers/1"
  },
  "scrapedAt": "2026-06-03T10:00:00+00:00"
}
````
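
Because empty fields are omitted rather than set to null, code that consumes records should not assume any key is present. The sketch below formats a one-line summary from a record using `.get()` with fallbacks. `summarize` is a hypothetical helper and the output format is arbitrary.

```python
from typing import Any


def summarize(record: dict[str, Any]) -> str:
    """Build a one-line summary, tolerating omitted fields."""
    authors = "; ".join(record.get("authors", [])) or "Unknown author"
    year = record.get("year", "n.d.")
    title = record.get("title", "Untitled")
    parts = [f"{authors} ({year}). {title}."]
    if "doi" in record:
        # DOIs resolve through the doi.org resolver.
        parts.append(f"https://doi.org/{record['doi']}")
    return " ".join(parts)


full = {
    "title": "Deep Learning for Natural Language Processing",
    "authors": ["Smith, John", "Doe, Jane"],
    "year": 2021,
    "doi": "10.1234/nlp.2021",
}
# A record where authors, year, and DOI were empty and therefore omitted.
sparse = {"coreId": 8848131, "title": "Deep Learning for Natural Language Processing"}

print(summarize(full))
print(summarize(sparse))
```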

### FAQs

**Do I need an API key?**
No. CORE works without an API key for basic queries (up to 10 results per page with rate limiting). Register free at [core.ac.uk/services/api](https://core.ac.uk/services/api) to get a key that increases limits significantly.

**How many papers are available?**
CORE indexes 220M+ open access research outputs from over 10,000 repositories worldwide, including arXiv, PubMed Central, institutional repositories, and more.

**Can I download the PDFs?**
Each record includes a `downloadUrl` linking to the open access PDF on CORE's servers. PDF availability depends on the source repository.
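
As a sketch of how the `downloadUrl` field might be used, the code below pairs each record that has a download URL with a local file path, then fetches the files with the standard library. The helper names, the `core-<coreId>.pdf` naming scheme, and the choice to skip failed downloads silently are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Any, Iterable
from urllib.request import urlopen


def pdf_targets(records: Iterable[dict[str, Any]],
                dest_dir: str) -> list[tuple[str, Path]]:
    """Pair each available downloadUrl with a local path (hypothetical helper)."""
    targets: list[tuple[str, Path]] = []
    for record in records:
        url = record.get("downloadUrl")
        if not url:
            continue  # field omitted: no PDF listed for this record
        core_id = record.get("coreId", f"unknown-{len(targets)}")
        targets.append((url, Path(dest_dir) / f"core-{core_id}.pdf"))
    return targets


def download_all(records: Iterable[dict[str, Any]], dest_dir: str,
                 timeout: float = 30.0) -> list[Path]:
    """Download every listed PDF, skipping URLs that fail."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    saved: list[Path] = []
    for url, path in pdf_targets(records, dest_dir):
        try:
            with urlopen(url, timeout=timeout) as response:
                path.write_bytes(response.read())
        except OSError:
            # Covers URLError, HTTPError, and timeouts.
            continue
        saved.append(path)
    return saved
```

Calling `download_all(items, "pdfs")` on the dataset items would write the files that are reachable and return their paths.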

**What languages are papers in?**
CORE aggregates research in all languages. Use specific keywords or add language terms to your query to filter.

**What is `oaiPmhId`?**
The OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) identifier uniquely identifies the paper in its source repository. This is what CORE uses to harvest metadata.

**How does `repositoryDocument` work?**
This contains the name and URL of the source data provider (e.g., arXiv, PubMed Central, a university repository) that contributed the paper to CORE.

**Can I search by year?**
Yes. Use `yearFrom` and/or `yearTo` to filter by publication year in any mode.

# Actor input Schema

## `mode` (type: `string`):

How to search for papers. searchPapers does full-text search; searchByAuthor searches by author name; searchByInstitution searches papers from a specific university; getRecentPapers fetches recently published open access papers.

## `searchQuery` (type: `string`):

Keyword or topic to search for (used in searchPapers and getRecentPapers modes). Example: machine learning, climate change, CRISPR.

## `authorName` (type: `string`):

Full or partial author name to search for (used in searchByAuthor mode). Example: Alan Turing, Hinton.

## `institution` (type: `string`):

University or research institution name (used in searchByInstitution mode). Example: MIT, Stanford, Oxford.

## `yearFrom` (type: `integer`):

Filter papers published from this year onwards (optional). Example: 2020.

## `yearTo` (type: `integer`):

Filter papers published up to this year (optional). Example: 2024.

## `apiKey` (type: `string`):

Optional CORE API key. Without a key the actor works but is limited to 10 results per page with rate limiting. Register free at https://core.ac.uk/services/api to increase limits.

## `maxItems` (type: `integer`):

Maximum number of papers to return (1–200). Default is 50.

## `proxyConfiguration` (type: `object`):

Apify proxy configuration for rotating IPs. Recommended to avoid CORE API rate-limits on shared datacenter IPs.

## Actor input object example

```json
{
  "mode": "searchPapers",
  "searchQuery": "machine learning",
  "maxItems": 20,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
```

# Actor output Schema

## `papers` (type: `string`):

Dataset containing all scraped open access research paper records.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "searchPapers",
    "searchQuery": "machine learning",
    "maxItems": 20,
    "proxyConfiguration": {
        "useApifyProxy": true
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/core-open-research-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "searchPapers",
    "searchQuery": "machine learning",
    "maxItems": 20,
    "proxyConfiguration": { "useApifyProxy": True },
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/core-open-research-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "searchPapers",
  "searchQuery": "machine learning",
  "maxItems": 20,
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}' |
apify call crawlerbros/core-open-research-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/core-open-research-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "CORE Open Research Scraper",
        "description": "Search millions of open-access research papers from CORE - the world's largest aggregator of open access research. Search by topic, author, or institution, or browse recent papers. Returns title, abstract, authors, DOI, download URL, and more. No API key required.",
        "version": "1.0",
        "x-build-id": "UfgYhxWj3PJAeYtVG"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~core-open-research-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-core-open-research-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~core-open-research-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-core-open-research-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~core-open-research-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-core-open-research-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "searchPapers",
                            "searchByAuthor",
                            "searchByInstitution",
                            "getRecentPapers"
                        ],
                        "type": "string",
                        "description": "How to search for papers. searchPapers does full-text search; searchByAuthor searches by author name; searchByInstitution searches papers from a specific university; getRecentPapers fetches recently published open access papers.",
                        "default": "searchPapers"
                    },
                    "searchQuery": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Keyword or topic to search for (used in searchPapers and getRecentPapers modes). Example: machine learning, climate change, CRISPR."
                    },
                    "authorName": {
                        "title": "Author name",
                        "type": "string",
                        "description": "Full or partial author name to search for (used in searchByAuthor mode). Example: Alan Turing, Hinton."
                    },
                    "institution": {
                        "title": "Institution / university",
                        "type": "string",
                        "description": "University or research institution name (used in searchByInstitution mode). Example: MIT, Stanford, Oxford."
                    },
                    "yearFrom": {
                        "title": "Published from year",
                        "minimum": 1900,
                        "maximum": 2030,
                        "type": "integer",
                        "description": "Filter papers published from this year onwards (optional). Example: 2020."
                    },
                    "yearTo": {
                        "title": "Published to year",
                        "minimum": 1900,
                        "maximum": 2030,
                        "type": "integer",
                        "description": "Filter papers published up to this year (optional). Example: 2024."
                    },
                    "apiKey": {
                        "title": "CORE API key (optional)",
                        "type": "string",
                        "description": "Optional CORE API key. Without a key the actor works but is limited to 10 results per page with rate limiting. Register free at https://core.ac.uk/services/api to increase limits."
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 200,
                        "type": "integer",
                        "description": "Maximum number of papers to return (1–200). Default is 50.",
                        "default": 50
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy configuration for rotating IPs. Recommended to avoid CORE API rate-limits on shared datacenter IPs."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
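
For projects that call the REST API directly, the `run-sync-get-dataset-items` path in the specification above can be exercised with the Python standard library alone. This is a minimal sketch: `build_request` and `run_actor` are hypothetical helper names, the 300-second timeout is arbitrary, and the code assumes the response body is a JSON array of dataset items.

```python
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Server URL and path taken from the OpenAPI specification above.
BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/crawlerbros~core-open-research-scraper/run-sync-get-dataset-items"


def build_request(token: str, run_input: dict[str, Any]) -> Request:
    """Prepare the POST request: token as a query parameter, input as JSON body."""
    url = f"{BASE_URL}{ACTOR_PATH}?{urlencode({'token': token})}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, run_input: dict[str, Any],
              timeout: float = 300.0) -> list[dict[str, Any]]:
    """Run the Actor synchronously and return the parsed response body.

    Assumes the response is a JSON array of dataset items.
    """
    with urlopen(build_request(token, run_input), timeout=timeout) as response:
        return json.load(response)


# Building the request performs no network call; run_actor() sends it.
request = build_request("<YOUR_API_TOKEN>",
                        {"mode": "searchPapers", "searchQuery": "machine learning"})
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the URL and payload easy to inspect before any run is charged.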
