# Academic Research MCP — Papers, DOIs & Citations (`saturday/academic-research-mcp`) Actor

MCP server + scraper for AI research agents: search 400M+ papers across Crossref, OpenAlex & arXiv, fetch DOIs, and trace both references and forward citations - plus author metrics. Built for literature review.

- **URL**: https://apify.com/saturday/academic-research-mcp.md
- **Developed by:** [Josh Compton](https://apify.com/saturday) (community)
- **Categories:** AI, Agents, Developer tools
- **Stats:** 1 total users, 0 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $4.00 / 1,000 research tool calls

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Academic Research MCP — Paper Search, DOIs & Citations (arXiv + Crossref + OpenAlex)

An **MCP server for AI agents** (and a normal Apify scraper) that turns scholarly metadata into agent-callable tools. Point Claude, Cursor, or any MCP client at it and your agent can run a full literature review: search published papers and the latest preprints, look up a paper by DOI, and trace citations in **both directions**, across **Crossref** (160M+ works), **OpenAlex** (240M+ works), and **arXiv**.

Built for the AI-agent era: clean JSON in, structured results out, no API keys to manage, no HTML scraping.

### Why this one

Most research tools on the store hit a single source and only go one direction. This one **aggregates three stable scholarly APIs** behind a consistent tool set and gives an agent the **forward citation graph** (who cites a paper), not just its bibliography — so it can ask "what's the newest work building on this?"

- **Zero selector rot.** Wraps first-party JSON/XML APIs, not scraped HTML — it stays working.
- **No keys, no logins.** Crossref (CC-BY), OpenAlex (CC0), and arXiv are public, reuse-friendly APIs.
- **Fair billing.** You're only charged when a tool returns a successful result.

### Tools (MCP mode)

| Tool | What it does |
|---|---|
| `search_papers` | Search peer-reviewed papers across **Crossref**, **OpenAlex**, or **all** (merged + de-duplicated, ranked by citations). |
| `search_preprints` | Search the latest **arXiv** preprints; optional category filter (`cs.AI`, `stat.ML`, …). |
| `get_paper_by_doi` | Full metadata for one paper by DOI. |
| `get_references` | The works a paper **cites** (its bibliography) — backward tracing. |
| `get_citations` | The works that **cite** a paper (forward citation graph), most-cited first. |
| `get_author_works` | An author's most-cited works + metrics (works count, total citations, h-index). |
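
The `all` source is described above as merged, de-duplicated, and ranked by citations. The sketch below shows one way such a merge could work. It is an illustration only, not the Actor's actual code: the `doi` and `citationCount` keys and the `merge_results` helper are hypothetical names chosen for the example.

```python
def merge_results(*sources):
    """Merge result lists, de-duplicate by DOI, and rank by citation count."""
    best = {}
    for records in sources:
        for rec in records:
            # DOIs are case-insensitive, so compare them in lowercase.
            doi = (rec.get("doi") or "").strip().lower()
            if not doi:
                continue  # this sketch simply skips records without a DOI
            current = best.get(doi)
            # Keep whichever copy reports the higher citation count.
            if current is None or rec.get("citationCount", 0) > current.get("citationCount", 0):
                best[doi] = rec
    return sorted(best.values(), key=lambda r: r.get("citationCount", 0), reverse=True)


# Hypothetical records standing in for Crossref and OpenAlex results.
crossref = [{"doi": "10.1/A", "citationCount": 5}, {"doi": "10.1/b", "citationCount": 9}]
openalex = [{"doi": "10.1/a", "citationCount": 7}]
merged = merge_results(crossref, openalex)
print([r["doi"] for r in merged])  # ['10.1/b', '10.1/a']
```

Keying on the lowercase DOI means the same paper reported by two sources collapses into one row, and the higher citation count decides which copy survives.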

### Connect (MCP client)

With the Actor running in Standby mode, connect your MCP client to:

```
https://<your-username>--academic-research-mcp.apify.actor/mcp
```

Use the **streamable HTTP** transport with an `Authorization: Bearer <APIFY_TOKEN>` header:

```json
{
  "mcpServers": {
    "academic-research": {
      "url": "https://<your-username>--academic-research-mcp.apify.actor/mcp",
      "headers": { "Authorization": "Bearer <APIFY_TOKEN>" }
    }
  }
}
```
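
If you generate client configs from a script, the same JSON can be assembled in code. This is a minimal sketch: `build_mcp_config` is a hypothetical helper, and the layout simply mirrors the example above, so check what file shape your particular MCP client expects.

```python
import json


def build_mcp_config(username, token, server_name="academic-research"):
    """Return an MCP client config dict pointing at the Standby endpoint."""
    url = f"https://{username}--academic-research-mcp.apify.actor/mcp"
    return {
        "mcpServers": {
            server_name: {
                "url": url,
                # The Apify token travels as a Bearer token in the header.
                "headers": {"Authorization": f"Bearer {token}"},
            }
        }
    }


config = build_mcp_config("your-username", "<APIFY_TOKEN>")
print(json.dumps(config, indent=2))
```

In a real setup, read the token from an environment variable or secret store rather than hard-coding it.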

Then ask your agent things like *"Find the most-cited papers on retrieval-augmented generation since 2024, then show me what's cited the top one this year."*

### Use it as a normal scraper (no MCP)

You can also just **Run** it with input and get a dataset back:

```json
{ "operation": "search_papers", "query": "diffusion models", "source": "all", "fromYear": 2024, "maxResults": 15 }
```

Other operations: `search_preprints` (+ `category`), `get_paper_by_doi` / `get_references` / `get_citations` (+ `doi`), `get_author_works` (+ `author`).
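
Each operation pairs with one key field, so a small helper can catch a missing field before a run starts. This is a sketch under assumptions: `build_input` is a hypothetical helper, and treating `query` as required for both search operations is a reading of the list above, not documented Actor behavior.

```python
# Key field for each operation, taken from the list above.
REQUIRED_FIELD = {
    "search_papers": "query",
    "search_preprints": "query",
    "get_paper_by_doi": "doi",
    "get_references": "doi",
    "get_citations": "doi",
    "get_author_works": "author",
}


def build_input(operation, **fields):
    """Build an Actor input dict, failing early if the key field is missing."""
    if operation not in REQUIRED_FIELD:
        raise ValueError(f"unknown operation: {operation}")
    required = REQUIRED_FIELD[operation]
    if not fields.get(required):
        raise ValueError(f"{operation} needs a non-empty '{required}'")
    return {"operation": operation, **fields}


print(build_input("get_citations", doi="10.1038/s41586-021-03819-2", maxResults=5))
```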

### Pricing

Pay-per-event: **$0.004 per successful MCP tool call** (failed/empty calls are free), or **$0.00001 per result** in a normal dataset run. A typical 30-call literature-review session costs well under $0.20.
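
The 30-call figure works out to 30 × $0.004 = $0.12 in event charges. A small sketch for estimating other session sizes, using the two prices quoted above (the `estimate_event_cost` helper is hypothetical, and Apify platform usage is billed separately and not included):

```python
from decimal import Decimal

PRICE_PER_TOOL_CALL = Decimal("0.004")   # per successful MCP tool call
PRICE_PER_RESULT = Decimal("0.00001")    # per result in a normal dataset run


def estimate_event_cost(tool_calls=0, results=0):
    """Event charges in USD; platform usage comes on top of this amount."""
    return tool_calls * PRICE_PER_TOOL_CALL + results * PRICE_PER_RESULT


print(estimate_event_cost(tool_calls=30))   # 30 successful MCP calls
print(estimate_event_cost(results=1500))    # 1,500 dataset results
```

`Decimal` keeps the small per-result price exact, which avoids floating-point noise when summing many tiny charges.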

### Data sources & terms

- [Crossref REST API](https://www.crossref.org/documentation/retrieve-metadata/rest-api/) — public, no auth, metadata under CC-BY.
- [OpenAlex API](https://docs.openalex.org/) — public, no auth, CC0 data.
- [arXiv API](https://info.arxiv.org/help/api/) — public, no auth; results subject to arXiv's terms.

Public, logged-out data only. No login-gated content, no anti-bot circumvention. Not a medical/clinical data source.

### Local development

```bash
npm install
npm run build
# MCP / Standby mode:
APIFY_META_ORIGIN=STANDBY ACTOR_WEB_SERVER_PORT=8080 npm run dev
# Standard run (writes to local dataset):
node dist/main.js
```

***

*Built with Claude Code. Wraps public scholarly APIs; not affiliated with Crossref, OpenAlex, or arXiv.*

# Actor input Schema

## `operation` (type: `string`):

What to do in a standard run. (Ignored in Standby/MCP mode.)

## `query` (type: `string`):

Terms to search for (used by the search operations).

## `source` (type: `string`):

Which database to search.

## `doi` (type: `string`):

A DOI (for get\_paper\_by\_doi / get\_references / get\_citations), e.g. 10.1038/s41586-021-03819-2
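
DOIs often arrive as URLs or with a `doi:` prefix. The schema does not say whether the Actor accepts those forms, so the sketch below normalizes to the bare form shown in the example. `normalize_doi` is a hypothetical helper, and the regular expression is a common heuristic rather than an official DOI grammar.

```python
import re

# Heuristic shape check: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def normalize_doi(value):
    """Strip common prefixes and return a bare, lowercase DOI."""
    doi = value.strip()
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi, flags=re.IGNORECASE)
    doi = doi.lower()  # DOIs are case-insensitive
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {value!r}")
    return doi


print(normalize_doi("https://doi.org/10.1038/S41586-021-03819-2"))
```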

## `author` (type: `string`):

Author full name (for get\_author\_works), e.g. Yoshua Bengio

## `category` (type: `string`):

Optional arXiv category filter for search\_preprints, e.g. cs.AI, cs.LG, stat.ML
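
For context, the public arXiv API expresses a category filter as a `cat:` term inside `search_query`. The sketch below builds such a URL with the standard library. It shows how a category plus a phrase query can be combined and sorted newest-first; the exact query this Actor sends upstream is not documented here, and `arxiv_query_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(query, category=None, max_results=10):
    """Build an arXiv API URL for the newest preprints matching a phrase."""
    search = f'all:"{query}"'  # quotes make arXiv treat the terms as a phrase
    if category:
        search = f"cat:{category} AND {search}"
    params = {
        "search_query": search,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


print(arxiv_query_url("diffusion models", category="cs.LG"))
```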

## `fromYear` (type: `integer`):

Only return papers published in this year or later (search operations).

## `maxResults` (type: `integer`):

Maximum number of results (1–25).

## `politeMailto` (type: `string`):

Optional. Sent to Crossref/OpenAlex/arXiv as a courtesy for the faster 'polite pool'. Not stored beyond the upstream request.
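
Crossref and OpenAlex both document a `mailto` query parameter for their polite pools. The sketch below shows how a contact email can be attached to a request URL. How this Actor forwards `politeMailto` internally is an assumption, and `with_mailto` is a hypothetical helper.

```python
from urllib.parse import urlencode


def with_mailto(base_url, params, polite_mailto=None):
    """Return the request URL, adding `mailto` only when an email is given."""
    query = dict(params)
    if polite_mailto:
        query["mailto"] = polite_mailto
    return f"{base_url}?{urlencode(query)}"


print(with_mailto("https://api.crossref.org/works",
                  {"query": "large language models", "rows": 10},
                  "you@example.org"))
print(with_mailto("https://api.openalex.org/works",
                  {"search": "large language models", "per-page": 10}))
```

Leaving the email out still works against both APIs; it just means requests are not identified for the polite pool.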

## Actor input object example

```json
{
  "operation": "search_papers",
  "query": "large language models",
  "source": "crossref",
  "maxResults": 10
}
```
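
Before starting a run, it can help to check an input object against the enums and numeric ranges listed in the schema. A minimal sketch (`validate_input` is a hypothetical helper; the platform performs its own schema validation, so this only catches mistakes earlier):

```python
OPERATIONS = {"search_papers", "search_preprints", "get_paper_by_doi",
              "get_references", "get_citations", "get_author_works"}
SOURCES = {"crossref", "openalex", "all"}


def validate_input(actor_input):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    # Missing fields fall back to the schema defaults before checking.
    if actor_input.get("operation", "search_papers") not in OPERATIONS:
        problems.append("operation is not one of the six supported values")
    if actor_input.get("source", "crossref") not in SOURCES:
        problems.append("source must be crossref, openalex, or all")
    max_results = actor_input.get("maxResults", 10)
    if not isinstance(max_results, int) or not 1 <= max_results <= 25:
        problems.append("maxResults must be an integer from 1 to 25")
    from_year = actor_input.get("fromYear")
    if from_year is not None and (
        not isinstance(from_year, int) or not 1900 <= from_year <= 2100
    ):
        problems.append("fromYear must be an integer from 1900 to 2100")
    return problems


print(validate_input({"operation": "search_papers", "query": "large language models",
                      "source": "crossref", "maxResults": 10}))
print(validate_input({"source": "pubmed", "maxResults": 50}))
```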

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "large language models"
};

// Run the Actor and wait for it to finish
const run = await client.actor("saturday/academic-research-mcp").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "query": "large language models" }

# Run the Actor and wait for it to finish
run = client.actor("saturday/academic-research-mcp").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
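
The example above passes only `query`, so it relies on the default `search_papers` operation. To run a different operation, set `operation` and its key field explicitly. The sketch below wraps `get_citations` using the same client calls as the example above; `fetch_citations` is a hypothetical helper name.

```python
ACTOR_ID = "saturday/academic-research-mcp"


def fetch_citations(client, doi, max_results=10):
    """Run the get_citations operation and return the dataset items as a list."""
    run_input = {"operation": "get_citations", "doi": doi, "maxResults": max_results}
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Real usage (needs `pip install apify-client` and an APIFY_TOKEN env var):
#   import os
#   from apify_client import ApifyClient
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   for item in fetch_citations(client, "10.1038/s41586-021-03819-2", max_results=5):
#       print(item)
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object, without network access.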

## CLI example

```bash
echo '{
  "query": "large language models"
}' |
apify call saturday/academic-research-mcp --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=saturday/academic-research-mcp",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Academic Research MCP — Papers, DOIs & Citations",
        "description": "MCP server + scraper for AI research agents: search 400M+ papers across Crossref, OpenAlex & arXiv, fetch DOIs, and trace both references and forward citations - plus author metrics. Built for literature review.",
        "version": "0.1",
        "x-build-id": "YqUovEXuo1GR65bRY"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/saturday~academic-research-mcp/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-saturday-academic-research-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/saturday~academic-research-mcp/runs": {
            "post": {
                "operationId": "runs-sync-saturday-academic-research-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/saturday~academic-research-mcp/run-sync": {
            "post": {
                "operationId": "run-sync-saturday-academic-research-mcp",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "operation": {
                        "title": "Operation",
                        "enum": [
                            "search_papers",
                            "search_preprints",
                            "get_paper_by_doi",
                            "get_references",
                            "get_citations",
                            "get_author_works"
                        ],
                        "type": "string",
                        "description": "What to do in a standard run. (Ignored in Standby/MCP mode.)",
                        "default": "search_papers"
                    },
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Terms to search for (used by the search operations)."
                    },
                    "source": {
                        "title": "Source (for Search papers)",
                        "enum": [
                            "crossref",
                            "openalex",
                            "all"
                        ],
                        "type": "string",
                        "description": "Which database to search.",
                        "default": "crossref"
                    },
                    "doi": {
                        "title": "DOI",
                        "type": "string",
                        "description": "A DOI (for get_paper_by_doi / get_references / get_citations), e.g. 10.1038/s41586-021-03819-2"
                    },
                    "author": {
                        "title": "Author name",
                        "type": "string",
                        "description": "Author full name (for get_author_works), e.g. Yoshua Bengio"
                    },
                    "category": {
                        "title": "arXiv category",
                        "type": "string",
                        "description": "Optional arXiv category filter for search_preprints, e.g. cs.AI, cs.LG, stat.ML"
                    },
                    "fromYear": {
                        "title": "From year",
                        "minimum": 1900,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Only return papers published in this year or later (search operations)."
                    },
                    "maxResults": {
                        "title": "Max results",
                        "minimum": 1,
                        "maximum": 25,
                        "type": "integer",
                        "description": "Maximum number of results (1–25).",
                        "default": 10
                    },
                    "politeMailto": {
                        "title": "Contact email (API polite pool)",
                        "type": "string",
                        "description": "Optional. Sent to Crossref/OpenAlex/arXiv as a courtesy for the faster 'polite pool'. Not stored beyond the upstream request."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
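
For projects without an Apify client library, the `run-sync-get-dataset-items` path in the spec above can be called over plain HTTP. The sketch below only builds the request with the Python standard library and does not send it; `build_run_request` is a hypothetical helper, and the token is passed as the `token` query parameter as the spec describes.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/saturday~academic-research-mcp/run-sync-get-dataset-items"


def build_run_request(token, actor_input):
    """Build (but do not send) the POST request described by the spec."""
    url = f"{API_BASE}{ACTOR_PATH}?{urlencode({'token': token})}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_run_request(
    "<YOUR_API_TOKEN>",
    {"operation": "search_papers", "query": "large language models"},
)
print(request.get_method(), request.full_url)
# To send it, pass `request` to urllib.request.urlopen() and parse the body.
```

Because the run is synchronous, a long search holds the HTTP connection open until the Actor finishes, so set a client-side timeout that fits your workload.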
