# Zenodo Scraper (`crawlerbros/zenodo-scraper`) Actor

Scrape Zenodo, CERN's open science repository with 3M+ research records including papers, datasets, software, posters, and presentations. Search by query, resource type, access rights, or fetch by record ID, DOI, or community.

- **URL**: https://apify.com/crawlerbros/zenodo-scraper.md
- **Developed by:** [Crawler Bros](https://apify.com/crawlerbros) (community)
- **Categories:** Developer tools, Automation, Other
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded, 7 bookmarks
- **User rating**: 5.00 out of 5 stars

## Pricing

from $3.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
Because this Actor supports Apify Store discounts, the price decreases as your subscription plan tier increases.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event
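
As a rough worked example of the per-result arithmetic, the sketch below estimates the event charge for a run at the listed rate of $3.00 per 1,000 results. The helper name `estimate_result_cost` is hypothetical, and the figure deliberately leaves out platform usage and any subscription discount, so the actual charge may differ.

```python
from decimal import Decimal

# Listed rate from the Pricing section: $3.00 per 1,000 results.
PRICE_PER_1000_RESULTS = Decimal("3.00")


def estimate_result_cost(result_count: int) -> Decimal:
    """Estimate the per-result event charge in USD (hypothetical helper).

    Excludes Apify platform usage and any subscription discount.
    """
    if result_count < 0:
        raise ValueError("result_count must be non-negative")
    cost = PRICE_PER_1000_RESULTS * result_count / 1000
    return cost.quantize(Decimal("0.01"))


# 250 results at the listed rate.
print(estimate_result_cost(250))
```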

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Zenodo Scraper

Extract research records from **Zenodo** — CERN's open-access repository with **3M+ scholarly outputs** including publications, datasets, software, posters, presentations, and more. No API key required.

### What Does Zenodo Scraper Do?

Zenodo Scraper gives you structured access to the Zenodo research repository via its public REST API. You can:

- **Search** the full Zenodo catalog by any keyword or phrase
- **Filter** by resource type (dataset, paper, software, etc.), access rights, date range, and sort order
- **Fetch** a specific record by its Zenodo ID or DOI
- **Browse** all records belonging to a Zenodo community

Each record includes the title, authors, description, keywords, resource type, access rights, license, publication date, community membership, file counts, and the direct Zenodo URL.

---

### Output Fields

| Field | Type | Description |
|---|---|---|
| `zenodoId` | Integer | Zenodo numeric record identifier |
| `doi` | String | Digital Object Identifier (DOI) |
| `title` | String | Record title |
| `creators` | Array | List of creators with `name`, `affiliation`, `orcid` |
| `description` | String | Abstract or description (HTML stripped) |
| `keywords` | Array | Author-supplied keywords |
| `resourceType` | String | Type of resource (publication, dataset, software, etc.) |
| `subtype` | String | Sub-type (e.g. article, preprint, figure) |
| `accessRight` | String | Access level (open, closed, embargoed, restricted) |
| `license` | String | License identifier (e.g. cc-by-4.0) |
| `publicationDate` | String | Publication date (YYYY-MM-DD) |
| `communities` | Array | Zenodo community IDs this record belongs to |
| `fileCount` | Integer | Number of attached files |
| `totalFileSizeBytes` | Integer | Total size of all attached files in bytes |
| `zenodoUrl` | String | Direct URL to the record on zenodo.org |
| `scrapedAt` | String | UTC timestamp when the record was scraped |

---

### Input Configuration

#### Modes

| Mode | Description |
|---|---|
| `search` | Full-text search across all Zenodo records |
| `byRecordId` | Fetch one specific record by its Zenodo numeric ID |
| `byCommunity` | Browse all records within a specific Zenodo community |
| `byDOI` | Find a record by its DOI string |

#### Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `mode` | Enum | `search` | Operating mode |
| `query` | String | — | Search query (mode=search) |
| `recordId` | String | — | Zenodo record ID (mode=byRecordId) |
| `communityId` | String | — | Community slug (mode=byCommunity) |
| `doi` | String | — | Full DOI (mode=byDOI) |
| `resourceType` | Enum | Any | Filter by type (publication, dataset, software, etc.) |
| `accessRight` | Enum | Any | Filter by access (open, closed, embargoed, restricted) |
| `sortBy` | Enum | `bestmatch` | Sort order for results |
| `fromDate` | String | — | Filter by publication date ≥ (YYYY-MM-DD) |
| `toDate` | String | — | Filter by publication date ≤ (YYYY-MM-DD) |
| `maxItems` | Integer | 50 | Maximum number of records to return (1–1000) |

---

### Example Inputs

#### Search for climate change datasets
```json
{
  "mode": "search",
  "query": "climate change",
  "resourceType": "dataset",
  "accessRight": "open",
  "maxItems": 100
}
````

#### Fetch a specific record by ID

```json
{
  "mode": "byRecordId",
  "recordId": "10234567"
}
```

#### Browse a community

```json
{
  "mode": "byCommunity",
  "communityId": "zenodo",
  "maxItems": 50
}
```

#### Find by DOI

```json
{
  "mode": "byDOI",
  "doi": "10.5281/zenodo.10234567"
}
```
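
If you hold a Zenodo-minted DOI and want the numeric ID for `byRecordId` mode instead, the ID can usually be read straight from the DOI suffix. The sketch below assumes the `10.5281/zenodo.<id>` pattern shown in the example above; the helper name `zenodo_id_from_doi` is hypothetical, and records registered under an external DOI will not match.

```python
import re
from typing import Optional

# Zenodo-minted DOIs follow the pattern 10.5281/zenodo.<record id>.
_ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.(\d+)$", re.IGNORECASE)


def zenodo_id_from_doi(doi: str) -> Optional[int]:
    """Return the numeric record ID embedded in a Zenodo DOI, or None."""
    cleaned = doi.strip()
    # Also accept the common resolver prefix, e.g. https://doi.org/10.5281/...
    cleaned = re.sub(r"^https?://(dx\.)?doi\.org/", "", cleaned, flags=re.IGNORECASE)
    match = _ZENODO_DOI.match(cleaned)
    return int(match.group(1)) if match else None


print(zenodo_id_from_doi("10.5281/zenodo.10234567"))
```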

#### Search within a date range

```json
{
  "mode": "search",
  "query": "machine learning",
  "fromDate": "2023-01-01",
  "toDate": "2024-12-31",
  "sortBy": "mostrecent",
  "maxItems": 200
}
```
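
When the dates come from user input, it can help to validate them before starting a run. The following is a minimal sketch, not part of the Actor: `date_range_input` is a hypothetical helper that checks both dates parse as ISO dates and that the range is not inverted, then returns an input object shaped like the example above.

```python
from datetime import date


def date_range_input(query: str, from_date: str, to_date: str, max_items: int = 50) -> dict:
    """Build a search input with a validated date range (hypothetical helper)."""
    # date.fromisoformat raises ValueError if the string is not a valid ISO date.
    start = date.fromisoformat(from_date)
    end = date.fromisoformat(to_date)
    if start > end:
        raise ValueError("fromDate must not be later than toDate")
    return {
        "mode": "search",
        "query": query,
        "fromDate": start.isoformat(),
        "toDate": end.isoformat(),
        "maxItems": max_items,
    }


print(date_range_input("machine learning", "2023-01-01", "2024-12-31", 200))
```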

---

### Use Cases

- **Literature reviews**: Collect papers and preprints across any research domain
- **Dataset discovery**: Find open datasets for any scientific field
- **Software citation**: Locate software deposits and their metadata
- **Community monitoring**: Track new uploads to specific Zenodo communities
- **Open science analytics**: Analyze publication trends, license adoption, and access patterns
- **Research metadata enrichment**: Enrich reference lists with full Zenodo metadata
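
As an illustration of the analytics use case, the sketch below tallies license identifiers across a list of scraped items. It assumes each item is a dictionary with the `license` field described under Output Fields; the sample items are made up for the example, and `license_counts` is a hypothetical helper.

```python
from collections import Counter


def license_counts(items):
    """Count records per license identifier; missing licenses become 'unknown'."""
    return Counter((item.get("license") or "unknown") for item in items)


# Illustrative, made-up items shaped like the Output Fields table.
sample = [
    {"title": "A", "license": "cc-by-4.0"},
    {"title": "B", "license": "cc-by-4.0"},
    {"title": "C", "license": None},
]
print(license_counts(sample).most_common())
```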

---

### Frequently Asked Questions

**Do I need a Zenodo account or API key?**
No. Zenodo's public API is free and requires no authentication.

**How many records can I scrape?**
Up to 1,000 records per run. For larger harvests, Zenodo provides an OAI-PMH endpoint at zenodo.org/oai2d as well as full data dumps.
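
One possible way to stay within the per-run cap is to split a broad query into smaller date windows and start one run per window using `fromDate` and `toDate`. This is a usage assumption rather than a documented feature, and it only helps when each window matches fewer than 1,000 records. The hypothetical `month_windows` helper below generates the windows:

```python
from datetime import date, timedelta


def month_windows(start: date, end: date):
    """Yield (fromDate, toDate) ISO strings covering start..end month by month."""
    current = start
    while current <= end:
        # Compute the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The window ends on the last day of the month or on `end`, whichever is earlier.
        window_end = min(next_month - timedelta(days=1), end)
        yield current.isoformat(), window_end.isoformat()
        current = next_month


for from_date, to_date in month_windows(date(2023, 11, 15), date(2024, 1, 10)):
    print(from_date, to_date)
```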

**What does the `open` access right mean?**
Open access records have files freely available to download. Closed and restricted records may only return metadata.

**Can I search by author name?**
Yes. Use the `query` field with Elasticsearch syntax: `author:"Smith, John"` or `creators.name:Smith`.
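
Names containing commas, spaces, or quotes need to be wrapped in double quotes for this syntax. A minimal sketch of building such a query string follows; `author_query` is a hypothetical helper, and the field names are taken from the answer above rather than verified against Zenodo's search index.

```python
def author_query(name: str, field: str = "creators.name") -> str:
    """Return a quoted field query such as creators.name:"Smith, John"."""
    # Escape backslashes first, then double quotes, so the phrase stays intact.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


print(author_query("Smith, John"))
```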

**Are file download URLs included?**
Not directly. Use the `zenodoId` to construct the API URL: `https://zenodo.org/api/records/{zenodoId}/files`.
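
The URL construction described above can be sketched as a small helper. `record_files_url` is a hypothetical name; it only formats the URL and does not contact Zenodo.

```python
def record_files_url(zenodo_id: int) -> str:
    """Return the Zenodo API URL that lists a record's files."""
    record_id = int(zenodo_id)
    if record_id <= 0:
        raise ValueError("zenodo_id must be a positive integer")
    return f"https://zenodo.org/api/records/{record_id}/files"


print(record_files_url(10234567))
```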

**How fresh is the data?**
Zenodo's API serves live data. Records scraped reflect the current state of the repository.

**What communities are available?**
Visit zenodo.org/communities to browse thousands of communities across all scientific disciplines.

# Actor input Schema

## `mode` (type: `string`):

What to fetch from Zenodo.

## `query` (type: `string`):

Full-text search query (mode=search). Supports Elasticsearch query syntax, e.g. `climate change`, `machine learning`, `author:Einstein`.

## `recordId` (type: `string`):

Zenodo numeric record ID, e.g. `10234567`. (mode=byRecordId)

## `communityId` (type: `string`):

Zenodo community slug/identifier, e.g. `zenodo`, `ecfunded`, `astronomy-general`. (mode=byCommunity)

## `doi` (type: `string`):

Full DOI string, e.g. `10.5281/zenodo.10234567`. (mode=byDOI)

## `resourceType` (type: `string`):

Filter by resource type (mode=search and mode=byCommunity).

## `accessRight` (type: `string`):

Filter by access rights (mode=search).

## `sortBy` (type: `string`):

Sort order for search results (mode=search).

## `fromDate` (type: `string`):

Filter records published on or after this date (YYYY-MM-DD). (mode=search)

## `toDate` (type: `string`):

Filter records published on or before this date (YYYY-MM-DD). (mode=search)

## `maxItems` (type: `integer`):

Maximum number of records to return.

## Actor input object example

```json
{
  "mode": "search",
  "query": "climate change",
  "recordId": "3678535",
  "communityId": "astronomy-general",
  "doi": "10.5281/zenodo.3678535",
  "resourceType": "",
  "accessRight": "",
  "sortBy": "bestmatch",
  "maxItems": 5
}
```
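
Because each mode reads a different identifier field, a small client-side check can catch mismatched inputs before a run is started. The sketch below is an assumption about sensible validation, not the Actor's own logic: `validate_input` is a hypothetical helper, the mode-to-field mapping is read from the parameter descriptions above, and the 1 to 1000 bound on `maxItems` comes from the input schema.

```python
VALID_MODES = {"search", "byRecordId", "byCommunity", "byDOI"}

# Identifier each lookup mode reads, per the parameter descriptions above.
# "search" is omitted on purpose: the schema marks only `mode` as required.
REQUIRED_BY_MODE = {
    "byRecordId": "recordId",
    "byCommunity": "communityId",
    "byDOI": "doi",
}


def validate_input(actor_input: dict) -> dict:
    """Return a checked copy of the input, raising ValueError on problems."""
    mode = actor_input.get("mode", "search")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    needed = REQUIRED_BY_MODE.get(mode)
    if needed and not actor_input.get(needed):
        raise ValueError(f"mode={mode} needs a non-empty {needed!r}")
    max_items = actor_input.get("maxItems", 50)
    if not isinstance(max_items, int) or not 1 <= max_items <= 1000:
        raise ValueError("maxItems must be an integer between 1 and 1000")
    return {**actor_input, "mode": mode, "maxItems": max_items}


print(validate_input({"mode": "byDOI", "doi": "10.5281/zenodo.3678535"}))
```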

# Actor output Schema

## `records` (type: `string`):

Dataset containing all scraped Zenodo research records.
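
For typed Python code, the dataset items can be described with a `TypedDict` that mirrors the Output Fields table in the README. This is a sketch under assumptions: the class names and the `summarize` helper are hypothetical, and which fields may be missing or null is a guess, so every key is treated as optional.

```python
from typing import List, Optional, TypedDict


class Creator(TypedDict, total=False):
    name: str
    affiliation: Optional[str]
    orcid: Optional[str]


class ZenodoRecord(TypedDict, total=False):
    zenodoId: int
    doi: Optional[str]
    title: str
    creators: List[Creator]
    description: Optional[str]
    keywords: List[str]
    resourceType: str
    subtype: Optional[str]
    accessRight: str
    license: Optional[str]
    publicationDate: str
    communities: List[str]
    fileCount: int
    totalFileSizeBytes: int
    zenodoUrl: str
    scrapedAt: str


def summarize(record: ZenodoRecord) -> str:
    """Return a one-line summary: title, resource type, and creator names."""
    names = ", ".join(c.get("name", "?") for c in record.get("creators", []))
    title = record.get("title", "(untitled)")
    kind = record.get("resourceType", "?")
    return f"{title} [{kind}] by {names or 'unknown'}"
```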

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "mode": "search",
    "query": "climate change",
    "recordId": "3678535",
    "communityId": "astronomy-general",
    "doi": "10.5281/zenodo.3678535",
    "resourceType": "",
    "accessRight": "",
    "sortBy": "bestmatch",
    "maxItems": 5
};

// Run the Actor and wait for it to finish
const run = await client.actor("crawlerbros/zenodo-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "mode": "search",
    "query": "climate change",
    "recordId": "3678535",
    "communityId": "astronomy-general",
    "doi": "10.5281/zenodo.3678535",
    "resourceType": "",
    "accessRight": "",
    "sortBy": "bestmatch",
    "maxItems": 5,
}

# Run the Actor and wait for it to finish
run = client.actor("crawlerbros/zenodo-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "mode": "search",
  "query": "climate change",
  "recordId": "3678535",
  "communityId": "astronomy-general",
  "doi": "10.5281/zenodo.3678535",
  "resourceType": "",
  "accessRight": "",
  "sortBy": "bestmatch",
  "maxItems": 5
}' |
apify call crawlerbros/zenodo-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=crawlerbros/zenodo-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Zenodo Scraper",
        "description": "Scrape Zenodo, CERN's open science repository with 3M+ research records including papers, datasets, software, posters, and presentations. Search by query, resource type, access rights, or fetch by record ID, DOI, or community.",
        "version": "1.0",
        "x-build-id": "dKlgXXCAGh5G82oPZ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/crawlerbros~zenodo-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-crawlerbros-zenodo-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/crawlerbros~zenodo-scraper/runs": {
            "post": {
                "operationId": "runs-sync-crawlerbros-zenodo-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/crawlerbros~zenodo-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-crawlerbros-zenodo-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "mode"
                ],
                "properties": {
                    "mode": {
                        "title": "Mode",
                        "enum": [
                            "search",
                            "byRecordId",
                            "byCommunity",
                            "byDOI"
                        ],
                        "type": "string",
                        "description": "What to fetch from Zenodo.",
                        "default": "search"
                    },
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Full-text search query (mode=search). Supports Elasticsearch query syntax, e.g. `climate change`, `machine learning`, `author:Einstein`.",
                        "default": "climate change"
                    },
                    "recordId": {
                        "title": "Record ID (mode=byRecordId)",
                        "type": "string",
                        "description": "Zenodo numeric record ID, e.g. `10234567`. (mode=byRecordId)"
                    },
                    "communityId": {
                        "title": "Community ID (mode=byCommunity)",
                        "type": "string",
                        "description": "Zenodo community slug/identifier, e.g. `zenodo`, `ecfunded`, `astronomy-general`. (mode=byCommunity)"
                    },
                    "doi": {
                        "title": "DOI (mode=byDOI)",
                        "type": "string",
                        "description": "Full DOI string, e.g. `10.5281/zenodo.10234567`. (mode=byDOI)"
                    },
                    "resourceType": {
                        "title": "Resource type",
                        "enum": [
                            "",
                            "publication",
                            "dataset",
                            "software",
                            "image",
                            "video",
                            "presentation",
                            "poster",
                            "lesson",
                            "physicalobject",
                            "other"
                        ],
                        "type": "string",
                        "description": "Filter by resource type (mode=search and mode=byCommunity).",
                        "default": ""
                    },
                    "accessRight": {
                        "title": "Access right",
                        "enum": [
                            "",
                            "open",
                            "closed",
                            "embargoed",
                            "restricted"
                        ],
                        "type": "string",
                        "description": "Filter by access rights (mode=search).",
                        "default": ""
                    },
                    "sortBy": {
                        "title": "Sort by",
                        "enum": [
                            "bestmatch",
                            "mostrecent",
                            "mostviewed",
                            "mostdownloaded"
                        ],
                        "type": "string",
                        "description": "Sort order for search results (mode=search).",
                        "default": "bestmatch"
                    },
                    "fromDate": {
                        "title": "From date",
                        "type": "string",
                        "description": "Filter records published on or after this date (YYYY-MM-DD). (mode=search)"
                    },
                    "toDate": {
                        "title": "To date",
                        "type": "string",
                        "description": "Filter records published on or before this date (YYYY-MM-DD). (mode=search)"
                    },
                    "maxItems": {
                        "title": "Max items",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Maximum number of records to return.",
                        "default": 50
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
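
If you prefer not to install a client library, the `run-sync-get-dataset-items` path from the specification above can be called with the Python standard library alone. The sketch below is a minimal illustration: `build_run_request` and `run_actor` are hypothetical helper names, the token is passed as the `token` query parameter as the specification describes, and the response is assumed to be a JSON array of dataset items.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/crawlerbros~zenodo-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request described in the specification."""
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(token: str, actor_input: dict, timeout: float = 300.0) -> list:
    """Send the request and return the parsed JSON response body."""
    request = build_run_request(token, actor_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `run_actor("<YOUR_API_TOKEN>", {"mode": "search", "query": "climate change", "maxItems": 5})` would start a run and block until it finishes, so choose a timeout that suits the expected run length.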
