# OpenAlex Scholarly Works Scraper (`dami_studio/openalex-scraper`) Actor

Searches OpenAlex (250M+ scholarly works) by keyword and returns structured records: title, authors, institutions, venue, year, citation count, concepts, open-access link, and the full reconstructed abstract for literature reviews.

- **URL:** https://apify.com/dami_studio/openalex-scraper.md
- **Developed by:** [Dami's Studio](https://apify.com/dami_studio) (community)
- **Categories:** Developer tools, AI, Other
- **Stats:** 1 total users, 1 monthly users, 0.0% runs succeeded, 0 bookmarks
- **User rating:** No ratings yet

## Pricing

From $2.00 / 1,000 works returned

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.
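
As a rough worked example of the per-event part of the bill, the sketch below multiplies the number of returned works by the listed starting price of $2.00 per 1,000. The function name and the default rate are illustrative assumptions; the actual rate may differ by plan, and platform usage is charged separately and is not modeled here.

```python
def estimate_event_cost(works_returned: int, price_per_1000: float = 2.00) -> float:
    """Estimate the pay-per-event charge in USD (platform usage excluded)."""
    if works_returned < 0:
        raise ValueError("works_returned must be non-negative")
    return works_returned * price_per_1000 / 1000


# 120 returned works at the listed starting price
cost = estimate_event_cost(120)
print(f"${cost:.2f}")  # prints "$0.24"
```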

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## OpenAlex Scholarly Works Scraper

Search the [OpenAlex](https://openalex.org) catalog of 250M+ scholarly works and get clean, structured records — no API key, no login. OpenAlex is a free, open index of scholarship (an open replacement for Microsoft Academic Graph / Scopus).

This Actor calls the public OpenAlex `works` endpoint, walks results with **cursor pagination** (the reliable way to get past the first couple hundred results), reconstructs each abstract from its inverted index into readable text, and returns one flat row per work.

It is a polite API citizen: every request carries a contact `mailto` (both as a query param and in the `User-Agent`), which routes traffic to OpenAlex's faster, more reliable "polite pool".
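
The cursor walk described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: `iter_works` and `fetch_page` are hypothetical names, and `fetch_page` is a callable you supply (for instance one that sends a GET to `https://api.openalex.org/works` with the given params and returns the parsed JSON). The parameter names `search`, `per-page`, `cursor`, and `mailto` and the `meta.next_cursor` field follow the public OpenAlex API.

```python
from typing import Callable, Iterator


def iter_works(
    fetch_page: Callable[[dict], dict],
    query: str,
    mailto: str,
    max_items: int = 100,
    per_page: int = 50,
) -> Iterator[dict]:
    """Yield up to max_items raw works by following OpenAlex cursors."""
    cursor = "*"  # OpenAlex convention: "*" requests the first cursor page
    yielded = 0
    while cursor and yielded < max_items:
        params = {
            "search": query,
            "per-page": per_page,
            "cursor": cursor,
            "mailto": mailto,  # contact address used for the polite pool
        }
        page = fetch_page(params)
        results = page.get("results", [])
        if not results:
            break
        for work in results:
            if yielded >= max_items:
                break
            yield work
            yielded += 1
        # next_cursor is null or absent once the result set is exhausted
        cursor = page.get("meta", {}).get("next_cursor")


# Offline demo: two fake pages stand in for real API responses.
fake_pages = {
    "*": {"meta": {"next_cursor": "abc"}, "results": [{"id": "W1"}, {"id": "W2"}]},
    "abc": {"meta": {"next_cursor": None}, "results": [{"id": "W3"}]},
}
works = list(iter_works(lambda p: fake_pages[p["cursor"]], "crispr", "you@example.com"))
print([w["id"] for w in works])  # prints ['W1', 'W2', 'W3']
```

Injecting `fetch_page` keeps the paging logic independent of any HTTP library, which also makes it easy to exercise without network access.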

### Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `query` | string | — (required) | Keywords to search (title, abstract, fulltext), e.g. `machine learning`. |
| `sort` | string | `relevance` | `relevance`, `citations` (most cited first), or `date` (newest first). |
| `fromDate` | string | — | Optional `YYYY-MM-DD`; only works published on/after this date. |
| `filter` | string | — | Optional raw OpenAlex filter, e.g. `type:article,is_oa:true`. Merged with `fromDate`. |
| `maxItems` | integer | `100` | Max works to return (50 fetched per page via cursor). |
| `proxyConfiguration` | object | `{ "useApifyProxy": false }` | Optional. Not needed — OpenAlex is a clean public API. |
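
To make the interplay of `sort`, `filter`, and `fromDate` concrete, here is a hedged sketch of how such an input could be translated into OpenAlex query parameters. The helper `build_openalex_params` and the `SORT_KEYS` mapping are assumptions for illustration; the sort keys are real OpenAlex sort fields, but the Actor's internal mapping is not documented here.

```python
# Assumed mapping from the Actor's sort options to OpenAlex sort keys.
SORT_KEYS = {
    "relevance": "relevance_score:desc",
    "citations": "cited_by_count:desc",
    "date": "publication_date:desc",
}


def build_openalex_params(actor_input: dict) -> dict:
    """Translate Actor-style input into OpenAlex `works` query parameters."""
    params = {
        "search": actor_input["query"],
        "sort": SORT_KEYS[actor_input.get("sort", "relevance")],
    }
    filters = []
    if actor_input.get("filter"):
        filters.append(actor_input["filter"].strip())
    if actor_input.get("fromDate"):
        # fromDate is merged into the filter as from_publication_date
        filters.append(f"from_publication_date:{actor_input['fromDate']}")
    if filters:
        params["filter"] = ",".join(filters)
    return params


print(build_openalex_params({
    "query": "crispr",
    "sort": "citations",
    "filter": "type:article,is_oa:true",
    "fromDate": "2020-01-01",
}))
```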

#### Example input

```json
{
  "query": "crispr",
  "sort": "citations",
  "fromDate": "2020-01-01",
  "maxItems": 120
}
````

### Output

One row per work:

```json
{
  "ok": true,
  "openalexId": "https://openalex.org/W...",
  "doi": "https://doi.org/10....",
  "title": "…",
  "authors": ["Jane Doe", "John Roe"],
  "institutions": ["Some University"],
  "year": 2021,
  "publicationDate": "2021-05-03",
  "type": "article",
  "venue": "Nature",
  "citations": 1234,
  "concepts": ["Biology", "Genetics"],
  "isOpenAccess": true,
  "oaUrl": "https://…pdf",
  "abstract": "Reconstructed abstract text…",
  "url": "https://openalex.org/W..."
}
```

`abstract` is rebuilt from OpenAlex's `abstract_inverted_index`; when no abstract is indexed it is `null`. Results are deduplicated by `openalexId`.
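
For readers unfamiliar with the inverted-index format, the sketch below shows one way to rebuild the text and to deduplicate rows. OpenAlex stores each abstract as a mapping from word to the list of positions where it occurs; the function names are illustrative, and the Actor's own implementation may differ.

```python
from typing import Optional


def reconstruct_abstract(inverted_index: Optional[dict]) -> Optional[str]:
    """Rebuild abstract text from a {word: [positions]} inverted index."""
    if not inverted_index:
        return None  # no abstract indexed for this work
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    positioned.sort()  # order words by their position in the text
    return " ".join(word for _, word in positioned)


def dedupe_by_id(rows: list, key: str = "openalexId") -> list:
    """Keep the first row seen for each ID, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


index = {"the": [0, 3], "cell": [1], "repairs": [2], "break": [4]}
print(reconstruct_abstract(index))  # prints "the cell repairs the break"
```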

### Diagnostics & billing

On failure or no results, the Actor pushes a single diagnostic row (`ok: false`) with an `errorCode` (`BAD_INPUT`, `NO_RESULTS`, `RATE_LIMITED`, `SERVER_ERROR`, `NETWORK`) instead of failing silently. **Only successful work rows are charged** (one `work` unit each); diagnostics and empty results are never billed.
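
Because a diagnostic row lands in the same dataset as work rows, client code may want to separate the two before processing. A minimal sketch, assuming only the `ok` and `errorCode` fields described above (the helper name `split_rows` is hypothetical):

```python
def split_rows(items: list) -> tuple:
    """Separate successful work rows from diagnostic rows by the `ok` flag."""
    works = [row for row in items if row.get("ok") is True]
    error_codes = [
        row.get("errorCode", "UNKNOWN")
        for row in items
        if row.get("ok") is False
    ]
    return works, error_codes


# Example: a run that returned nothing but a diagnostic row
works, error_codes = split_rows([{"ok": False, "errorCode": "NO_RESULTS"}])
if error_codes:
    print("Run reported:", ", ".join(error_codes))  # prints "Run reported: NO_RESULTS"
```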

### Data source

Data comes from [OpenAlex](https://docs.openalex.org), released under CC0. Please cite OpenAlex when you use it.

# Actor input Schema

## `query` (type: `string`):

Keywords to search OpenAlex works for (title, abstract and fulltext are searched), e.g. "machine learning", "crispr gene editing". Required.

## `sort` (type: `string`):

How to order results: Relevance (best match for the query), Citations (most-cited first), or Date (newest first).

## `fromDate` (type: `string`):

Optional. Only return works published on or after this date (YYYY-MM-DD, e.g. 2023-01-01). Adds a `from_publication_date` filter.

## `filter` (type: `string`):

Optional advanced filter passed straight to the OpenAlex API `filter` param. Comma-separated key:value pairs, e.g. `type:article,is_oa:true,from_publication_date:2023-01-01`. See the OpenAlex docs for available filter keys. Merged with the "From publication date" (`fromDate`) filter.

## `maxItems` (type: `integer`):

Maximum number of works to return. Cursor pagination fetches 50 per page until this many unique works are collected.

## `notionConnector` (type: `string`):

Optional. Write each result as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default) — results are always saved to the dataset regardless.

## `notionParentId` (type: `string`):

Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead.

## `proxyConfiguration` (type: `object`):

Optional. The OpenAlex API is a public, no-auth JSON API with a generous polite pool (this Actor sends a `mailto` and `User-Agent` to use it), so no proxy is needed and the default routes traffic directly, saving proxy credits. Only enable Apify Proxy if you hit IP rate limits at very high volume.

## Actor input object example

```json
{
  "query": "machine learning",
  "sort": "relevance",
  "maxItems": 100,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```
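
Before starting a run, it can help to check an input object against the constraints stated in the schema (the `sort` enum, `maxItems` between 1 and 10000, and the `YYYY-MM-DD` date format). The validator below is a client-side sketch under those assumptions; `validate_input` is a hypothetical helper, not part of the Apify client or the Actor.

```python
import re
from datetime import datetime

ALLOWED_SORTS = {"relevance", "citations", "date"}
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_input(actor_input: dict) -> list:
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    query = actor_input.get("query")
    if not isinstance(query, str) or not query.strip():
        problems.append("query must be a non-empty string")
    if actor_input.get("sort", "relevance") not in ALLOWED_SORTS:
        problems.append("sort must be one of: citations, date, relevance")
    max_items = actor_input.get("maxItems", 100)
    if (
        isinstance(max_items, bool)
        or not isinstance(max_items, int)
        or not 1 <= max_items <= 10000
    ):
        problems.append("maxItems must be an integer between 1 and 10000")
    from_date = actor_input.get("fromDate")
    if from_date is not None:
        try:
            if not DATE_PATTERN.fullmatch(from_date):
                raise ValueError("wrong shape")
            datetime.strptime(from_date, "%Y-%m-%d")  # rejects e.g. month 13
        except (TypeError, ValueError):
            problems.append("fromDate must be a valid YYYY-MM-DD date")
    return problems


print(validate_input({"query": "machine learning", "sort": "relevance", "maxItems": 100}))
# prints []
```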

# Actor output Schema

## `results` (type: `string`):

Scraped rows are stored in the default dataset (one row per result). Blocked/empty/error runs return a single uncharged diagnostic row instead.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "machine learning"
};

// Run the Actor and wait for it to finish
const run = await client.actor("dami_studio/openalex-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "query": "machine learning" }

# Run the Actor and wait for it to finish
run = client.actor("dami_studio/openalex-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "machine learning"
}' |
apify call dami_studio/openalex-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=dami_studio/openalex-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "OpenAlex Scholarly Works Scraper",
        "description": "Searches OpenAlex (250M+ scholarly works) by keyword and returns structured records: title, authors, institutions, venue, year, citation count, concepts, open-access link, and the full reconstructed abstract for literature reviews.",
        "version": "0.1",
        "x-build-id": "lIcViVNfCFffQHM0e"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/dami_studio~openalex-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-dami_studio-openalex-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/dami_studio~openalex-scraper/runs": {
            "post": {
                "operationId": "runs-sync-dami_studio-openalex-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/dami_studio~openalex-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-dami_studio-openalex-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Search query",
                        "type": "string",
                        "description": "Keywords to search OpenAlex works for (title, abstract and fulltext are searched), e.g. \"machine learning\", \"crispr gene editing\". Required."
                    },
                    "sort": {
                        "title": "Sort by",
                        "enum": [
                            "relevance",
                            "citations",
                            "date"
                        ],
                        "type": "string",
                        "description": "How to order results: Relevance (best match for the query), Citations (most-cited first), or Date (newest first).",
                        "default": "relevance"
                    },
                    "fromDate": {
                        "title": "From publication date",
                        "type": "string",
                        "description": "Optional. Only return works published on or after this date (YYYY-MM-DD, e.g. 2023-01-01). Adds a from_publication_date filter."
                    },
                    "filter": {
                        "title": "Raw OpenAlex filter (advanced)",
                        "type": "string",
                        "description": "Optional advanced filter passed straight to the OpenAlex API filter param. Comma-separated key:value pairs, e.g. \"type:article,is_oa:true,from_publication_date:2023-01-01\". See the OpenAlex docs for available filter keys. Merged with From publication date."
                    },
                    "maxItems": {
                        "title": "Max works",
                        "minimum": 1,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum number of works to return. Cursor pagination fetches 50 per page until this many unique works are collected.",
                        "default": 100
                    },
                    "notionConnector": {
                        "title": "Notion connector (optional)",
                        "type": "string",
                        "description": "Optional. Write each result as a page into your Notion when the run finishes. Authorize a Notion connector once in Settings → API & Integrations → MCP connectors, then pick it here. Leave empty to skip (default) — results are always saved to the dataset regardless."
                    },
                    "notionParentId": {
                        "title": "Notion target data source ID",
                        "type": "string",
                        "description": "Optional. The Notion data source ID of the database to write into (only used if a Notion connector is set). Leave empty to create the pages privately in your workspace instead."
                    },
                    "proxyConfiguration": {
                        "title": "Proxy (optional)",
                        "type": "object",
                        "description": "OPTIONAL. The OpenAlex API is a public, no-auth JSON API with a generous polite pool (this actor sends a mailto and User-Agent to use it), so no proxy is needed and the default routes traffic directly, saving proxy credits. Only enable Apify Proxy if you hit IP rate limits at very high volume.",
                        "default": {
                            "useApifyProxy": false
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
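
For projects that call the REST API directly, the sketch below builds a request for the `run-sync-get-dataset-items` path from the specification above using only the Python standard library. The helper name `build_run_request` is an assumption for illustration; the request is constructed but not sent, so a valid token and network access are needed only for the commented-out final step.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/dami_studio~openalex-scraper/run-sync-get-dataset-items"


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    query_string = urllib.parse.urlencode({"token": token})
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}?{query_string}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request("<YOUR_API_TOKEN>", {"query": "machine learning", "maxItems": 10})
print(request.get_method(), request.full_url)

# To actually run the Actor (requires network access and a valid token):
# with urllib.request.urlopen(request) as response:
#     items = json.load(response)
```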
