# arXiv Scraper (`solidcode/arxiv-scraper`) Actor

\[💰 $2.5 / 1K] Search arXiv and extract paper metadata — titles, authors, abstracts, subject categories, DOIs, journal references, submission dates, and PDF links. Search by keyword, title, author, or category, or fetch specific papers by arXiv ID.

- **URL**: https://apify.com/solidcode/arxiv-scraper.md
- **Developed by:** [SolidCode](https://apify.com/solidcode) (community)
- **Categories:** Developer tools, AI, Automation
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $2.50 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## arXiv Scraper

Search arXiv at scale and pull clean, structured paper metadata — titles, full author lists with affiliations, abstracts, subject categories, DOIs, journal references, submission and revision dates, and direct PDF and abstract-page links. Search by keyword, by title, author, or abstract individually, by subject category, or fetch exact papers by arXiv ID. Built for researchers, data scientists, and librarians who need a ready-to-use arXiv dataset without manual copy-paste or wrestling with raw repository feeds one page at a time.

### Why This Scraper?

- **~40 subject categories across 8 disciplines** — pick from a labeled list spanning computer science, statistics, mathematics, physics, quantitative biology, quantitative finance, economics, and electrical engineering. Select cs.LG, stat.ML, math.PR, quant-ph and more with a checkbox — no codes to memorize.
- **Field-specific search, not just keywords** — match words in the title, the author name, or the abstract as separate inputs, then combine them. Find "transformer" in the title by Vaswani in the cs.CL category in one run.
- **Direct arXiv-ID lookup, including legacy IDs** — paste a list of IDs to fetch exact papers. Handles both modern (`2310.06825`) and legacy slash-style (`cond-mat/0011267`) identifiers, so decades-old preprints come back just as cleanly as last week's.
- **Full author affiliations** — every author arrives as a structured record with name and institutional affiliation when the paper lists one, ready for co-author and institution analysis.
- **DOI and journal reference for published-version cross-linking** — when authors register a DOI or cite the published venue, both fields land in the row, letting you join preprints to their peer-reviewed counterparts.
- **Direct PDF and abstract-page links on every paper** — a `pdfUrl` for the full text and an `absUrl` for the human-readable landing page, so downstream tools can fetch or link without rebuilding URLs.
- **Sort by relevance, submission date, or last-updated date** — newest-first or oldest-first, so you can surface the freshest preprints or build a chronological corpus.
- **Up to 50,000 papers per run** — set the result cap to zero to sweep an entire topic, with a built-in safety ceiling so a broad query never runs away.

### Use Cases

**Academic Literature & Systematic Review**
- Assemble a complete reading list for a topic, sorted by relevance or recency
- Narrow a survey to a single subject category to cut cross-field noise
- Pull every preprint by a specific author for a focused author study
- Track the latest submissions in a field by sorting on submission date

**Research Trend & Citation Analysis**
- Measure publication volume in an emerging sub-field over time
- Map which institutions are most active via author affiliations
- Detect bursts of activity by sweeping recent submissions in a category
- Build a chronological corpus to chart how terminology shifts year over year

**Competitive R&D Intelligence**
- Monitor what a competing lab or research group is publishing on a topic
- Benchmark output across institutions using affiliation data
- Spot new directions before they reach peer-reviewed journals
- Watch a category daily for the newest preprints in your space

**ML & AI Dataset Building**
- Harvest abstracts at scale to train or fine-tune domain models
- Build a labeled corpus by subject category for classification tasks
- Collect title-abstract pairs for summarization and retrieval datasets
- Gather a topic-specific text set for embeddings and semantic search

**Bibliographic Database Enrichment**
- Cross-reference preprints to published versions via DOI and journal reference
- Fill in missing abstracts, categories, and dates in an existing catalog
- Resolve legacy slash-style IDs to current metadata
- Enrich a reference manager export with affiliations and revision dates

**Grant & Patent Prior-Art Search**
- Surface the earliest preprints describing a technique for prior-art review
- Document the state of the art in a field for a grant proposal
- Trace an idea back to its first submission date on arXiv
- Compile a dated evidence trail across multiple subject categories

### Getting Started

#### Basic Keyword Search

The simplest possible run — one topic, 50 papers:

```json
{
    "searchQuery": "large language models",
    "maxResults": 50
}
````

#### Field-Specific Search by Category

Find recent computer-vision and machine-learning papers whose title mentions diffusion, newest first:

```json
{
    "title": "diffusion",
    "categories": ["cs.CV", "cs.LG"],
    "sortBy": "submittedDate",
    "sortOrder": "descending",
    "maxResults": 200
}
```

#### Fetch Specific Papers by ID

Pull exact papers — modern and legacy IDs together — ignoring all search fields:

```json
{
    "arxivIds": ["2310.06825", "1706.03762", "cond-mat/0011267"]
}
```

#### Author and Abstract Search Combined

Every preprint by a given author that mentions reinforcement learning in the abstract, restricted to three categories:

```json
{
    "author": "Yann LeCun",
    "abstract": "reinforcement learning",
    "categories": ["cs.AI", "cs.LG", "stat.ML"],
    "sortBy": "lastUpdatedDate",
    "maxResults": 500
}
```

### Input Reference

#### Search

Combine any of these fields, or paste arXiv IDs to fetch exact papers.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `searchQuery` | string | `"large language models"` | Free-text search across the whole paper (title, abstract, authors). Advanced users can use field prefixes like `ti:`, `au:`, `abs:`, `cat:` and boolean operators. |
| `title` | string | null | Only include papers whose title contains these words. |
| `author` | string | null | Only include papers by this author (e.g. "Yann LeCun" or "Hinton"). |
| `abstract` | string | null | Only include papers whose abstract contains these words. |
| `categories` | array | `[]` | Restrict results to selected arXiv subject areas. Choose from ~40 labeled categories across 8 disciplines; leave empty to search all subjects. |
| `arxivIds` | array | `[]` | Fetch specific papers by arXiv ID (e.g. `2310.06825` or legacy `cond-mat/0011267`). When set, the search fields above are ignored. |
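
The field inputs can also be expressed as one prefixed `searchQuery`. The sketch below is only an illustration of that mapping: `build_search_query` is a hypothetical helper, and it assumes (as described elsewhere in this README) that `title`, `author`, and `abstract` are AND-joined while any selected category may match. Whether the Actor builds its query this way internally, and whether `searchQuery` accepts quoted phrases and parentheses, is not confirmed by this page.

```python
def build_search_query(title=None, author=None, abstract=None, categories=()):
    """Combine field inputs into one prefixed query string (illustrative only)."""
    clauses = []
    for prefix, value in (("ti", title), ("au", author), ("abs", abstract)):
        if value:
            # Quote multi-word values so they are treated as a phrase.
            clauses.append(f'{prefix}:"{value}"' if " " in value else f"{prefix}:{value}")
    if categories:
        # Any selected category may match, so categories are OR-joined.
        cats = " OR ".join(f"cat:{c}" for c in categories)
        clauses.append(f"({cats})" if len(categories) > 1 else cats)
    # All field inputs must match, so the clauses are AND-joined.
    return " AND ".join(clauses)


print(build_search_query(title="diffusion", categories=["cs.CV", "cs.LG"]))
# prints: ti:diffusion AND (cat:cs.CV OR cat:cs.LG)
```

Using the separate inputs is usually simpler; a hand-built query is mainly useful when you need a combination the form fields cannot express.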

#### Results

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `maxResults` | integer | `50` | Maximum papers to return. Set to `0` to fetch all matches, with a safety cap of 50,000 so very broad searches don't run indefinitely. Ignored when fetching by ID. |
| `sortBy` | string | `relevance` | Order results by relevance (`relevance`), submission date (`submittedDate`), or last-updated date (`lastUpdatedDate`). |
| `sortOrder` | string | `descending` | Newest first (`descending`) or oldest first (`ascending`). Most useful when sorting by date. |
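
The `maxResults` rules can be summarised in a few lines. This is a sketch of the documented behaviour, not the Actor's own code; `effective_limit` and `SAFETY_CAP` are hypothetical names.

```python
SAFETY_CAP = 50_000  # ceiling quoted in the maxResults description


def effective_limit(max_results=50, fetching_by_id=False):
    """Return the most papers a run could return, or None when maxResults is ignored."""
    if fetching_by_id:
        # maxResults is ignored for ID lookups.
        return None
    if max_results < 0:
        raise ValueError("maxResults must be zero or positive")
    # Zero means "all matches", which the safety cap then bounds.
    return SAFETY_CAP if max_results == 0 else min(max_results, SAFETY_CAP)
```

A helper like this is handy for budgeting: the value it returns is the upper bound on billable results for a search run.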

### Output

Each paper is one flat row in the dataset. Here is a representative result:

```json
{
    "arxivId": "1706.03762",
    "version": 7,
    "title": "Attention Is All You Need",
    "authors": [
        { "name": "Ashish Vaswani", "affiliation": "Google Brain" },
        { "name": "Noam Shazeer", "affiliation": "Google Brain" },
        { "name": "Niki Parmar", "affiliation": "Google Research" }
    ],
    "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks that include an encoder and a decoder...",
    "primaryCategory": "cs.CL",
    "categories": ["cs.CL", "cs.LG"],
    "publishedDate": "2017-06-12T17:57:34Z",
    "updatedDate": "2023-08-02T00:41:18Z",
    "doi": "10.48550/arXiv.1706.03762",
    "journalRef": "Advances in Neural Information Processing Systems 30 (2017)",
    "comments": "15 pages, 5 figures",
    "pdfUrl": "https://arxiv.org/pdf/1706.03762v7",
    "absUrl": "https://arxiv.org/abs/1706.03762v7"
}
```

#### Core Fields

| Field | Type | Description |
|-------|------|-------------|
| `title` | string | Paper title, whitespace-normalized |
| `authors` | object\[] | One record per author: `{ name, affiliation }` (affiliation included when the paper lists it) |
| `abstract` | string | Full abstract text |
| `primaryCategory` | string | Primary arXiv subject code (e.g. `cs.CL`) |
| `categories` | string\[] | All subject codes on the paper |
| `comments` | string \| null | Author comments (e.g. "15 pages, 5 figures") |
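
Because `authors` is a list of `{ name, affiliation }` records, institution-level counts take only a few lines. The sketch below assumes rows shaped like the sample output; `affiliation_counts` is a hypothetical helper, and it treats a missing or empty `affiliation` as "not listed".

```python
from collections import Counter


def affiliation_counts(rows):
    """Count how many papers list each affiliation at least once."""
    counts = Counter()
    for row in rows:
        # A set ensures an institution is counted once per paper,
        # even when several authors share it.
        seen = {
            author.get("affiliation")
            for author in row.get("authors", [])
            if author.get("affiliation")
        }
        counts.update(seen)
    return counts


sample = [{
    "authors": [
        {"name": "Ashish Vaswani", "affiliation": "Google Brain"},
        {"name": "Noam Shazeer", "affiliation": "Google Brain"},
        {"name": "Niki Parmar", "affiliation": "Google Research"},
    ]
}]
print(affiliation_counts(sample))
```

Affiliation strings are free text as entered by authors, so some normalisation is likely needed before comparing institutions.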

#### Identifiers & Cross-References

| Field | Type | Description |
|-------|------|-------------|
| `arxivId` | string | arXiv identifier without version (e.g. `1706.03762`) |
| `version` | integer | Version number (`v7` → `7`) |
| `doi` | string \| null | DOI when the authors registered one |
| `journalRef` | string \| null | Journal reference / citation when the paper is published |
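
If your own catalog stores versioned identifiers (for example `1706.03762v7`, as seen at the end of `pdfUrl`), splitting them yields the same `arxivId` / `version` pair described above. The pattern below is an assumption based on the two ID styles this README mentions; `split_arxiv_id` is a hypothetical helper, not part of the Actor.

```python
import re

# Modern IDs look like 2310.06825; legacy IDs look like cond-mat/0011267.
# An optional "vN" suffix carries the version number.
_ID_PATTERN = re.compile(
    r"^(?P<id>\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v(?P<version>\d+))?$"
)


def split_arxiv_id(raw):
    """Return (arxiv_id, version); version is None when no suffix is present."""
    match = _ID_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognised arXiv identifier: {raw!r}")
    version = match.group("version")
    return match.group("id"), int(version) if version else None


print(split_arxiv_id("1706.03762v7"))
print(split_arxiv_id("cond-mat/0011267"))
```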

#### Dates & Links

| Field | Type | Description |
|-------|------|-------------|
| `publishedDate` | string | First-submitted timestamp (ISO 8601) |
| `updatedDate` | string | Last-updated timestamp (ISO 8601) |
| `pdfUrl` | string | Direct link to the full-text PDF |
| `absUrl` | string | Link to the arXiv abstract landing page |
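
The timestamps are ISO 8601 strings ending in `Z`, so they parse with the standard library. As a small sketch of the trend-analysis use case, the hypothetical helpers below count papers per submission year; the second sample row is a made-up placeholder, not real data.

```python
from collections import Counter
from datetime import datetime


def parse_timestamp(value):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so replace it with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def papers_per_year(rows):
    """Count rows by the year of their first submission."""
    return Counter(parse_timestamp(row["publishedDate"]).year for row in rows)


rows = [
    # The first row mirrors the sample output above; the second is a placeholder.
    {"arxivId": "1706.03762", "publishedDate": "2017-06-12T17:57:34Z"},
    {"arxivId": "0000.00000", "publishedDate": "2017-01-01T00:00:00Z"},
]
print(papers_per_year(rows))
```

Swap `publishedDate` for `updatedDate` to chart revision activity instead of first submissions.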

### Tips for Best Results

- **Use field prefixes for precision.** In `searchQuery` you can write `ti:transformer` to match only titles or `cat:cs.CL` to scope a subject — power users can build advanced boolean queries like `ti:transformer AND abs:translation` in a single field.
- **Narrow by category to cut noise.** A broad term like "networks" spans biology, physics, and computer science. Selecting one or two subject categories sharpens results dramatically and lowers your result count.
- **Sort by submission date for the freshest preprints.** Set `sortBy` to `submittedDate` and `sortOrder` to `descending` (Submission date with Newest first in the form) to surface the very latest work in a field, which suits daily monitoring and trend tracking.
- **Fetch by ID when you know exactly what you want.** Pasting arXiv IDs is the fastest, most precise path — it skips search entirely and returns those exact papers, legacy slash-style IDs included.
- **Start small, then scale.** Run with `maxResults` of 25–50 to confirm the data matches your needs, then raise the cap or set it to `0` to sweep a whole topic.
- **Keep DOI and journal reference for cross-linking.** When present, these fields let you match a preprint to its peer-reviewed version — invaluable for bibliographic enrichment and citation work.
- **Combine title, author, and abstract for laser-focused queries.** The three field inputs are AND-joined, so a name in `author` plus a phrase in `abstract` returns only papers that satisfy both.

### Pricing

**From $2.50 per 1,000 results** — a flat per-result rate that undercuts comparable arXiv extractors, with no hidden surcharges. Bronze, Silver, and Gold subscribers pay progressively less; the table below shows total cost at each discount tier.

| Results | No discount | Bronze | Silver | Gold |
|---------|-------------|--------|--------|------|
| 100 | $0.30 | $0.28 | $0.265 | $0.25 |
| 1,000 | $3.00 | $2.80 | $2.65 | $2.50 |
| 10,000 | $30.00 | $28.00 | $26.50 | $25.00 |
| 100,000 | $300.00 | $280.00 | $265.00 | $250.00 |

A "result" is any paper row in the output dataset. No compute or time-based charges — you pay per result, plus a small fixed per-run start fee.
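
The table is a straight multiplication, so costs for other volumes are easy to estimate. The sketch below takes the per-1,000 rates from the 1,000-result row; `estimate_result_cost` is a hypothetical helper, and it deliberately leaves out the per-run start fee because this page does not state its amount.

```python
from decimal import Decimal

# Price per 1,000 results at each tier, read from the table above.
PRICE_PER_1K = {
    "none": Decimal("3.00"),
    "bronze": Decimal("2.80"),
    "silver": Decimal("2.65"),
    "gold": Decimal("2.50"),
}


def estimate_result_cost(results, tier="none"):
    """Estimate the per-result charge in USD, excluding the per-run start fee."""
    if results < 0:
        raise ValueError("results must be zero or positive")
    return PRICE_PER_1K[tier] * results / 1000


print(estimate_result_cost(2_500, "silver"))
```

`Decimal` is used instead of floats so that small per-result amounts do not pick up rounding noise.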

### Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

- **Zapier** / **Make** / **n8n** — Workflow automation
- **Google Sheets** — Direct spreadsheet export
- **Slack** / **Email** — Notifications on new results
- **Webhooks** — Trigger custom workflows on run completion
- **Apify API** — Full programmatic access

### Legal & Ethical Use

arXiv content is openly accessible, and this Actor is designed for legitimate academic research, literature review, bibliometrics, and dataset building. Each paper on arXiv is distributed under its own license chosen by the authors, so respect those individual licenses when reusing abstracts or full text. Users are responsible for complying with applicable laws and arXiv's terms of use, including keeping request rates reasonable. Do not use extracted data for spam, harassment, or any illegal purpose.

# Actor input Schema

## `searchQuery` (type: `string`):

Free-text search across the whole paper (title, abstract, authors). Example: 'large language models' or 'quantum error correction'. Advanced users can use arXiv field prefixes like ti: (title), au: (author), abs: (abstract) or cat: (category) — e.g. 'ti:transformer'.

## `title` (type: `string`):

Only include papers whose title contains these words. Example: 'graph neural network'.

## `author` (type: `string`):

Only include papers by this author. Example: 'Yann LeCun' or 'Hinton'.

## `abstract` (type: `string`):

Only include papers whose abstract contains these words. Example: 'reinforcement learning'.

## `categories` (type: `array`):

Restrict results to these arXiv subject areas. Leave empty to search all subjects. Papers matching any selected category are included.

## `arxivIds` (type: `array`):

Fetch specific papers by their arXiv ID — e.g. '2310.06825' or the legacy form 'cond-mat/0011267'. When provided, these papers are fetched directly and the search fields above are ignored.

## `maxResults` (type: `integer`):

Maximum number of papers to return for your search. Set to 0 to fetch all available matches — in that case a safety cap of 50,000 papers is applied so very large searches (some have hundreds of thousands of matches) don't run indefinitely. Ignored when fetching by arXiv ID.

## `sortBy` (type: `string`):

How to order the results.

## `sortOrder` (type: `string`):

Order direction. Newest first is most useful when sorting by date.

## Actor input object example

```json
{
  "searchQuery": "large language models",
  "categories": [],
  "arxivIds": [],
  "maxResults": 50,
  "sortBy": "relevance",
  "sortOrder": "descending"
}
```
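
Before starting a paid run it can help to check an input object locally. The sketch below mirrors the constraints listed in the OpenAPI specification further down (allowed `sortBy` and `sortOrder` values, the `maxResults` range, unique `categories`); `check_input` is a hypothetical helper, and the platform performs its own validation regardless.

```python
SORT_BY = {"relevance", "submittedDate", "lastUpdatedDate"}
SORT_ORDER = {"descending", "ascending"}


def check_input(run_input):
    """Return a list of problems found; an empty list means the input looks fine."""
    problems = []
    max_results = run_input.get("maxResults", 50)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        problems.append("maxResults must be an integer")
    elif not 0 <= max_results <= 50_000:
        problems.append("maxResults must be between 0 and 50000")
    if run_input.get("sortBy", "relevance") not in SORT_BY:
        problems.append("sortBy must be one of " + ", ".join(sorted(SORT_BY)))
    if run_input.get("sortOrder", "descending") not in SORT_ORDER:
        problems.append("sortOrder must be one of " + ", ".join(sorted(SORT_ORDER)))
    categories = run_input.get("categories", [])
    if len(categories) != len(set(categories)):
        problems.append("categories must not contain duplicates")
    return problems


print(check_input({"searchQuery": "large language models", "sortBy": "Relevance"}))
```

Note that the API values are case-sensitive identifiers such as `relevance`, not the display labels shown in the input form.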

# Actor output Schema

## `overview` (type: `string`):

Table of papers with title, authors, primary category, submission date, and links.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQuery": "large language models",
    "categories": [],
    "arxivIds": [],
    "maxResults": 50,
    "sortBy": "relevance",
    "sortOrder": "descending"
};

// Run the Actor and wait for it to finish
const run = await client.actor("solidcode/arxiv-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQuery": "large language models",
    "categories": [],
    "arxivIds": [],
    "maxResults": 50,
    "sortBy": "relevance",
    "sortOrder": "descending",
}

# Run the Actor and wait for it to finish
run = client.actor("solidcode/arxiv-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "searchQuery": "large language models",
  "categories": [],
  "arxivIds": [],
  "maxResults": 50,
  "sortBy": "relevance",
  "sortOrder": "descending"
}' |
apify call solidcode/arxiv-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=solidcode/arxiv-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "arXiv Scraper",
        "description": "[💰 $2.5 / 1K] Search arXiv and extract paper metadata — titles, authors, abstracts, subject categories, DOIs, journal references, submission dates, and PDF links. Search by keyword, title, author, or category, or fetch specific papers by arXiv ID.",
        "version": "1.0",
        "x-build-id": "TTDRmi48kbrTiNvGZ"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/solidcode~arxiv-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-solidcode-arxiv-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/solidcode~arxiv-scraper/runs": {
            "post": {
                "operationId": "runs-sync-solidcode-arxiv-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/solidcode~arxiv-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-solidcode-arxiv-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQuery": {
                        "title": "Search Query",
                        "type": "string",
                        "description": "Free-text search across the whole paper (title, abstract, authors). Example: 'large language models' or 'quantum error correction'. Advanced users can use arXiv field prefixes like ti: (title), au: (author), abs: (abstract) or cat: (category) — e.g. 'ti:transformer'."
                    },
                    "title": {
                        "title": "Title Contains",
                        "type": "string",
                        "description": "Only include papers whose title contains these words. Example: 'graph neural network'."
                    },
                    "author": {
                        "title": "Author",
                        "type": "string",
                        "description": "Only include papers by this author. Example: 'Yann LeCun' or 'Hinton'."
                    },
                    "abstract": {
                        "title": "Abstract Contains",
                        "type": "string",
                        "description": "Only include papers whose abstract contains these words. Example: 'reinforcement learning'."
                    },
                    "categories": {
                        "title": "Subject Categories",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Restrict results to these arXiv subject areas. Leave empty to search all subjects. Papers matching any selected category are included.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "cs.AI",
                                "cs.CL",
                                "cs.CV",
                                "cs.LG",
                                "cs.NE",
                                "cs.RO",
                                "cs.CR",
                                "cs.DS",
                                "cs.DC",
                                "cs.HC",
                                "cs.IR",
                                "cs.SE",
                                "cs.SY",
                                "stat.ML",
                                "stat.ME",
                                "stat.AP",
                                "math.PR",
                                "math.ST",
                                "math.OC",
                                "math.NA",
                                "math.CO",
                                "math.AG",
                                "math.NT",
                                "physics.optics",
                                "physics.app-ph",
                                "physics.comp-ph",
                                "physics.med-ph",
                                "astro-ph",
                                "cond-mat",
                                "gr-qc",
                                "hep-ph",
                                "hep-th",
                                "nlin.CD",
                                "nucl-th",
                                "quant-ph",
                                "q-bio",
                                "q-fin.ST",
                                "econ.EM",
                                "eess.AS",
                                "eess.IV",
                                "eess.SP",
                                "eess.SY"
                            ],
                            "enumTitles": [
                                "CS — Artificial Intelligence (cs.AI)",
                                "CS — Computation & Language / NLP (cs.CL)",
                                "CS — Computer Vision (cs.CV)",
                                "CS — Machine Learning (cs.LG)",
                                "CS — Neural & Evolutionary Computing (cs.NE)",
                                "CS — Robotics (cs.RO)",
                                "CS — Cryptography & Security (cs.CR)",
                                "CS — Data Structures & Algorithms (cs.DS)",
                                "CS — Distributed & Parallel Computing (cs.DC)",
                                "CS — Human-Computer Interaction (cs.HC)",
                                "CS — Information Retrieval (cs.IR)",
                                "CS — Software Engineering (cs.SE)",
                                "CS — Systems & Control (cs.SY)",
                                "Statistics — Machine Learning (stat.ML)",
                                "Statistics — Methodology (stat.ME)",
                                "Statistics — Applications (stat.AP)",
                                "Math — Probability (math.PR)",
                                "Math — Statistics Theory (math.ST)",
                                "Math — Optimization & Control (math.OC)",
                                "Math — Numerical Analysis (math.NA)",
                                "Math — Combinatorics (math.CO)",
                                "Math — Algebraic Geometry (math.AG)",
                                "Math — Number Theory (math.NT)",
                                "Physics — Optics (physics.optics)",
                                "Physics — Applied Physics (physics.app-ph)",
                                "Physics — Computational Physics (physics.comp-ph)",
                                "Physics — Medical Physics (physics.med-ph)",
                                "Astrophysics (astro-ph)",
                                "Condensed Matter (cond-mat)",
                                "General Relativity & Quantum Cosmology (gr-qc)",
                                "High Energy Physics — Phenomenology (hep-ph)",
                                "High Energy Physics — Theory (hep-th)",
                                "Nonlinear Sciences — Chaotic Dynamics (nlin.CD)",
                                "Nuclear Theory (nucl-th)",
                                "Quantum Physics (quant-ph)",
                                "Quantitative Biology (q-bio)",
                                "Quantitative Finance — Statistical Finance (q-fin.ST)",
                                "Economics — Econometrics (econ.EM)",
                                "EE — Audio & Speech Processing (eess.AS)",
                                "EE — Image & Video Processing (eess.IV)",
                                "EE — Signal Processing (eess.SP)",
                                "EE — Systems & Control (eess.SY)"
                            ]
                        },
                        "default": []
                    },
                    "arxivIds": {
                        "title": "arXiv IDs",
                        "type": "array",
                        "description": "Fetch specific papers by their arXiv ID — e.g. '2310.06825' or the legacy form 'cond-mat/0011267'. When provided, these papers are fetched directly and the search fields above are ignored.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxResults": {
                        "title": "Maximum Results",
                        "minimum": 0,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "Maximum number of papers to return for your search. Set to 0 to fetch all available matches — in that case a safety cap of 50,000 papers is applied so very large searches (some have hundreds of thousands of matches) don't run indefinitely. Ignored when fetching by arXiv ID.",
                        "default": 50
                    },
                    "sortBy": {
                        "title": "Sort By",
                        "enum": [
                            "relevance",
                            "submittedDate",
                            "lastUpdatedDate"
                        ],
                        "type": "string",
                        "description": "How to order the results.",
                        "default": "relevance"
                    },
                    "sortOrder": {
                        "title": "Sort Order",
                        "enum": [
                            "descending",
                            "ascending"
                        ],
                        "type": "string",
                        "description": "Sort direction. When sorting by date, descending returns the newest papers first, which is usually the most useful order.",
                        "default": "descending"
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
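
The constraints in the schema above (the `maxResults` range, the `sortBy` and `sortOrder` enums, and the shape of the run response) can be checked on the client side before a run is started. The following is a minimal sketch, not the Actor's own code: the helper names `build_input`, `summarize_run`, and `run_actor` are hypothetical, and `run_actor` assumes the official `apify-client` Python package is installed and that you supply a valid API token.

```python
from typing import Any, Optional

# Allowed values copied from the input schema above.
SORT_BY = {"relevance", "submittedDate", "lastUpdatedDate"}
SORT_ORDER = {"descending", "ascending"}
MAX_RESULTS_LIMIT = 50_000  # schema "maximum" for maxResults


def build_input(
    arxiv_ids: Optional[list] = None,
    max_results: int = 50,
    sort_by: str = "relevance",
    sort_order: str = "descending",
) -> dict:
    """Hypothetical helper: validate values against the schema and build the input."""
    if isinstance(max_results, bool) or not isinstance(max_results, int):
        raise TypeError("maxResults must be an integer")
    if not 0 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"maxResults must be between 0 and {MAX_RESULTS_LIMIT}")
    if sort_by not in SORT_BY:
        raise ValueError(f"sortBy must be one of {sorted(SORT_BY)}")
    if sort_order not in SORT_ORDER:
        raise ValueError(f"sortOrder must be one of {sorted(SORT_ORDER)}")

    run_input: dict = {
        "maxResults": max_results,
        "sortBy": sort_by,
        "sortOrder": sort_order,
    }
    if arxiv_ids:
        # Per the schema, search fields are ignored when arXiv IDs are given.
        run_input["arxivIds"] = [str(i) for i in arxiv_ids]
    return run_input


def summarize_run(response: dict) -> dict:
    """Hypothetical helper: pick a few fields out of a runsResponseSchema-shaped dict."""
    data: dict = response.get("data", {})
    return {
        "id": data.get("id"),
        "status": data.get("status"),
        "datasetId": data.get("defaultDatasetId"),
        "costUsd": data.get("usageTotalUsd"),
    }


def run_actor(token: str, run_input: dict) -> list:
    """Start the Actor and return its dataset items (needs network and a token)."""
    from apify_client import ApifyClient  # imported lazily; optional dependency

    client = ApifyClient(token)
    run: Any = client.actor("solidcode/arxiv-scraper").call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_actor(token, build_input(arxiv_ids=["2310.06825"]))` would then fetch a single paper. Validating locally first means an out-of-range `maxResults` or a misspelled `sortBy` fails before any run is started.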
