# 📄 arXiv Papers Monitor (`skootle/arxiv-papers`) Actor

Pull new AI / ML / CS / physics / math papers from arXiv as they land via the official arXiv API. Title, abstract, authors, PDF link, DOI, and LLM-ready summary card per paper. For ML researchers, AI agents, and journalists. Export, run via API, schedule, or integrate with other tools.

- **URL**: https://apify.com/skootle/arxiv-papers.md
- **Developed by:** [Skootle](https://apify.com/skootle) (community)
- **Categories:** AI, Developer tools, News
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

From $3.50 per 1,000 arXiv papers saved

This Actor is paid per event. You are not charged for Apify platform usage, only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

![arXiv Papers Monitor hero](https://raw.githubusercontent.com/kesjam/skootle-actors-assets/main/heroes/arxiv-papers.png)

### TL;DR

AI engineers and ML researchers waste 30+ minutes a day refreshing arxiv.org/list/cs.AI/recent and copy-pasting abstracts into spreadsheets. This delivers a clean daily diff of new arXiv papers in your tracked categories (cs.AI, cs.LG, cs.CL, cs.CV, stat.ML, math.OC, q-bio, more), deduplicated by `arxivId`, with full abstract, every author, PDF URL, DOI, and an LLM-ready markdown card per record. Watchlist mode emits only papers new since the last run, so a daily schedule feeds a RAG pipeline, vector DB, weekly research email, or Slack digest with zero duplicates and ISO 8601 timestamps your downstream sort logic can trust.

<!-- skootle:review-cta -->
> Try it on a small dataset (the 10-paper default fits the free $5 trial credit), then let us know what you think in a [review](https://apify.com/skootle/arxiv-papers/reviews).

---

### What does arXiv Papers Monitor do?

It calls the public arXiv API on your behalf and turns the raw Atom feed into clean JSON your code can use immediately. Each paper record includes:

- `arxivId` (e.g. `2304.12345`) and the version-aware `arxivIdVersion` (`2304.12345v2`)
- Full `title` and full `abstract`
- Every `authors[].name` and (when arXiv provides it) `authors[].affiliation`
- `primaryCategory` plus the full `categories[]` list, with an `isCrossListed` flag
- `submittedDate` and `updatedDate` in ISO 8601
- Direct `pdfUrl` and `abstractPageUrl`
- `doi`, `journalRef`, and the author's own `comment` (often "NeurIPS 2026 spotlight" or page count)
- `agentMarkdown`: a 5-line markdown card formatted for Claude / Codex / Slack / a CRM ticket

One API call replaces the manual workflow of opening arxiv.org, choosing a category, paging through 50 abstracts at a time, copy-pasting fields into a spreadsheet, and chasing PDF links. We collapse that to a JSON dataset you can pipe into a vector DB, an LLM agent, an alerting system, or a research dashboard.

### Why scrape arXiv?

arXiv is where every AI, ML, vision, and NLP paper lands first, often weeks or months before peer review. If your job is "what was published this week in X," refreshing arxiv.org/list/cs.AI/recent and copy-pasting abstracts into a spreadsheet eats 30+ minutes a day.

Feed a RAG pipeline, drive a weekly research newsletter, watch a specific lab or topic, or build training corpora, all from one daily diff. The buyers here are AI engineers wiring research retrieval, ML researchers tracking sub-fields, and editors of weekly AI newsletters who need a clean "what's new since yesterday" feed.

### Who needs this?

- **AI agent builders** who wire research-paper retrieval into RAG pipelines and need clean text plus PDF URLs without writing an Atom parser
- **ML researchers** tracking three or four sub-fields and wanting a daily digest of new submissions in their categories
- **AI journalists** chasing weekly stories who need to spot trending architectures, models, and lab outputs as they appear
- **M&A and corp-dev analysts** profiling AI startups by tracking which authors and labs are publishing what
- **Recruiters** sourcing ML talent by pulling first-author lists from hot subfields (RLHF, MoE, agents, vision-language)
- **Data scientists at LLM labs** building reproduction pipelines who need full abstracts and DOIs, not titles
- **Conference reviewers and editors** who want a structured, per-category submission feed for trend analysis

If your job involves "what was published on arXiv this week in X," you are the buyer.

### How to use arXiv Papers Monitor

1. Open the actor in Apify Console.
2. Pick your `categories` (e.g. `["cs.AI","cs.CL"]`) or type a `query` ("retrieval augmented generation").
3. Optionally set `submittedAfter` to limit to recent papers, or flip `watchlistMode` on for a daily-new feed.
4. Click **Start**. With the default (`maxItems: 10`), a run takes about 30 seconds.
5. Download the dataset as JSON, CSV, or Excel, or pull it via the API at `https://api.apify.com/v2/acts/skootle~arxiv-papers/runs/last/dataset/items`.
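
The URL in step 5 can also be assembled in code. The sketch below is a minimal example, assuming the `format` and `token` query parameters of the Apify REST API; the helper name `last_run_items_url` is made up for illustration.

```python
from urllib.parse import urlencode

API_BASE = "https://api.apify.com/v2"


def last_run_items_url(actor_id: str, token: str, fmt: str = "json") -> str:
    """Build the dataset-items URL for an Actor's most recent run.

    Hypothetical helper: it only formats a URL and performs no request.
    """
    # The REST API addresses Actors as "username~actor-name".
    path = f"{API_BASE}/acts/{actor_id.replace('/', '~')}/runs/last/dataset/items"
    return f"{path}?{urlencode({'format': fmt, 'token': token})}"


url = last_run_items_url("skootle/arxiv-papers", "<YOUR_API_TOKEN>", fmt="csv")
print(url)
```

Pass the resulting URL to any HTTP client; swapping `fmt` between `json` and `csv` changes the export format.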

### How much will scraping arXiv cost?

Pay-per-result pricing. You only pay for papers actually saved, plus a one-time start fee per run.

| Plan | Per paper | Run start |
|---|---|---|
| FREE | $0.005 | $0.005 |
| BRONZE | $0.0045 | $0.005 |
| SILVER | $0.004 | $0.005 |
| GOLD | $0.0035 | $0.005 |
| PLATINUM | $0.003 | $0.005 |
| DIAMOND | $0.003 | $0.005 |

Typical daily watchlist run for one researcher (50 new papers across cs.AI + cs.CL): about $0.26 on FREE, $0.16 on PLATINUM. A weekly bulk pull of 1000 papers is about $5 on FREE, $3 on PLATINUM. The $5 free Apify credit covers roughly 1000 records on the FREE tier.
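
The arithmetic behind those figures can be reproduced with a short estimator. This is a sketch that hard-codes the prices from the table above; the function name `estimate_run_cost` is an illustration, not part of the Actor.

```python
# Per-paper prices copied from the pricing table above (USD).
PER_PAPER_USD = {
    "FREE": 0.005,
    "BRONZE": 0.0045,
    "SILVER": 0.004,
    "GOLD": 0.0035,
    "PLATINUM": 0.003,
    "DIAMOND": 0.003,
}
RUN_START_USD = 0.005  # one-time start fee per run, same on every plan


def estimate_run_cost(papers: int, plan: str = "FREE") -> float:
    """Return the estimated cost in USD of one run that saves `papers` records."""
    return round(papers * PER_PAPER_USD[plan] + RUN_START_USD, 4)


for plan in ("FREE", "PLATINUM"):
    print(plan, estimate_run_cost(50, plan), estimate_run_cost(1000, plan))
```

Fifty papers come to $0.255 on FREE and $0.155 on PLATINUM, which round to the $0.26 and $0.16 quoted above.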

### Is it legal to scrape arXiv?

arXiv runs an [official, public, unauthenticated query API](https://info.arxiv.org/help/api/user-manual.html) explicitly intended for programmatic access. We honor their published rate limit (1 request per 3 seconds) and identify ourselves with a descriptive User-Agent header. arXiv's Terms of Use cover non-commercial use directly; for commercial redistribution of paper content, follow up with arXiv directly and consult your own counsel.

This actor pulls only the metadata + abstract that arXiv exposes through the public API. It does not download PDFs, does not bypass any auth, and does not touch withdrawn papers.

### Examples

#### 1. Daily new cs.AI papers

```json
{
  "categories": ["cs.AI"],
  "sortBy": "submittedDate",
  "sortOrder": "descending",
  "maxItems": 50,
  "watchlistMode": true
}
````

Schedule daily, point the dataset webhook at Slack or a vector DB.

#### 2. RAG-themed papers from the last 30 days

```json
{
  "query": "retrieval augmented generation",
  "submittedAfter": "2026-04-09",
  "submittedBefore": "2026-05-09",
  "maxItems": 200
}
```
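
The dates in this example are fixed, so a scheduled job would need to recompute them on each run. A minimal sketch of building the same input with a rolling window follows; the helper `rolling_window_input` and its `today` parameter are hypothetical names.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_window_input(query: str, days: int = 30, max_items: int = 200,
                         today: Optional[date] = None) -> dict:
    """Build an Actor input whose date range covers the last `days` days."""
    end = today or date.today()
    start = end - timedelta(days=days)
    return {
        "query": query,
        "submittedAfter": start.isoformat(),   # YYYY-MM-DD
        "submittedBefore": end.isoformat(),    # YYYY-MM-DD
        "maxItems": max_items,
    }


run_input = rolling_window_input("retrieval augmented generation", today=date(2026, 5, 9))
print(run_input)
```

With `today` pinned to 2026-05-09 the result matches the JSON above; omit it to use the current date.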

#### 3. NLP + ML cross-listed papers

```json
{
  "categories": ["cs.CL", "cs.LG"],
  "sortBy": "submittedDate",
  "maxItems": 100
}
```

#### 4. Specific lab tracking via keyword in title or abstract

```json
{
  "query": "DeepMind OR Anthropic",
  "categories": ["cs.AI", "cs.LG"],
  "maxItems": 100
}
```

#### 5. Diffusion-model survey

```json
{
  "query": "diffusion model",
  "sortBy": "relevance",
  "maxItems": 100
}
```

#### 6. Math optimization for ML

```json
{
  "categories": ["math.OC", "stat.ML"],
  "submittedAfter": "2026-01-01",
  "maxItems": 200
}
```

#### 7. Computational neuroscience

```json
{
  "categories": ["q-bio.NC", "cs.NE"],
  "maxItems": 50
}
```

#### 8. Title-only feed for fast indexing

```json
{
  "categories": ["cs.CV"],
  "includeAbstract": false,
  "maxItems": 1000
}
```

### Input parameters

| Field | Type | Description |
|---|---|---|
| `query` | string | Free-text search across title + abstract |
| `categories` | string\[] | arXiv category codes (cs.AI, cs.LG, cs.CL, cs.CV, stat.ML, math.OC, physics.\*, q-bio.\*, more) |
| `submittedAfter` | string (ISO date) | Earliest submission date |
| `submittedBefore` | string (ISO date) | Latest submission date |
| `sortBy` | enum | submittedDate, lastUpdatedDate, or relevance |
| `sortOrder` | enum | descending or ascending |
| `maxItems` | int | Max papers per run (default 10, max 2000) |
| `includeAbstract` | bool | Toggle full abstract vs title-only (default true) |
| `watchlistMode` | bool | Emit only new papers since the last run |
| `proxyConfiguration` | object | Optional residential proxy for very large bulk runs |
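
For orientation, here is one plausible way these inputs could map onto an arXiv API `search_query` string. It is a sketch only: the Actor's actual query construction is not documented here, the function `build_search_query` is hypothetical, and the open-ended date defaults are assumptions. The `ti:`, `abs:`, `cat:` prefixes and the `submittedDate:[... TO ...]` filter come from the arXiv API user manual.

```python
def build_search_query(query=None, categories=(), submitted_after=None,
                       submitted_before=None) -> str:
    """Combine Actor-style inputs into an arXiv search_query string (illustrative)."""
    clauses = []
    if query:
        # Search title and abstract, mirroring the `query` field description.
        clauses.append(f'(ti:"{query}" OR abs:"{query}")')
    if categories:
        # Multiple categories are OR-joined.
        clauses.append("(" + " OR ".join(f"cat:{c}" for c in categories) + ")")
    if submitted_after or submitted_before:
        # arXiv expects YYYYMMDDHHMM; the fallback bounds are assumptions.
        start = (submitted_after or "1991-01-01").replace("-", "") + "0000"
        end = (submitted_before or "9999-12-31").replace("-", "") + "2359"
        clauses.append(f"submittedDate:[{start} TO {end}]")
    return " AND ".join(clauses)


print(build_search_query("diffusion model", ["cs.AI", "cs.CL"],
                         "2026-01-01", "2026-05-09"))
```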

### arXiv output format

#### `arxiv_paper` record

| Field | Type | Notes |
|---|---|---|
| `recordType` | string | Always `"arxiv_paper"` |
| `outputSchemaVersion` | string | `"2026-05-10"`. Bumps on schema change. |
| `arxivId` | string | `"2304.12345"` (no version) |
| `arxivIdVersion` | string | `"2304.12345v2"` |
| `doi` | string \| null | DOI when assigned |
| `title` | string | Full title |
| `abstract` | string | Full abstract, whitespace-normalized |
| `authors` | object\[] | `{ name, affiliation }` per author |
| `authorCount` | int | Length of `authors` |
| `primaryCategory` | string | e.g. `"cs.AI"` |
| `categories` | string\[] | All assigned categories |
| `submittedDate` | string | ISO 8601 |
| `updatedDate` | string | ISO 8601 |
| `pdfUrl` | string | Direct PDF URL |
| `abstractPageUrl` | string | arxiv.org abs page |
| `journalRef` | string \| null | "Nature 612, 2026" style reference if accepted |
| `comment` | string \| null | Author note ("NeurIPS 2026 spotlight", page count, etc.) |
| `estimatedReadMinutes` | int | Abstract word count / 200 |
| `isCrossListed` | bool | True when `categories.length > 1` |
| `agentMarkdown` | string | LLM-ready 5-line card |
| `fieldCompletenessScore` | int | 0-100, 10 fields evaluated |
| `scrapedAt` | string | ISO 8601 |

#### Sample record

```json
{
  "recordType": "arxiv_paper",
  "outputSchemaVersion": "2026-05-10",
  "arxivId": "2605.06667",
  "arxivIdVersion": "2605.06667v1",
  "doi": null,
  "title": "ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation",
  "abstract": "For artistic applications, video generation requires fine-grained control...",
  "authors": [
    { "name": "Omar El Khalifi", "affiliation": null },
    { "name": "Thomas Rossi", "affiliation": null }
  ],
  "authorCount": 9,
  "primaryCategory": "cs.CV",
  "categories": ["cs.CV", "cs.AI", "cs.LG"],
  "submittedDate": "2026-05-07T17:59:58Z",
  "updatedDate": "2026-05-07T17:59:58Z",
  "pdfUrl": "https://arxiv.org/pdf/2605.06667v1",
  "abstractPageUrl": "https://arxiv.org/abs/2605.06667v1",
  "journalRef": null,
  "comment": "SIGGRAPH 2026",
  "estimatedReadMinutes": 2,
  "isCrossListed": true,
  "agentMarkdown": "📄 ActCam: Zero-Shot Joint Camera and 3D Motion Control for Video Generation (2605.06667)\n👥 Omar El Khalifi + 8 more\n📅 Submitted 2026-05-07 · Category cs.CV\n📊 2 min read · Cross-listed\n🔗 https://arxiv.org/pdf/2605.06667v1",
  "fieldCompletenessScore": 80,
  "scrapedAt": "2026-05-09T20:50:00Z"
}
```

### During the actor run

No authentication required. The actor honors arXiv's published 1-request-per-3-seconds rate limit and identifies itself with a descriptive User-Agent, so the source stays available for everyone. A 1000-paper pull typically completes in about 30 seconds.

A run summary lands at the `OUTPUT` key, a markdown digest of the top 5 papers at `AGENT_BRIEFING`, and (with `watchlistMode: true`) the rolling 50,000-id dedupe window at `WATCHLIST_STATE`.

### FAQ

#### How is this different from arXiv's free API?

The free API returns raw Atom XML with namespaced tags. You write an XML parser, you write a paginator that respects the 3-second rate limit, you write a normalizer for affiliation / journal\_ref / comment fields, and you write the watchlist diff yourself. Then you maintain it. We give you typed JSON, idempotent IDs, watchlist mode, an agent-ready markdown card per record, and a versioned schema so your downstream pipeline does not silently break.
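
To give a sense of the parsing work involved, the sketch below normalizes one Atom entry with the Python standard library. The feed is a made-up sample shaped like an arXiv API response, and `parse_entries` is an illustrative function, not the Actor's code.

```python
import xml.etree.ElementTree as ET

NS = {"atom": "http://www.w3.org/2005/Atom", "arxiv": "http://arxiv.org/schemas/atom"}

# Hypothetical feed for illustration; real responses carry more fields.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:arxiv="http://arxiv.org/schemas/atom">
  <entry>
    <id>http://arxiv.org/abs/2304.12345v2</id>
    <published>2023-04-24T17:00:00Z</published>
    <updated>2023-05-01T09:30:00Z</updated>
    <title>An Illustrative
      Paper Title</title>
    <summary>  A made-up abstract
      used only for this sketch. </summary>
    <author><name>Ada Example</name></author>
    <author><name>Bob Sample</name></author>
    <arxiv:primary_category term="cs.AI"/>
    <category term="cs.AI"/>
    <category term="cs.LG"/>
  </entry>
</feed>"""


def squash(text: str) -> str:
    """Collapse runs of whitespace and newlines into single spaces."""
    return " ".join(text.split())


def parse_entries(feed_xml: str) -> list:
    records = []
    for entry in ET.fromstring(feed_xml).findall("atom:entry", NS):
        # The <id> is a URL ending in the versioned identifier.
        id_version = entry.findtext("atom:id", default="", namespaces=NS).rsplit("/abs/", 1)[-1]
        categories = [c.get("term") for c in entry.findall("atom:category", NS)]
        records.append({
            "arxivId": id_version.rsplit("v", 1)[0],
            "arxivIdVersion": id_version,
            "title": squash(entry.findtext("atom:title", default="", namespaces=NS)),
            "abstract": squash(entry.findtext("atom:summary", default="", namespaces=NS)),
            "authors": [{"name": a.findtext("atom:name", default="", namespaces=NS)}
                        for a in entry.findall("atom:author", NS)],
            "primaryCategory": entry.find("arxiv:primary_category", NS).get("term"),
            "categories": categories,
            "submittedDate": entry.findtext("atom:published", default="", namespaces=NS),
            "updatedDate": entry.findtext("atom:updated", default="", namespaces=NS),
            "isCrossListed": len(categories) > 1,
        })
    return records


records = parse_entries(SAMPLE_FEED)
print(records[0]["arxivIdVersion"], records[0]["title"])
```

Pagination, rate limiting, optional fields such as DOI and journal reference, and the watchlist diff would all sit on top of this.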

#### What about HuggingFace papers or PapersWithCode?

Different sources, different scope. HuggingFace Papers is curated and lags arXiv. PapersWithCode focuses on code-attached papers. Use this actor for the firehose, then enrich with HF / PWC if you need code-availability signals. We will likely ship companion actors for both in v0.2.

#### Can I track only papers from specific universities or labs?

Indirectly. arXiv's API does not expose a clean "affiliation" filter, but you can `query` by lab keywords ("Anthropic", "DeepMind", "Stanford NLP") and the term will match titles and abstracts. For author-list filtering, post-process the dataset (`authors[].name`) downstream.
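
A downstream author filter can be only a few lines. The sketch below assumes records shaped like the sample record above; `filter_by_authors` is a hypothetical helper and the second record is invented for the demonstration.

```python
def filter_by_authors(records, wanted_names):
    """Keep records where at least one author name matches, ignoring case."""
    wanted = {name.casefold() for name in wanted_names}
    return [
        record for record in records
        if any(author.get("name", "").casefold() in wanted
               for author in record.get("authors", []))
    ]


records = [
    {"arxivId": "2605.06667",
     "authors": [{"name": "Omar El Khalifi", "affiliation": None},
                 {"name": "Thomas Rossi", "affiliation": None}]},
    # Invented record, present only to show a non-match.
    {"arxivId": "0000.00000",
     "authors": [{"name": "Ada Example", "affiliation": None}]},
]

matches = filter_by_authors(records, ["thomas rossi"])
print([record["arxivId"] for record in matches])
```

Exact name matching misses variants such as initials, so a production filter would likely need fuzzier comparison.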

#### How does watchlist mode work?

Flip `watchlistMode: true`. The actor reads `WATCHLIST_STATE` from the key-value store, runs the search, and emits only papers whose `arxivId` it has not delivered before. After each run it appends the newly seen IDs back to state (rolling window of 50,000). Pair with a daily Apify schedule for a clean "what's new" feed.
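
The diff logic described above can be sketched as follows. This illustrates the idea under stated assumptions and is not the Actor's source: the state is modeled as a plain list of IDs in first-seen order, and `watchlist_diff` is a hypothetical name.

```python
def watchlist_diff(records, seen_ids, window=50_000):
    """Return (new_records, updated_state) for one run.

    `seen_ids` is the state from previous runs, oldest first. The updated
    state is trimmed to the most recent `window` IDs.
    """
    seen = set(seen_ids)
    state = list(seen_ids)
    fresh = []
    for record in records:
        paper_id = record["arxivId"]
        if paper_id in seen:
            continue  # already delivered in an earlier run
        seen.add(paper_id)
        state.append(paper_id)
        fresh.append(record)
    return fresh, state[-window:]


# First run: everything is new.
first, state = watchlist_diff([{"arxivId": "2605.00001"}, {"arxivId": "2605.00002"}], [])
# Second run: one repeat and one new paper; only the new one is emitted.
second, state = watchlist_diff([{"arxivId": "2605.00002"}, {"arxivId": "2605.00003"}], state)
print([r["arxivId"] for r in second], state)
```

One consequence of a rolling window is that an ID evicted from it could be emitted again if it reappears in a later search.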

#### Can I use this with Python?

Yes. `pip install apify-client`, call `client.actor("skootle/arxiv-papers").call(run_input=...)`, then iterate `client.dataset(run["defaultDatasetId"]).iterate_items()`.

#### Can I integrate with Make / Zapier / n8n / Slack?

Yes. Apify exposes webhook triggers on dataset items and run completion. n8n and Make have native Apify connectors; Zapier works through the standard webhook bridge.

#### Why does this cost more than free arXiv scrapers?

If you are wiring this into a customer-facing product or a daily AI-agent pipeline, the per-record cost ($0.0035 at GOLD) buys you reliability free actors do not provide: versioned schema, idempotent IDs, watchlist diff, daily Apify auto-test reliability, and a maintenance commitment. Free actors break monthly when the source changes a tag name, you do not get notified, and your pipeline silently goes empty.

#### What rate limits should I worry about?

arXiv asks for at most 1 request per 3 seconds. We honor that automatically. With 100 papers per page, a 1000-paper pull takes roughly 30 seconds plus arXiv processing time.
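
That estimate follows from simple pacing arithmetic, sketched below. The function counts one 3-second delay per page and ignores arXiv's response time and Actor startup, so treat the result as a rough figure; `estimate_fetch_seconds` is an illustrative name.

```python
import math


def estimate_fetch_seconds(papers: int, page_size: int = 100, delay_s: float = 3.0) -> float:
    """Estimate request-pacing time: one delay per page of results."""
    pages = math.ceil(papers / page_size)
    return pages * delay_s


for n in (10, 1000, 2000):
    print(n, "papers ->", estimate_fetch_seconds(n), "seconds of pacing")
```

At the 2000-paper maximum the same arithmetic gives about a minute of pacing.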

#### Does this download the full PDF?

No, only metadata and abstract. The `pdfUrl` field gives you the direct PDF link if your downstream needs the full text.

### Why choose arXiv Papers Monitor

- **Watchlist mode emits only what's new since last run**. A rolling 50,000-id window means your RAG pipeline ingests each paper exactly once.
- **Reliability free actors can't deliver**. Free arXiv scrapers break monthly when source tags change. You don't get notified, your pipeline silently goes empty. The per-record cost ($0.0035 at GOLD) buys daily auto-test reliability and 24-48 hour fix turnaround.
- **Sub-minute runtime, no rate-limit babysitting**. Pure HTTP against the official arXiv API, no HTML parsing, no headless browser, 1000 papers in about 30 seconds.
- **Drop-in for LLM agents**. `agentMarkdown` card baked into every record, plus a per-run `AGENT_BRIEFING.md` digest of the top 5 papers ready for Slack or a daily LLM context window.
- **Schema doesn't break your pipeline**, versioned and bumped on every breaking change.
- **Re-runs are safe to dedupe by ID**, `arxivId`-keyed records upsert cleanly across runs.
- **AI agents can self-filter sparse rows** via `fieldCompletenessScore` (0-100, 10 fields evaluated).
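
The ID-keyed upsert mentioned above can be sketched in a few lines. This assumes an in-memory dict standing in for your real store and relies on `updatedDate` being a UTC ISO 8601 string in one consistent format, so plain string comparison orders it correctly. The function name and the v2 record are made up for illustration.

```python
def upsert_by_arxiv_id(store: dict, records: list) -> dict:
    """Insert records keyed by arxivId, keeping the most recently updated one."""
    for record in records:
        current = store.get(record["arxivId"])
        if current is None or record["updatedDate"] >= current["updatedDate"]:
            store[record["arxivId"]] = record
    return store


store = {}
v1 = {"arxivId": "2605.06667", "arxivIdVersion": "2605.06667v1",
      "updatedDate": "2026-05-07T17:59:58Z"}
# Invented later revision of the same paper.
v2 = {"arxivId": "2605.06667", "arxivIdVersion": "2605.06667v2",
      "updatedDate": "2026-05-12T08:00:00Z"}

upsert_by_arxiv_id(store, [v1])
upsert_by_arxiv_id(store, [v2])
upsert_by_arxiv_id(store, [v1])  # re-running an older pull does not overwrite v2
print(store["2605.06667"]["arxivIdVersion"])
```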

#### Your feedback

Hit a bug or want a feature? Open an issue on the [Issues tab](https://apify.com/skootle/arxiv-papers/issues/open) rather than the reviews page, and we will fix it fast (typically within 48 hours).

### Other Skootle actors you might want to check

- [skootle/hackernews-watchlist](https://apify.com/skootle/hackernews-watchlist), watchlist new HN stories matching keywords or domains
- [skootle/github-trending](https://apify.com/skootle/github-trending), daily trending repos by language with stargazer + commit signals
- [skootle/reddit-subreddit-monitor](https://apify.com/skootle/reddit-subreddit-monitor), new posts in any subreddit with watchlist diff
- [skootle/sec-edgar-filings](https://apify.com/skootle/sec-edgar-filings), public SEC filings normalized for AI agents

### Support and contact

Found a bug or need a new field? Open an [issue](https://apify.com/skootle/arxiv-papers/issues/open). For commercial use questions, email jamie.kester@gmail.com.

# Actor input Schema

## `query` (type: `string`):

Optional. Free-text term to search across paper titles and abstracts, e.g. 'transformer attention' or 'diffusion model'. Leave blank to pull every paper in the chosen category.

## `categories` (type: `array`):

Optional list of arXiv categories. Examples: cs.AI (artificial intelligence), cs.LG (machine learning), cs.CL (NLP), cs.CV (computer vision), stat.ML, math.OC, physics.med-ph, q-bio.NC. Multiple categories are OR-joined.

## `submittedAfter` (type: `string`):

Optional ISO date (YYYY-MM-DD). Only return papers submitted on or after this date.

## `submittedBefore` (type: `string`):

Optional ISO date (YYYY-MM-DD). Only return papers submitted on or before this date.

## `sortBy` (type: `string`):

How arXiv orders the results.

## `sortOrder` (type: `string`):

Direction of the sort.

## `maxItems` (type: `integer`):

Maximum papers to save. Default keeps the daily auto-test inside Apify's 5-minute window. Raise it for production runs (up to 2000 per run; arXiv pages are ~3 sec apart).

## `includeAbstract` (type: `boolean`):

When true, every record includes the full paper abstract. Turn off for title-only feeds.

## `watchlistMode` (type: `boolean`):

When true, the actor remembers arxivIds it has seen across runs and only emits papers it has not delivered before. Pair with a daily schedule to build a 'what's new' feed.

## `proxyConfiguration` (type: `object`):

arXiv's API is public and unauthenticated, so most users can leave this off. Enable residential proxies only for large bulk runs that risk hitting per-IP rate limits.

## Actor input object example

```json
{
  "query": "large language model",
  "categories": [
    "cs.AI",
    "cs.CL"
  ],
  "sortBy": "submittedDate",
  "sortOrder": "descending",
  "maxItems": 10,
  "includeAbstract": true,
  "watchlistMode": false,
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}
```

# Actor output Schema

## `datasetItems` (type: `string`):

Normalized arXiv paper records with author list, abstract, PDF link, and agentMarkdown card.

## `runSummary` (type: `string`):

Compact OUTPUT object with item count, error counts, and pagination stats.

## `agentBriefing` (type: `string`):

Markdown digest of the top 5 papers + summary stats. Drop into Slack or an LLM as a single document.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "query": "large language model",
    "categories": [
        "cs.AI",
        "cs.CL"
    ],
    "proxyConfiguration": {
        "useApifyProxy": false
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("skootle/arxiv-papers").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "query": "large language model",
    "categories": [
        "cs.AI",
        "cs.CL",
    ],
    "proxyConfiguration": { "useApifyProxy": False },
}

# Run the Actor and wait for it to finish
run = client.actor("skootle/arxiv-papers").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "query": "large language model",
  "categories": [
    "cs.AI",
    "cs.CL"
  ],
  "proxyConfiguration": {
    "useApifyProxy": false
  }
}' |
apify call skootle/arxiv-papers --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=skootle/arxiv-papers",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "📄 arXiv Papers Monitor",
        "description": "Pull new AI / ML / CS / physics / math papers from arXiv as they land via the official arXiv API. Title, abstract, authors, PDF link, DOI, and LLM-ready summary card per paper. For ML researchers, AI agents, and journalists. Export, run via API, schedule, or integrate with other tools.",
        "version": "0.1",
        "x-build-id": "uBKoZhhIYIc78tPTD"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/skootle~arxiv-papers/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-skootle-arxiv-papers",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/skootle~arxiv-papers/runs": {
            "post": {
                "operationId": "runs-sync-skootle-arxiv-papers",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/skootle~arxiv-papers/run-sync": {
            "post": {
                "operationId": "run-sync-skootle-arxiv-papers",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "query": {
                        "title": "Free-text search (title + abstract)",
                        "type": "string",
                        "description": "Optional. Free-text term to search across paper titles and abstracts, e.g. 'transformer attention' or 'diffusion model'. Leave blank to pull every paper in the chosen category."
                    },
                    "categories": {
                        "title": "arXiv categories",
                        "type": "array",
                        "description": "Optional list of arXiv categories. Examples: cs.AI (artificial intelligence), cs.LG (machine learning), cs.CL (NLP), cs.CV (computer vision), stat.ML, math.OC, physics.med-ph, q-bio.NC. Multiple categories are OR-joined.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "submittedAfter": {
                        "title": "Submitted after (ISO date)",
                        "type": "string",
                        "description": "Optional ISO date (YYYY-MM-DD). Only return papers submitted on or after this date."
                    },
                    "submittedBefore": {
                        "title": "Submitted before (ISO date)",
                        "type": "string",
                        "description": "Optional ISO date (YYYY-MM-DD). Only return papers submitted on or before this date."
                    },
                    "sortBy": {
                        "title": "Sort field",
                        "enum": [
                            "submittedDate",
                            "lastUpdatedDate",
                            "relevance"
                        ],
                        "type": "string",
                        "description": "How arXiv orders the results.",
                        "default": "submittedDate"
                    },
                    "sortOrder": {
                        "title": "Sort order",
                        "enum": [
                            "descending",
                            "ascending"
                        ],
                        "type": "string",
                        "description": "Direction of the sort.",
                        "default": "descending"
                    },
                    "maxItems": {
                        "title": "Max papers",
                        "minimum": 1,
                        "maximum": 2000,
                        "type": "integer",
                        "description": "Maximum papers to save. Default keeps the daily auto-test inside Apify's 5-minute window. Raise it for production runs (up to 2000 per run; arXiv pages are ~3 sec apart).",
                        "default": 10
                    },
                    "includeAbstract": {
                        "title": "Include full abstract",
                        "type": "boolean",
                        "description": "When true, every record includes the full paper abstract. Turn off for title-only feeds.",
                        "default": true
                    },
                    "watchlistMode": {
                        "title": "Watchlist (only new papers)",
                        "type": "boolean",
                        "description": "When true, the actor remembers arxivIds it has seen across runs and only emits papers it has not delivered before. Pair with a daily schedule to build a 'what's new' feed.",
                        "default": false
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration (optional)",
                        "type": "object",
                        "description": "arXiv's API is public and unauthenticated, so most users can leave this off. Enable residential proxies only for large bulk runs that risk hitting per-IP rate limits."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
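
The constraints in the input schema above (the `sortOrder` enum, the 1 to 2000 range on `maxItems`, the boolean flags and their defaults) can be checked locally before a run is started. The following is a minimal sketch, not part of the Actor or of any Apify library: `build_run_input` and `summarize_run` are hypothetical helper names, and the sketch only covers the input fields shown in this part of the schema, passing any other fields through unchecked.

```python
from typing import Any

# Defaults and limits copied from the input schema above.
# Only the fields visible in this part of the schema are covered (assumption).
DEFAULTS: dict[str, Any] = {
    "sortOrder": "descending",
    "maxItems": 10,
    "includeAbstract": True,
    "watchlistMode": False,
}
SORT_ORDERS = ("descending", "ascending")
MAX_ITEMS_MIN, MAX_ITEMS_MAX = 1, 2000


def build_run_input(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides into the defaults and validate them."""
    run_input = {**DEFAULTS, **overrides}

    if run_input["sortOrder"] not in SORT_ORDERS:
        raise ValueError(f"sortOrder must be one of {SORT_ORDERS}")

    max_items = run_input["maxItems"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(max_items, bool) or not isinstance(max_items, int):
        raise ValueError("maxItems must be an integer")
    if not MAX_ITEMS_MIN <= max_items <= MAX_ITEMS_MAX:
        raise ValueError(f"maxItems must be between {MAX_ITEMS_MIN} and {MAX_ITEMS_MAX}")

    for flag in ("includeAbstract", "watchlistMode"):
        if not isinstance(run_input[flag], bool):
            raise ValueError(f"{flag} must be a boolean")

    return run_input


def summarize_run(response: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: pick common fields from a body shaped like runsResponseSchema."""
    data = response.get("data", {})
    return {
        "runId": data.get("id"),
        "status": data.get("status"),
        "datasetId": data.get("defaultDatasetId"),
        "costUsd": data.get("usageTotalUsd"),
    }


if __name__ == "__main__":
    # A daily "what's new" feed: watchlist mode with a larger page budget.
    print(build_run_input(maxItems=200, watchlistMode=True))
```

The validated dictionary is intended to be passed as the run input, for example as the `run_input` argument of the Python client's `call` method. Validating locally first means an out-of-range `maxItems` fails before a run is started.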