# bioRxiv + medRxiv Scraper for RAG (`devanshlive/biorxiv-medrxiv-rag-extractor`) Actor

Scrape bioRxiv and medRxiv preprints by server, category, and date range. Returns RAG-ready JSON with JATS full-text chunks (cl100k\_base, 512/50) when available and abstract fallback otherwise. Drop-in for LangChain, LlamaIndex, Qdrant, Pinecone, Weaviate, pgvector. $0.02 per preprint.

- **URL**: https://apify.com/devanshlive/biorxiv-medrxiv-rag-extractor.md
- **Developed by:** [Devansh Tiwari](https://apify.com/devanshlive) (community)
- **Categories:** AI, Developer tools, Integrations
- **Stats:** 2 total users, 1 monthly user, 100.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $20.00 / 1,000 papers

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## bioRxiv + medRxiv Scraper for RAG: Preprints as Chunked JSON

**Scrape bioRxiv and medRxiv preprints into RAG-ready JSON in one call.** Pulls preprints by server, category, and posting-date range. Fetches JATS full-text when available and falls back to the abstract otherwise. Returns fixed-token chunks (cl100k_base, 512 tokens / 50 overlap) with full metadata, ready to drop into **LangChain, LlamaIndex, Qdrant, Pinecone, Weaviate, pgvector, or Chroma**. Built for biomedical AI teams, pharma/biotech researchers, drug-discovery AI, and clinical evidence tooling.

### What does bioRxiv + medRxiv Scraper for RAG do?

This Apify Actor scrapes **[bioRxiv](https://www.biorxiv.org/)** and **[medRxiv](https://www.medrxiv.org/)** preprints matching your server, category, and date window, fetches the JATS full-text XML where available, and splits the resulting plain text into **tokenizer-aware chunks (512 tokens, 50-token overlap, tiktoken cl100k_base)** ready to embed or feed into a RAG index.

Each output record contains clean metadata (DOI, server, version, title, authors, category, posting date, license) and a `chunks` array of `{ idx, text, tokens }` ready for direct ingestion into a vector database.

**Try it in the Apify Console.** Pick one or both servers, an optional category slug, a date range, a preprint cap, and hit Start. Download results as JSON, CSV, or Excel.

Because the Actor runs on the Apify platform, you also get scheduled runs, HTTP API access, integrations with Zapier / n8n / Make, proxy rotation, monitoring, and alerts. No infrastructure to run yourself.

### Why use bioRxiv + medRxiv Scraper for RAG?

- **Preprints, not published-only.** The fastest-moving biomedical evidence is on bioRxiv and medRxiv months before it reaches PubMed.
- **Skip the JATS XML grind.** Clean preprint records, not raw `<article>` trees with boilerplate to strip.
- **Full-text when it exists.** Every record the API returns carries a `jatsxml` URL. When the XML parses and yields useful prose, the output record is marked `source: "full_text"`; otherwise the Actor falls back to the abstract. See the Limits section for the realistic coverage rate.
- **Both servers, one run.** Pass `servers: ["biorxiv", "medrxiv"]` and filter downstream by the `server` field on each record.
- **Category filtering per server.** bioRxiv and medRxiv use different taxonomies, both supported.
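- **Pre-chunked for RAG.** `tiktoken cl100k_base` tokenization, the encoding used by OpenAI `text-embedding-3`. The chunks also work with Claude, Cohere, and most BGE/E5/nomic embedding models, though those use their own tokenizers, so their token counts will differ somewhat from the `tokens` field.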
- **Vector-DB neutral.** Drop into **Qdrant, Pinecone, Weaviate, pgvector (Supabase / Neon), Chroma, Milvus** without reformatting.
- **Framework-ready.** Works with **LangChain, LlamaIndex, Haystack, LangGraph**.
- **Respectful rate limiting.** 3 requests per second total across both servers. No API key needed.
- **Cheap.** $0.02 per preprint. A month of medRxiv oncology (~300 preprints) costs around $6.

### How to use bioRxiv + medRxiv Scraper for RAG

1. **Open the Actor** in Apify Console.
2. **Pick `servers`** (one or both of `biorxiv`, `medrxiv`).
3. **(Optional) Set `category`** to a server-specific slug. Leave empty for all categories.
4. **Set `dateFrom` / `dateTo`** in `YYYY-MM-DD` format.
5. **Set `maxPreprints`** to cap the run. The cap is global across both servers.
6. **Click Start.** Expect roughly 100 to 200 preprints per minute under the 3 req/s ceiling.
7. **Download results** from the Storage tab.

### Input

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `servers` | string[] | | `["biorxiv", "medrxiv"]` | One or both of `biorxiv`, `medrxiv`. Iteration order is preserved in the output. |
| `category` | string | | `""` | Server-specific category slug. Empty = all categories. See the per-server list below. |
| `dateFrom` | string (YYYY-MM-DD) | yes | `2024-01-01` | Inclusive posting-date lower bound |
| `dateTo` | string (YYYY-MM-DD) | yes | `2024-01-07` | Inclusive posting-date upper bound |
| `maxPreprints` | integer | | `100` | Global cap across both servers (1 to 100000) |

**Example input (combined run):**

```json
{
    "servers": ["biorxiv", "medrxiv"],
    "category": "",
    "dateFrom": "2024-01-01",
    "dateTo": "2024-01-07",
    "maxPreprints": 100
}
````

**Example input (medRxiv oncology only):**

```json
{
    "servers": ["medrxiv"],
    "category": "oncology",
    "dateFrom": "2024-01-01",
    "dateTo": "2024-01-31",
    "maxPreprints": 500
}
```
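Before starting a paid run, it can help to check an input object against the constraints in the table above. The following is a minimal client-side sketch, not the Actor's own validation: `validate_input` and `ALLOWED_SERVERS` are hypothetical names, and the defaults are copied from the input table.

```python
from datetime import date

# Allowed values and defaults are taken from the input table above.
ALLOWED_SERVERS = {"biorxiv", "medrxiv"}


def validate_input(run_input: dict) -> dict:
    """Return a normalized copy of run_input, or raise ValueError."""
    servers = run_input.get("servers", ["biorxiv", "medrxiv"])
    if not servers or not set(servers) <= ALLOWED_SERVERS:
        raise ValueError(f"servers must be a non-empty subset of {sorted(ALLOWED_SERVERS)}")

    for key in ("dateFrom", "dateTo"):
        if key not in run_input:
            raise ValueError(f"{key} is required")

    # date.fromisoformat raises ValueError on malformed dates. On Python 3.11+
    # it also accepts other ISO 8601 forms; isoformat() below normalizes the
    # value back to YYYY-MM-DD.
    date_from = date.fromisoformat(run_input["dateFrom"])
    date_to = date.fromisoformat(run_input["dateTo"])
    if date_from > date_to:
        raise ValueError("dateFrom must not be later than dateTo")

    max_preprints = run_input.get("maxPreprints", 100)
    if not 1 <= max_preprints <= 100_000:
        raise ValueError("maxPreprints must be between 1 and 100000")

    return {
        "servers": list(servers),
        "category": run_input.get("category", ""),
        "dateFrom": date_from.isoformat(),
        "dateTo": date_to.isoformat(),
        "maxPreprints": max_preprints,
    }
```

The returned dictionary can be passed as `run_input` to the Python client shown in the API section.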

#### Category slugs

**bioRxiv:** `animal_behavior_and_cognition`, `biochemistry`, `bioengineering`, `bioinformatics`, `biophysics`, `cancer_biology`, `cell_biology`, `developmental_biology`, `ecology`, `evolutionary_biology`, `genetics`, `genomics`, `immunology`, `microbiology`, `molecular_biology`, `neuroscience`, `paleontology`, `pathology`, `pharmacology_and_toxicology`, `physiology`, `plant_biology`, `scientific_communication_and_education`, `synthetic_biology`, `systems_biology`, `zoology`.

**medRxiv:** `addiction_medicine`, `allergy_and_immunology`, `anesthesia`, `cardiovascular_medicine`, `dentistry_and_oral_medicine`, `dermatology`, `emergency_medicine`, `endocrinology`, `epidemiology`, `gastroenterology`, `genetic_and_genomic_medicine`, `geriatric_medicine`, `health_economics`, `health_informatics`, `health_policy`, `health_systems_and_quality_improvement`, `hematology`, `hiv_aids`, `infectious_diseases`, `intensive_care_and_critical_care_medicine`, `medical_education`, `medical_ethics`, `nephrology`, `neurology`, `nursing`, `nutrition`, `obstetrics_and_gynecology`, `occupational_and_environmental_health`, `oncology`, `ophthalmology`, `orthopedics`, `otolaryngology`, `pain_medicine`, `palliative_medicine`, `pathology`, `pediatrics`, `pharmacology_and_therapeutics`, `primary_care_research`, `psychiatry_and_clinical_psychology`, `public_and_global_health`, `radiology_and_imaging`, `rehabilitation_medicine_and_physical_therapy`, `respiratory_medicine`, `rheumatology`, `sexual_and_reproductive_health`, `sports_medicine`, `surgery`, `toxicology`, `transplantation`, `urology`.

**Category slug mismatch warning.** Setting `category: "neuroscience"` with `servers: ["medrxiv"]` returns zero medRxiv records because medRxiv has no `neuroscience` slug. The Actor logs a warning in this case but does not fail. Split the run into two calls or leave `category` empty if you want everything.
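One way to avoid the mismatch is to plan one run per server and drop any slug that server does not know. The sketch below assumes hypothetical helper names (`SLUGS`, `plan_runs`) and uses abbreviated slug sets for illustration; the complete lists are the ones given above.

```python
# Abbreviated slug sets for illustration only; extend them from the lists above.
SLUGS = {
    "biorxiv": {"neuroscience", "cancer_biology", "microbiology", "pathology"},
    "medrxiv": {"oncology", "infectious_diseases", "epidemiology", "pathology"},
}


def plan_runs(categories_by_server: dict, date_from: str, date_to: str,
              max_preprints: int = 100) -> list:
    """Build one Actor input per (server, category) pair, skipping unknown slugs."""
    inputs = []
    for server, categories in categories_by_server.items():
        for category in categories:
            # An empty category means "all categories" and is always valid.
            if category and category not in SLUGS[server]:
                # A mismatched slug would return zero records, so skip the run.
                continue
            inputs.append({
                "servers": [server],
                "category": category,
                "dateFrom": date_from,
                "dateTo": date_to,
                "maxPreprints": max_preprints,
            })
    return inputs
```

Each returned dictionary is a complete input for a separate run, so the `maxPreprints` cap then applies per run rather than across both servers.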

### Output

Each preprint becomes one dataset item. You can download the dataset in JSON, HTML, CSV, or Excel.

```json
{
    "doi": "10.1101/2024.03.15.585219",
    "server": "biorxiv",
    "version": "1",
    "title": "A concise title here",
    "abstract": "The abstract as returned by the bioRxiv API.",
    "authors": ["Smith, J.", "Doe, J."],
    "category": "neuroscience",
    "publication_date": "2024-03-15",
    "preprint_url": "https://www.biorxiv.org/content/10.1101/2024.03.15.585219v1",
    "license": "cc_by",
    "source": "full_text",
    "chunks": [
        { "idx": 0, "text": "...", "tokens": 487 },
        { "idx": 1, "text": "...", "tokens": 512 }
    ]
}
```

`source` is `"full_text"` when JATS XML parsed into useful prose, `"abstract"` when it fell back.

#### Data table

| Field | Type | Description |
|---|---|---|
| `doi` | string | DOI (primary identifier, e.g. `10.1101/2024.03.15.585219`) |
| `server` | `"biorxiv"` \| `"medrxiv"` | Which server the preprint came from |
| `version` | string | Preprint version returned by the API (latest at fetch time) |
| `title` | string | Preprint title |
| `abstract` | string | Abstract as returned by the bioRxiv API |
| `authors` | string[] | Author display names in the order the API returned them |
| `category` | string | Server-specific category slug |
| `publication_date` | ISO date | `YYYY-MM-DD` posting date |
| `preprint_url` | string | Canonical preprint landing page |
| `license` | string? | Normalized license key: `cc_by`, `cc_by_nc`, `cc_by_nd`, `cc_by_nc_nd`, `cc0`, `none`, or null |
| `source` | `"full_text"` \| `"abstract"` | Text origin |
| `chunks` | Chunk[] | Token-aware chunks for RAG |
| `chunks[].idx` | number | 0-indexed position |
| `chunks[].text` | string | Chunk text |
| `chunks[].tokens` | number | Token count (≤ 512) |
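Most vector stores want one entry per chunk rather than one per preprint. The following sketch flattens a dataset item into per-chunk entries using only the fields from the table above. The function name `record_to_points` and the ID format are assumptions made for illustration, and some stores restrict ID formats, so the string may need hashing.

```python
def record_to_points(record: dict) -> list:
    """Flatten one dataset item into one entry per chunk, with shared metadata."""
    points = []
    for chunk in record["chunks"]:
        points.append({
            # Hypothetical ID scheme: DOI, version, and chunk index.
            "id": f'{record["doi"]}v{record["version"]}#{chunk["idx"]}',
            "text": chunk["text"],
            "metadata": {
                "doi": record["doi"],
                "server": record["server"],
                "version": record["version"],
                "title": record["title"],
                "category": record["category"],
                "publication_date": record["publication_date"],
                "preprint_url": record["preprint_url"],
                "license": record.get("license"),  # may be null
                "source": record["source"],
                "chunk_idx": chunk["idx"],
                "tokens": chunk["tokens"],
            },
        })
    return points
```

Embedding `text` and storing `metadata` alongside the vector keeps `server`, `category`, and `source` available as filters at query time.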

### Pricing

**$0.02 per preprint** (PPR, pay per result).

#### How much does it cost to scrape bioRxiv and medRxiv?

| Volume | Estimated cost |
|---|---|
| 100 preprints | **~$2** |
| 1,000 preprints | **~$20** |
| 10,000 preprints | **~$200** |
| 100,000 preprints | **~$2,000** |

No subscription. No minimum. You pay only for successful records.
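The table is straight multiplication, which also makes it easy to turn a budget into a `maxPreprints` value. A small sketch, with hypothetical function names and the $0.02 price taken from this section:

```python
PRICE_CENTS_PER_PREPRINT = 2  # $0.02 per preprint, from the pricing above


def estimate_cost_usd(preprints: int) -> float:
    """Estimated charge in USD for a given number of returned preprints."""
    if preprints < 0:
        raise ValueError("preprints must be non-negative")
    return preprints * PRICE_CENTS_PER_PREPRINT / 100


def max_preprints_for_budget(budget_usd: float) -> int:
    """Largest maxPreprints value whose estimated cost stays within the budget."""
    # Work in whole cents to avoid floating-point drift.
    return round(budget_usd * 100) // PRICE_CENTS_PER_PREPRINT
```

For example, `max_preprints_for_budget(6)` gives the 300 preprints quoted for a month of medRxiv oncology.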

### Limits you should know before you run

- **Full-text coverage is roughly 40 to 80 percent of returned records.** bioRxiv and medRxiv publish JATS XML for most recent preprints, but availability varies by category, server, and how recently the preprint was posted. The remaining records fall back to abstract-only (`source: "abstract"`). Budget your ingest pipeline with this in mind.
- **bioRxiv is behind Cloudflare.** The Actor uses got-scraping's browser-like TLS fingerprint to pass the JS challenge. Occasional transient 403s are retried automatically.
- **Only the latest version** of each preprint is returned. Version history is planned for v2 of the Actor.
- **No figure or table extraction.** Captions stay inline as text inside body chunks. Figure and table content is dropped during the JATS strip pass.
- **No citation graph.** Reference lists are stripped from body text to keep chunks dense. Reference extraction is planned for v2 of the Actor.
- **No section-aware chunking.** Chunks are fixed-token (512 with 50 overlap). Section-level splitting (Abstract / Introduction / Methods / Results / Discussion) is deferred.
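To make the fixed-token scheme concrete, here is the simplest sliding-window reading of "512 with 50 overlap", operating on an already-tokenized sequence. This is an illustrative assumption, not the Actor's code: the sample output above shows a 487-token chunk followed by a 512-token one, so the real splitter may also respect some boundaries. `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list, size: int = 512, overlap: int = 50) -> list:
    """Split a token sequence into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # each window starts 462 tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# With tiktoken installed, token ids would come from something like:
#   import tiktoken
#   tokens = tiktoken.get_encoding("cl100k_base").encode(text)
```

Under this reading, a 1,000-token text yields three chunks starting at tokens 0, 462, and 924, so a passage that straddles a boundary appears in both neighbouring chunks.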

### Tips

- **Split large backfills** into month-sized windows and run them in parallel Apify runs. The 3 req/s limiter is per-run, so parallel runs scale linearly.
- **Pair with PubMed RAG Extractor** for the fast + validated flow: preprints today, peer-reviewed tomorrow. See the [sister Actor](https://apify.com/devanshlive/pubmed-rag-extractor) in this portfolio.
- **Track the same DOIs over time** by running weekly and diffing on `doi`. New versions appear as new records with higher `version` numbers.
- **`source: "abstract"` records are still useful.** Abstracts are dense, well-structured, and often the most information-rich section of a preprint.
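The month-sized split from the first tip can be generated mechanically. A minimal sketch with a hypothetical helper, `month_windows`, that relies on `dateFrom` and `dateTo` both being inclusive so adjacent windows never overlap:

```python
from datetime import date, timedelta


def month_windows(date_from: str, date_to: str) -> list:
    """Split an inclusive date range into (dateFrom, dateTo) pairs, one per calendar month."""
    start = date.fromisoformat(date_from)
    end = date.fromisoformat(date_to)
    windows = []
    while start <= end:
        # First day of the month after `start`.
        if start.month == 12:
            next_month = date(start.year + 1, 1, 1)
        else:
            next_month = date(start.year, start.month + 1, 1)
        # Window ends on the last day of the month or on `end`, whichever is earlier.
        window_end = min(end, next_month - timedelta(days=1))
        windows.append((start.isoformat(), window_end.isoformat()))
        start = next_month
    return windows
```

Each pair can be used as the `dateFrom` / `dateTo` of a separate run.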

### Publish your output as a HuggingFace dataset

If you extract a category-scoped corpus (e.g. every bioRxiv immunology preprint from 2024), consider publishing the output as a HuggingFace dataset:

```bash
pip install datasets
# Then, in Python:
# from datasets import Dataset
# ds = Dataset.from_json("output.json")
# ds.push_to_hub("your-username/biorxiv-immunology-2024")
```

Community HuggingFace datasets make your work, and this Actor, easier to discover. Link back to the Actor in the dataset card so downstream users can regenerate fresh extractions.

### Academic citation

If you use the output of this Actor in a paper, cite bioRxiv and medRxiv directly, and optionally the Actor:

```
bioRxiv: https://www.biorxiv.org/  (Cold Spring Harbor Laboratory)
medRxiv: https://www.medrxiv.org/  (Cold Spring Harbor Laboratory, Yale University, BMJ)
Data extracted via the bioRxiv + medRxiv Scraper for RAG Apify Actor.
```

### FAQ and limitations

#### Is scraping bioRxiv and medRxiv legal?

Both servers publish a public REST API at `api.biorxiv.org` that is explicitly designed for programmatic access. Preprints on bioRxiv and medRxiv are posted under Creative Commons or equivalent open licenses, visible on each preprint's landing page and in the `license` field of every record. This Actor honors a conservative 3 req/s ceiling across both servers and fetches only publicly available content.

#### Why is `source: "abstract"` more common than I expected?

Several reasons: (1) older preprints do not always have JATS XML, (2) some preprint categories have lower JATS coverage, (3) transient upstream errors fall back to abstract to keep the pipeline moving. The Actor logs an aggregate full-text rate at the end of every run so you can check the coverage in your chosen window.
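The same rate can be recomputed from the downloaded dataset using the `source` field on each record. A short sketch with a hypothetical helper name, `full_text_rate`:

```python
from collections import Counter


def full_text_rate(items) -> float:
    """Fraction of dataset items whose text came from JATS full text."""
    counts = Counter(item.get("source") for item in items)
    total = sum(counts.values())
    # An empty dataset has no meaningful rate; report 0.0 rather than dividing by zero.
    return counts["full_text"] / total if total else 0.0
```

Grouping the items by `category` or `server` before calling the function shows where abstract fallback is concentrated.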

#### What if my category slug returns zero results?

Check the slug against the per-server lists above. bioRxiv and medRxiv use completely different taxonomies. A warning is logged when a slug is very likely wrong for the selected server.

#### Support

Found a bug or want a feature? Use the **Issues** tab on the Actor's page. Custom requirements (other preprint servers, version history, section-aware chunking, figure extraction)? Reach out via the Actor's Support link.

#### Disclaimer

Output metadata is from the public bioRxiv / medRxiv REST API. Full text, when available, comes from the JATS XML each server publishes alongside the preprint. Check individual preprints' licenses on their landing pages before downstream commercial use.

***

**Built with Apify + Crawlee + TypeScript.** Part of the [actorstack](https://github.com/Devansh-365/actorstack) portfolio. Sister Actors: [PubMed Scraper for RAG](https://apify.com/devanshlive/pubmed-rag-extractor), [arXiv Scraper for RAG](https://apify.com/devanshlive/arxiv-rag-extractor).

# Actor input Schema

## `servers` (type: `array`):

Which preprint servers to scrape. Pick one or both. Records carry a server field so you can filter downstream.

## `category` (type: `string`):

Server-side category filter. bioRxiv and medRxiv use DIFFERENT taxonomies. Examples: bioRxiv neuroscience, cancer\_biology, microbiology. medRxiv oncology, infectious\_diseases, epidemiology. Leave empty for all categories. See README for the full per-server slug list.

## `dateFrom` (type: `string`):

Inclusive lower bound of the preprint posting date.

## `dateTo` (type: `string`):

Inclusive upper bound of the preprint posting date.

## `maxPreprints` (type: `integer`):

Hard cap on returned preprints across both servers combined. Outbound rate limit is 3 req/s total. Expect roughly 100 to 200 preprints per minute.

## Actor input object example

```json
{
  "servers": [
    "biorxiv",
    "medrxiv"
  ],
  "category": "",
  "dateFrom": "2024-01-01",
  "dateTo": "2024-01-07",
  "maxPreprints": 100
}
```

# Actor output Schema

## `dataset` (type: `string`):

No description

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "servers": [
        "biorxiv",
        "medrxiv"
    ],
    "category": "",
    "dateFrom": "2024-01-01",
    "dateTo": "2024-01-07",
    "maxPreprints": 100
};

// Run the Actor and wait for it to finish
const run = await client.actor("devanshlive/biorxiv-medrxiv-rag-extractor").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "servers": [
        "biorxiv",
        "medrxiv",
    ],
    "category": "",
    "dateFrom": "2024-01-01",
    "dateTo": "2024-01-07",
    "maxPreprints": 100,
}

# Run the Actor and wait for it to finish
run = client.actor("devanshlive/biorxiv-medrxiv-rag-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
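If installing the client library is not an option, the same run can be started through the REST endpoint listed in the OpenAPI specification below. The sketch builds the request with the standard library only. The path, base URL, and `token` query parameter come from that specification; the function name `build_run_sync_request` is hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Base URL and path as listed in the OpenAPI specification in this document.
BASE_URL = "https://api.apify.com/v2"
ACTOR_PATH = "/acts/devanshlive~biorxiv-medrxiv-rag-extractor/run-sync-get-dataset-items"


def build_run_sync_request(token: str, run_input: dict) -> Request:
    """Build (but do not send) a POST request that runs the Actor synchronously."""
    url = f"{BASE_URL}{ACTOR_PATH}?{urlencode({'token': token})}"
    return Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(request)` should return the dataset items in the response body. For large `maxPreprints` values, the asynchronous `/runs` endpoint from the same specification is probably the safer choice.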

## CLI example

```bash
echo '{
  "servers": [
    "biorxiv",
    "medrxiv"
  ],
  "category": "",
  "dateFrom": "2024-01-01",
  "dateTo": "2024-01-07",
  "maxPreprints": 100
}' |
apify call devanshlive/biorxiv-medrxiv-rag-extractor --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=devanshlive/biorxiv-medrxiv-rag-extractor",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "bioRxiv + medRxiv Scraper for RAG",
        "description": "Scrape bioRxiv and medRxiv preprints by server, category, and date range. Returns RAG-ready JSON with JATS full-text chunks (cl100k_base, 512/50) when available and abstract fallback otherwise. Drop-in for LangChain, LlamaIndex, Qdrant, Pinecone, Weaviate, pgvector. $0.02 per preprint.",
        "version": "0.1",
        "x-build-id": "gRHo5NSaWvkThPyby"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/devanshlive~biorxiv-medrxiv-rag-extractor/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-devanshlive-biorxiv-medrxiv-rag-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/devanshlive~biorxiv-medrxiv-rag-extractor/runs": {
            "post": {
                "operationId": "runs-sync-devanshlive-biorxiv-medrxiv-rag-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/devanshlive~biorxiv-medrxiv-rag-extractor/run-sync": {
            "post": {
                "operationId": "run-sync-devanshlive-biorxiv-medrxiv-rag-extractor",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "dateFrom",
                    "dateTo"
                ],
                "properties": {
                    "servers": {
                        "title": "Servers",
                        "type": "array",
                        "description": "Which preprint servers to scrape. Pick one or both. Records carry a server field so you can filter downstream.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "biorxiv",
                                "medrxiv"
                            ]
                        },
                        "default": [
                            "biorxiv",
                            "medrxiv"
                        ]
                    },
                    "category": {
                        "title": "Category slug (optional)",
                        "type": "string",
                        "description": "Server-side category filter. bioRxiv and medRxiv use DIFFERENT taxonomies. Examples: bioRxiv neuroscience, cancer_biology, microbiology. medRxiv oncology, infectious_diseases, epidemiology. Leave empty for all categories. See README for the full per-server slug list.",
                        "default": ""
                    },
                    "dateFrom": {
                        "title": "From date (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Inclusive lower bound of the preprint posting date.",
                        "default": "2024-01-01"
                    },
                    "dateTo": {
                        "title": "To date (YYYY-MM-DD)",
                        "type": "string",
                        "description": "Inclusive upper bound of the preprint posting date.",
                        "default": "2024-01-07"
                    },
                    "maxPreprints": {
                        "title": "Max preprints per run",
                        "minimum": 1,
                        "maximum": 100000,
                        "type": "integer",
                        "description": "Hard cap on returned preprints across both servers combined. Outbound rate limit is 3 req/s total. Expect roughly 100 to 200 preprints per minute.",
                        "default": 100
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
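
As a rough illustration of how the run object described by the schema above might be consumed, the sketch below pulls out the storage IDs and the non-zero cost line items. The helper name `summarize_run` and the `sample_run` dictionary are hypothetical. The IDs are placeholders and the numeric values are copied from the schema's `example` fields, so this is not output from a real run.

```python
from typing import Any


def summarize_run(run: dict[str, Any]) -> dict[str, Any]:
    """Extract commonly needed fields from a run object shaped like the schema above.

    Hypothetical helper for illustration; it is not part of any Apify client.
    """
    usage_usd = run.get("usageUsd") or {}
    options = run.get("options") or {}
    return {
        "dataset_id": run.get("defaultDatasetId"),
        "key_value_store_id": run.get("defaultKeyValueStoreId"),
        "request_queue_id": run.get("defaultRequestQueueId"),
        "memory_mbytes": options.get("memoryMbytes"),
        "total_usd": run.get("usageTotalUsd"),
        # Keep only the line items with a non-zero cost.
        "billed_items": {name: cost for name, cost in usage_usd.items() if cost},
    }


# Hypothetical run object: placeholder IDs, numbers taken from the schema examples.
sample_run = {
    "options": {
        "build": "latest",
        "timeoutSecs": 300,
        "memoryMbytes": 1024,
        "diskMbytes": 2048,
    },
    "defaultKeyValueStoreId": "KVS_ID_PLACEHOLDER",
    "defaultDatasetId": "DATASET_ID_PLACEHOLDER",
    "defaultRequestQueueId": "QUEUE_ID_PLACEHOLDER",
    "buildNumber": "1.0.0",
    "usageTotalUsd": 0.00005,
    "usageUsd": {
        "ACTOR_COMPUTE_UNITS": 0,
        "DATASET_WRITES": 0,
        "KEY_VALUE_STORE_WRITES": 0.00005,
    },
}

summary = summarize_run(sample_run)
print(summary["dataset_id"], summary["total_usd"], summary["billed_items"])
```

Reading fields with `.get()` keeps the helper tolerant of responses that omit optional properties, since the schema above does not mark any of them as required.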
