# Thesis Literature Review Scraper (`leafy-dev-jr/thesis-literature-review-scraper`) Actor

Turn any research question into a clean reading list of peer-reviewed academic papers from OpenAlex, Semantic Scholar, and Crossref in one run. Includes citation-manager and spreadsheet exports, plus LLM-ready Markdown you can paste into ChatGPT or Claude for instant literature synthesis.

- **URL**: https://apify.com/leafy-dev-jr/thesis-literature-review-scraper.md
- **Developed by:** [Leafy Dev Jr](https://apify.com/leafy-dev-jr) (community)
- **Categories:** Automation, AI, Other
- **Stats:** 1 total users, 0 monthly users, 0.0% runs succeeded, NaN bookmarks
- **User rating**: No ratings yet

## Pricing

from $1.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Thesis Literature Review Scraper — Multi-Source Academic Papers with Citations & LLM-Ready Output

**Paste a research question → get a de-duplicated, structured, LLM-ready list of peer-reviewed papers from OpenAlex, Semantic Scholar, and Crossref.** Perfect for thesis literature reviews, RAG pipelines, and AI research assistants.

---

### What it does

Given a research question or topic keywords, this Actor queries three major free scholarly databases in parallel, de-duplicates the results by DOI (with fuzzy-title fallback), merges enriched fields across sources, and returns a single clean dataset plus LLM-ready exports.
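The de-duplication step described above can be sketched roughly as follows. This is a minimal illustration, not the Actor's actual code: the helper names, the `difflib`-based title similarity, and the 0.93 threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize_doi(doi):
    """Lower-case a DOI and strip common URL prefixes; return None if empty."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi or None


def normalize_title(title):
    """Keep only lower-cased alphanumerics separated by single spaces."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in title or "")
    return " ".join(cleaned.split())


def merge_records(base, extra):
    """Fill empty fields in `base` from `extra` and union the `sources` lists."""
    merged = dict(base)
    for key, value in extra.items():
        if key == "sources":
            continue
        if merged.get(key) in (None, "", []):
            merged[key] = value
    merged["sources"] = sorted(set(base.get("sources", [])) | set(extra.get("sources", [])))
    return merged


def deduplicate(records, title_threshold=0.93):
    """Merge records that share a DOI, else records with near-identical titles."""
    unique = []
    for record in records:
        doi = normalize_doi(record.get("doi"))
        title = normalize_title(record.get("title"))
        match_index = None
        for i, kept in enumerate(unique):
            kept_doi = normalize_doi(kept.get("doi"))
            if doi and kept_doi:
                if doi == kept_doi:
                    match_index = i
                    break
                continue  # two different DOIs: never merge on title alone
            # Assumed fallback: fuzzy title match when at least one DOI is missing
            ratio = SequenceMatcher(None, title, normalize_title(kept.get("title"))).ratio()
            if title and ratio >= title_threshold:
                match_index = i
                break
        if match_index is None:
            unique.append(dict(record))
        else:
            unique[match_index] = merge_records(unique[match_index], record)
    return unique
```

The first record seen for a paper wins on conflicting fields, and later duplicates only fill gaps, so the order in which sources are processed matters in this sketch.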

#### Why this exists

Writing a thesis lit review manually takes days of copy-paste across Google Scholar, Scopus, Web of Science, and PubMed. Building a RAG chatbot over academic papers requires the same legwork. This Actor does it in one run.

---

### Data sources

| Source | Size | Auth | License |
| --- | --- | --- | --- |
| [OpenAlex](https://openalex.org/) | 250M+ works | none | CC0 |
| [Semantic Scholar](https://www.semanticscholar.org/product/api) | ~200M papers | optional API key | [ODC-BY](https://opendatacommons.org/licenses/by/) |
| [Crossref](https://www.crossref.org/) | 160M+ records | none | CC0 metadata |

All three are public, free JSON APIs that work without authentication (the Semantic Scholar key is optional). **We do not scrape Google Scholar**; it's explicitly out of scope to respect their Terms of Service.

---

### Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `query` | string (required) | — | Research question or keywords (3–500 chars). |
| `yearFrom` | integer | 2015 | Earliest publication year. |
| `yearTo` | integer | current year | Latest publication year. |
| `maxResults` | integer | 100 | Total de-duplicated papers to return (10–1000). |
| `sources` | array | `["openalex", "crossref"]` | Which databases to query. Semantic Scholar is off by default — enable it if you need its abstracts / `influentialCitationCount`, ideally with a free API key below. |
| `minCitations` | integer | 0 | Filter out papers with fewer citations. |
| `openAccessOnly` | boolean | false | Return only open-access papers. |
| `sortBy` | enum | `relevance` | `relevance` / `citations` / `date`. |
| `outputFormat` | array | `["bibtex", "markdown"]` | Extra export formats on top of JSON. |
| `contactEmail` | string | _blank_ | Optional — enables OpenAlex / Crossref "polite pool" for faster, prioritized access. |
| `semanticScholarApiKey` | string (secret) | _blank_ | Optional — bypasses Semantic Scholar's shared rate limit. |

#### Example input

```json
{
    "query": "impact of social media on adolescent mental health",
    "yearFrom": 2018,
    "yearTo": 2026,
    "maxResults": 75,
    "sortBy": "citations",
    "outputFormat": ["markdown", "bibtex", "csv"]
}
````
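Before starting a paid run, it can help to check an input object locally. The sketch below mirrors the constraints documented in the input table and schema (query length, year and `maxResults` ranges, allowed enum values); the function name and error messages are hypothetical, and the platform performs its own validation regardless.

```python
ALLOWED_SOURCES = {"openalex", "semanticscholar", "crossref"}
ALLOWED_SORT = {"relevance", "citations", "date"}
ALLOWED_FORMATS = {"bibtex", "ris", "markdown", "csv"}


def validate_input(actor_input):
    """Return a list of problems found in an Actor input dict (empty if none)."""
    problems = []
    query = actor_input.get("query")
    if not isinstance(query, str) or not 3 <= len(query) <= 500:
        problems.append("query must be a string of 3-500 characters")
    for field in ("yearFrom", "yearTo"):
        year = actor_input.get(field)
        if year is not None and not 1900 <= year <= 2100:
            problems.append(f"{field} must be between 1900 and 2100")
    if not 10 <= actor_input.get("maxResults", 100) <= 1000:
        problems.append("maxResults must be between 10 and 1000")
    if actor_input.get("minCitations", 0) < 0:
        problems.append("minCitations must be 0 or greater")
    if actor_input.get("sortBy", "relevance") not in ALLOWED_SORT:
        problems.append("sortBy must be relevance, citations, or date")
    unknown_sources = set(actor_input.get("sources", [])) - ALLOWED_SOURCES
    if unknown_sources:
        problems.append(f"unknown sources: {sorted(unknown_sources)}")
    unknown_formats = set(actor_input.get("outputFormat", [])) - ALLOWED_FORMATS
    if unknown_formats:
        problems.append(f"unknown output formats: {sorted(unknown_formats)}")
    return problems
```

An empty list means the input satisfies the documented constraints; anything else is worth fixing before calling the Actor.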

***

### Output

#### 1. Dataset

One record per paper. The console shows two views:

- **Papers** — curated view (title, authors, year, venue, citations, DOI, OA, sources). Best for browsing.
- **All fields** — every field in a flat, spreadsheet-friendly order. Best for exporting to Excel.

Key fields per record:

| Group | Fields |
| --- | --- |
| Identity | `doi`, `openAlexId`, `semanticScholarId`, `pmid`, `arxivId` |
| Metadata | `title`, `abstract`, `authors` (array), `authorsDisplay` (joined string), `firstAuthor`, `authorCount`, `year`, `venue`, `publisher` |
| Metrics | `citationCount`, `influentialCitationCount`, `referenceCount`, `fieldsOfStudy` |
| Access | `isOpenAccess`, `openAccessUrl`, `landingPageUrl` |
| Provenance | `sources` (which APIs returned it), `primarySource` |
| LLM payload | `llmSummary` — the formatted Markdown block used to build `literature-review.md` |

Both the structured `authors` array and the flat `authorsDisplay` / `firstAuthor` / `authorCount` columns are included — so JSON / RAG consumers get the full shape, and spreadsheet users get a single clean authors column.
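As an illustration of how the flat columns relate to the structured array, the following sketch derives them from `authors`. The function name is hypothetical; the `; ` separator matches the example record shown later in this README.

```python
def flatten_authors(authors):
    """Derive the flat author columns from the structured `authors` array."""
    names = [author["name"] for author in authors if author.get("name")]
    return {
        "authorsDisplay": "; ".join(names),
        "firstAuthor": names[0] if names else None,
        "authorCount": len(names),
    }
```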

#### 2. Key-Value Store files

Depending on `outputFormat`:

| File | Use it for |
| --- | --- |
| `literature-review.md` | **LLM synthesis** — paste into ChatGPT / Claude / Gemini (see below). |
| `references.bib` | BibTeX — import into Overleaf / LaTeX / Zotero BibTeX library. |
| `references.ris` | RIS — import into Zotero, Mendeley, EndNote, or Citavi (File → Import). |
| `papers.csv` | Excel / Google Sheets / Numbers. Authors are already joined with `; `, so the column contains no JSON blob. |
| `METADATA` | Per-source fetch status, total citations, de-dupe counts, run timestamp. |

***

### Example output

Abridged excerpts from a run with `query: "artificial intelligence in higher education"`.

**Dataset record (JSON)** — one complete paper record as returned in the dataset:

```json
{
    "title": "Exploring Opportunities and Challenges of Artificial Intelligence and Machine Learning in Higher Education Institutions",
    "authors": [
        {
            "name": "Valentin Kuleto",
            "orcid": "https://orcid.org/0000-0002-7811-5436",
            "affiliation": "University Business Academy in Novi Sad"
        },
        {
            "name": "Milena P. Ilić",
            "orcid": "https://orcid.org/0000-0002-2656-1449",
            "affiliation": "Information Technology School, Belgrade"
        },
        {
            "name": "Mihail Dumangiu",
            "orcid": null,
            "affiliation": null
        },
        {
            "name": "Marko Ranković",
            "orcid": null,
            "affiliation": null
        }
    ],
    "authorsDisplay": "Valentin Kuleto; Milena P. Ilić; Mihail Dumangiu; Marko Ranković",
    "firstAuthor": "Valentin Kuleto",
    "authorCount": 4,
    "year": 2021,
    "publicationDate": "2021-09-17",
    "venue": "Sustainability",
    "venueType": "journal",
    "publisher": "MDPI AG",
    "abstract": "Artificial Intelligence (AI) and Machine Learning (ML) are reshaping how higher education institutions (HEIs) operate, teach, and serve students. This paper explores the opportunities and challenges of integrating AI and ML into HEIs, drawing on a mixed-methods study of 108 faculty and administrators across six institutions. We identify five opportunity areas (personalized learning, administrative automation, predictive analytics, research augmentation, and accessibility) and four challenge areas (data governance, faculty readiness, equity, and cost). We conclude with a roadmap for responsible adoption.",
    "citationCount": 412,
    "referenceCount": 58,
    "influentialCitationCount": 38,
    "fieldsOfStudy": ["Computer Science", "Education", "Sustainability"],
    "isOpenAccess": true,
    "openAccessUrl": "https://www.mdpi.com/2071-1050/13/18/10424/pdf",
    "landingPageUrl": "https://doi.org/10.3390/su131810424",
    "doi": "10.3390/su131810424",
    "openAlexId": "W3199263016",
    "semanticScholarId": null,
    "pmid": null,
    "arxivId": null,
    "sources": ["openalex", "crossref"],
    "primarySource": "openalex",
    "llmSummary": "### Exploring Opportunities and Challenges of Artificial Intelligence and Machine Learning in Higher Education Institutions\n\n**Authors:** Valentin Kuleto, Milena P. Ilić, Mihail Dumangiu, et al. (2021)\n**Venue:** Sustainability (journal)\n**Citations:** 412 (influential: 38) | **Open Access:** yes\n**DOI:** 10.3390/su131810424\n**Fields:** Computer Science, Education, Sustainability\n**Sources:** openalex, crossref\n\n**Abstract:** Artificial Intelligence (AI) and Machine Learning (ML) are reshaping how higher education institutions (HEIs) operate…",
    "relevanceScore": 18.4
}
```

**`literature-review.md` (LLM-ready Markdown)** — first three papers of a run:

```markdown
## Literature Review: artificial intelligence in higher education
*Generated 2026-04-19T22:00:00.000Z | 75 papers from 2 sources*

### Summary Stats
- Date range: 2020–2026
- Total citations across corpus: 18,423
- Open access: 47/75
- Top venues: Sustainability (6); Computers & Education (4); IEEE Access (3); International Journal of Educational Technology in Higher Education (3)

### Source Status
- **openalex**: ok (600 fetched)
- **semanticscholar**: failed (0 fetched, error: not requested)
- **crossref**: ok (200 fetched)

### Papers

### Exploring Opportunities and Challenges of Artificial Intelligence and Machine Learning in Higher Education Institutions

**Authors:** Valentin Kuleto, Milena P. Ilić, Mihail Dumangiu, et al. (2021)
**Venue:** Sustainability (journal)
**Citations:** 412 (influential: 38) | **Open Access:** yes
**DOI:** 10.3390/su131810424
**Fields:** Computer Science, Education, Sustainability
**Sources:** openalex, crossref

**Abstract:** Artificial Intelligence (AI) and Machine Learning (ML) are reshaping how higher education institutions (HEIs) operate, teach, and serve students. This paper explores the opportunities and challenges of integrating AI and ML into HEIs, drawing on a mixed-methods study of 108 faculty and administrators across six institutions. We identify five opportunity areas (personalized learning, administrative automation, predictive analytics, research augmentation, and accessibility) and four challenge areas (data governance, faculty readiness, equity, and cost). We conclude with a roadmap for responsible adoption.

---

### Artificial intelligence in higher education: the state of the field

**Authors:** Helen Crompton, Diane Burke (2023)
**Venue:** International Journal of Educational Technology in Higher Education (journal)
**Citations:** 287 (influential: 29) | **Open Access:** yes
**DOI:** 10.1186/s41239-023-00392-8
**Fields:** Education, Computer Science
**Sources:** openalex, crossref

**Abstract:** This systematic review examines the state of artificial intelligence (AI) research in higher education. Drawing from 138 empirical studies published between 2016 and 2022, we map the field across five dimensions: AI application types, pedagogical goals, student populations, methodological approaches, and reported outcomes. Findings indicate a heavy concentration on adaptive learning systems and intelligent tutoring, with under-representation of equity, ethics, and faculty-perspective research. We propose a research agenda for the next phase of AI-in-higher-ed scholarship.

---

### ChatGPT for good? On opportunities and challenges of large language models for education

**Authors:** Enkelejda Kasneci, Kathrin Seßler, Stefan Küchemann, et al. (2023)
**Venue:** Learning and Individual Differences (journal)
**Citations:** 1,942 (influential: 184) | **Open Access:** yes
**DOI:** 10.1016/j.lindif.2023.102274
**Fields:** Education, Computer Science, Linguistics
**Sources:** openalex, crossref

**Abstract:** Large language models (LLMs) such as ChatGPT are transforming how students access information and produce academic work. This position paper surveys the opportunities LLMs create for educators (personalized feedback, lesson planning, accessibility support) alongside the challenges they raise (academic integrity, factual reliability, equity of access, and assessment redesign). We outline an actionable framework for institutions considering LLM integration, covering policy, pedagogy, and tooling.

---

### [next paper] …
```

***

### LLM-ready Markdown — paste it into your favorite AI model

When `outputFormat` includes `markdown`, the Actor writes a single concatenated `literature-review.md` file to the Key-Value Store. Each paper becomes its own section with title, authors, year, venue, citations, DOI, and abstract — clean prose, no HTML, no source-specific boilerplate.
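The per-paper section layout can be reproduced from a dataset record with a few lines of Python. This is a sketch inferred from the example output above, not the Actor's own formatter: the function name, the "first three authors, then et al." rule, and the `no` label for closed-access papers are assumptions, and it expects every field to be present.

```python
def format_paper(record, max_authors=3):
    """Render one paper record as a Markdown section like the example output."""
    names = [author["name"] for author in record["authors"]]
    shown = ", ".join(names[:max_authors])
    if len(names) > max_authors:
        shown += ", et al."
    # Only "yes" appears in the example output; "no" is an assumed counterpart
    open_access = "yes" if record["isOpenAccess"] else "no"
    fields = ", ".join(record["fieldsOfStudy"])
    sources = ", ".join(record["sources"])
    lines = [
        f"### {record['title']}",
        "",
        f"**Authors:** {shown} ({record['year']})",
        f"**Venue:** {record['venue']} ({record['venueType']})",
        f"**Citations:** {record['citationCount']:,} "
        f"(influential: {record['influentialCitationCount']}) "
        f"| **Open Access:** {open_access}",
        f"**DOI:** {record['doi']}",
        f"**Fields:** {fields}",
        f"**Sources:** {sources}",
        "",
        f"**Abstract:** {record['abstract']}",
    ]
    return "\n".join(lines)
```

Joining the rendered sections with a `---` line between them gives a document shaped like `literature-review.md`, which is handy if you filter the dataset yourself and want to regenerate the Markdown for a subset.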

**How to use it:**

1. Run the Actor.
2. Open the run → **Storage → Key-Value Store**.
3. Download `literature-review.md`.
4. Paste or attach it into ChatGPT / Claude / Gemini and ask for a synthesis.
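Steps 2 and 3 can also be scripted. The sketch below assumes the official `apify-client` Python package, whose `key_value_store(...).get_record(key)` call returns a dictionary with a `value` entry (or `None` when the key is missing); the helper name `fetch_export_text` is hypothetical.

```python
def fetch_export_text(client, run, key="literature-review.md"):
    """Return one text export from a finished run's default Key-Value Store.

    `client` is expected to be an `apify_client.ApifyClient` instance and
    `run` the dictionary returned by `client.actor(...).call(...)`.
    """
    store = client.key_value_store(run["defaultKeyValueStoreId"])
    record = store.get_record(key)
    if record is None:
        raise KeyError(f"{key!r} not found; check that outputFormat requested it")
    value = record["value"]
    # Text records usually arrive as str; decode defensively if bytes come back
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)
```

With a real client, something like `Path("literature-review.md").write_text(fetch_export_text(client, run), encoding="utf-8")` would save the file locally; the same helper works for `references.bib`, `references.ris`, and `papers.csv`.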

A few prompts that work well:

- *"Write a 1,500-word literature review organized by theme. Cite every claim inline using (AuthorLastName, Year). Include a Research gaps section. Use only the papers provided."*
- *"For each paper, give me a row: research question, method, sample, key finding, limitation. Output as a Markdown table."*
- *"Group these papers into 4–6 clusters by approach. Name each cluster, list its papers by DOI, and write a 3-sentence synthesis per cluster."*

Pair this with the BibTeX / RIS export and any DOI the LLM cites is already in your Zotero or Overleaf bibliography.

***

### Use cases

1. **Thesis / dissertation lit review** — seed your chapter with 100+ relevant papers in one run.
2. **RAG pipelines over academic content** — ingest the Markdown or CSV into a vector store.
3. **AI research assistants** — pre-index a corpus for a domain-specific chatbot.
4. **Systematic reviews** — starting-point screening before manual PRISMA filtering.
5. **Citation-graph sanity checks** — verify a paper is findable in multiple databases.

***

### Limitations (V1)

- **Metadata only.** No full-text PDF download. Use `openAccessUrl` if you want to fetch PDFs yourself.
- **Max 1000 papers per run** (de-duplicated). Run multiple queries for broader coverage.
- **Semantic Scholar anonymous pool can 429.** If you need reliable SS results at scale, provide a free API key.
- **Crossref abstract coverage is sparse** (~20%). Primary abstract source is Semantic Scholar, then OpenAlex.
- **No Google Scholar / Scopus / Web of Science.** These require commercial licenses or violate ToS to scrape.
- **English-biased.** The sources cover multiple languages but keyword matching works best in English.
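If you query Semantic Scholar yourself (for example, to enrich a dataset after the run), the 429 limitation above is usually handled with retries and exponential backoff. The following is a generic sketch under stated assumptions: `RateLimited` and `call_with_backoff` are hypothetical names, and nothing here describes how the Actor itself retries.

```python
import time


class RateLimited(Exception):
    """Raised by a fetch function when the API answers HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("rate limited")
        self.retry_after = retry_after


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` and retry on RateLimited with exponential backoff.

    A server-provided `retry_after` (in seconds) takes precedence over the
    computed delay. The last failure is re-raised once attempts run out.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except RateLimited as error:
            if attempt == max_attempts - 1:
                raise
            delay = error.retry_after or base_delay * 2 ** attempt
            sleep(delay)
```

The `sleep` parameter is injectable so the waiting logic can be tested without actually pausing.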

***

### Legal & licensing

- Output is metadata only — no full-text reproduction. Respects copyright and each source's terms.
- You are responsible for citing sources appropriately in your own work.
- OpenAlex data is CC0. Crossref metadata is CC0. Semantic Scholar data is [ODC-BY](https://opendatacommons.org/licenses/by/).
- This Actor does not scrape Google Scholar — explicitly avoided due to their ToS and anti-bot measures.
- No personal data is collected. `contactEmail`, if provided, is sent only to OpenAlex / Crossref as a polite-pool identifier per their API conventions; it is not stored by this Actor.

***

### Roadmap

V2 (after V1 traction):

- MCP Standby mode (expose as a tool for Claude / Cursor agents)
- Open-access PDF fetching via Unpaywall
- PubMed / arXiv / CORE as additional sources
- Token-aware RAG-ready chunking
- Citation-graph traversal
- Scheduled re-runs for literature monitoring

***

### Contact / feedback

Bug reports, feature requests, and use-case tips welcome — send a message at **leafydevjr@gmail.com** or leave a review on the Apify Store listing.

# Actor input Schema

## `query` (type: `string`):

Your research question or topic keywords (e.g., 'impact of social media on adolescent mental health').

## `yearFrom` (type: `integer`):

Only include papers published in or after this year.

## `yearTo` (type: `integer`):

Only include papers published in or before this year.

## `maxResults` (type: `integer`):

Total de-duplicated papers across all sources (10–1000). Larger runs take longer and cost more — start with 100 to validate the query.

## `sources` (type: `array`):

Which free scholarly APIs to aggregate. Results are deduplicated by DOI (with fuzzy-title fallback) and field-merged across sources. ⚠️ Adding Semantic Scholar can make runs noticeably longer — its anonymous rate-limit pool is strict (1 req/sec + frequent 429s). Enable it only if you need its abstracts/influentialCitationCount, or supply a free Semantic Scholar API key below to bypass the shared pool.

## `minCitations` (type: `integer`):

Drop papers with fewer citations than this. Set to 0 to keep all.

## `openAccessOnly` (type: `boolean`):

When true, only include papers with a free open-access PDF link.

## `sortBy` (type: `string`):

Ranking order for the final deduped list.
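To make the interaction of `minCitations`, `openAccessOnly`, `sortBy`, and `maxResults` concrete, here is a sketch of the same post-processing applied to dataset records on the client side. The function name, the descending order, and the choice of `relevanceScore` / `publicationDate` as sort keys are assumptions based on the example record, not a description of the Actor's internals.

```python
def filter_and_rank(papers, min_citations=0, open_access_only=False,
                    sort_by="relevance", max_results=100):
    """Filter paper records, sort them in descending order, and truncate."""
    sort_keys = {
        "relevance": lambda p: p.get("relevanceScore") or 0,
        "citations": lambda p: p.get("citationCount") or 0,
        # ISO dates sort correctly as strings; fall back to the year if missing
        "date": lambda p: p.get("publicationDate") or str(p.get("year") or ""),
    }
    if sort_by not in sort_keys:
        raise ValueError(f"unsupported sort_by: {sort_by!r}")
    kept = [
        p for p in papers
        if (p.get("citationCount") or 0) >= min_citations
        and (p.get("isOpenAccess") or not open_access_only)
    ]
    kept.sort(key=sort_keys[sort_by], reverse=True)
    return kept[:max_results]
```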

## `outputFormat` (type: `array`):

JSON dataset is always produced. These are additional exports written to the run's Key-Value Store.

## `contactEmail` (type: `string`):

Optional. When provided, we include it in OpenAlex and Crossref requests so you get priority routing and higher rate limits. Leave blank to run anonymously. Never stored, never shared.
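For readers curious what "including the email in requests" looks like, the sketch below builds search URLs that carry a `mailto` query parameter, which both OpenAlex and Crossref have documented as their polite-pool convention. The function names, the chosen filters, and the page sizes are assumptions for illustration; the Actor's real requests may differ.

```python
from urllib.parse import urlencode


def openalex_url(query, year_from, year_to, per_page=50, contact_email=None):
    """Build an OpenAlex works search URL, optionally with a polite-pool email."""
    params = {
        "search": query,
        "filter": (
            f"from_publication_date:{year_from}-01-01,"
            f"to_publication_date:{year_to}-12-31"
        ),
        "per-page": per_page,
    }
    if contact_email:
        params["mailto"] = contact_email
    return "https://api.openalex.org/works?" + urlencode(params)


def crossref_url(query, year_from, year_to, rows=50, contact_email=None):
    """Build a Crossref works search URL, optionally with a polite-pool email."""
    params = {
        "query": query,
        "filter": f"from-pub-date:{year_from},until-pub-date:{year_to}",
        "rows": rows,
    }
    if contact_email:
        params["mailto"] = contact_email
    return "https://api.crossref.org/works?" + urlencode(params)
```

Leaving `contact_email` unset simply omits the parameter, which corresponds to running anonymously.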

## `semanticScholarApiKey` (type: `string`):

Optional. Semantic Scholar's anonymous pool is heavily rate-limited. If you have a free API key from https://www.semanticscholar.org/product/api#api-key-form, paste it here to bypass the shared pool. Leave blank to run without it.

## Actor input object example

```json
{
  "query": "impact of social media on adolescent mental health",
  "yearFrom": 2015,
  "yearTo": 2026,
  "maxResults": 100,
  "sources": [
    "openalex",
    "crossref"
  ],
  "minCitations": 0,
  "openAccessOnly": false,
  "sortBy": "relevance",
  "outputFormat": [
    "bibtex",
    "markdown"
  ]
}
```

# Actor output Schema

## `papers` (type: `string`):

One row per unique peer-reviewed paper, merged across OpenAlex, Semantic Scholar, and Crossref. Two views are provided: the curated 'Papers' view (default) and 'All fields' for full detail.

## `literatureReview` (type: `string`):

Concatenated Markdown document with one section per paper — paste into ChatGPT / Claude / Gemini for instant literature synthesis. Only produced when outputFormat includes 'markdown'.

## `bibtex` (type: `string`):

BibTeX bibliography file for LaTeX / Overleaf / Zotero. Only produced when outputFormat includes 'bibtex'.
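If you need a BibTeX entry for a single dataset record (say, after filtering the dataset yourself), a conversion along these lines is enough for most reference managers. It is a simplified sketch: the citation-key scheme and the fixed `@article` entry type are assumptions, special LaTeX characters are not escaped, and the Actor's own `references.bib` may be formatted differently.

```python
import re


def bibtex_key(record):
    """Build a citation key such as `kuleto2021` from first author and year."""
    surname = record["firstAuthor"].split()[-1]
    surname = re.sub(r"[^A-Za-z0-9]", "", surname)  # drops non-ASCII letters
    return f"{surname.lower()}{record['year']}"


def to_bibtex(record):
    """Render one paper record as a BibTeX @article entry."""
    fields = {
        "title": record["title"],
        "author": " and ".join(author["name"] for author in record["authors"]),
        "year": record["year"],
        "journal": record.get("venue"),
        "publisher": record.get("publisher"),
        "doi": record.get("doi"),
    }
    body = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in fields.items() if value
    )
    return f"@article{{{bibtex_key(record)},\n{body}\n}}"
```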

## `ris` (type: `string`):

RIS file for Zotero / Mendeley / EndNote / Citavi (File → Import). Only produced when outputFormat includes 'ris'.

## `csv` (type: `string`):

Flat CSV export for Excel / Google Sheets, with authors already joined into a single column. Only produced when outputFormat includes 'csv'.

## `metadata` (type: `string`):

Per-source fetch status, de-duplication counts, total citations, top venues, and run timestamp.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

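// Prepare Actor input (`query` is the only required field)
const input = {
    query: 'impact of social media on adolescent mental health',
    maxResults: 100,
};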

// Run the Actor and wait for it to finish
const run = await client.actor("leafy-dev-jr/thesis-literature-review-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

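# Prepare the Actor input (`query` is the only required field)
run_input = {
    "query": "impact of social media on adolescent mental health",
    "maxResults": 100,
}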

# Run the Actor and wait for it to finish
run = client.actor("leafy-dev-jr/thesis-literature-review-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{"query": "impact of social media on adolescent mental health"}' |
apify call leafy-dev-jr/thesis-literature-review-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=leafy-dev-jr/thesis-literature-review-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Thesis Literature Review Scraper",
        "description": "Turn any research question into a clean reading list of peer-reviewed academic papers from OpenAlex, Semantic Scholar, and Crossref in one run. Includes citation-manager and spreadsheet exports, plus LLM-ready Markdown you can paste into ChatGPT or Claude for instant literature synthesis.",
        "version": "0.1",
        "x-build-id": "BhYpur6Yhdz2Wh0DD"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/leafy-dev-jr~thesis-literature-review-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-leafy-dev-jr-thesis-literature-review-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/leafy-dev-jr~thesis-literature-review-scraper/runs": {
            "post": {
                "operationId": "runs-sync-leafy-dev-jr-thesis-literature-review-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/leafy-dev-jr~thesis-literature-review-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-leafy-dev-jr-thesis-literature-review-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "required": [
                    "query"
                ],
                "properties": {
                    "query": {
                        "title": "Research question or keywords",
                        "minLength": 3,
                        "maxLength": 500,
                        "type": "string",
                        "description": "Your research question or topic keywords (e.g., 'impact of social media on adolescent mental health')."
                    },
                    "yearFrom": {
                        "title": "Earliest publication year",
                        "minimum": 1900,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Only include papers published in or after this year.",
                        "default": 2015
                    },
                    "yearTo": {
                        "title": "Latest publication year",
                        "minimum": 1900,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Only include papers published in or before this year.",
                        "default": 2026
                    },
                    "maxResults": {
                        "title": "Max papers to return",
                        "minimum": 10,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Total de-duplicated papers across all sources (10–1000). Larger runs take longer and cost more — start with 100 to validate the query.",
                        "default": 100
                    },
                    "sources": {
                        "title": "Sources to query",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Which free scholarly APIs to aggregate. Results are deduplicated by DOI (with fuzzy-title fallback) and field-merged across sources. ⚠️ Adding Semantic Scholar can make runs noticeably longer — its anonymous rate-limit pool is strict (1 req/sec + frequent 429s). Enable it only if you need its abstracts/influentialCitationCount, or supply a free Semantic Scholar API key below to bypass the shared pool.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "openalex",
                                "semanticscholar",
                                "crossref"
                            ],
                            "enumTitles": [
                                "OpenAlex (250M+ works)",
                                "Semantic Scholar (~200M papers — slower, rate-limited)",
                                "Crossref (160M+ records)"
                            ]
                        },
                        "default": [
                            "openalex",
                            "crossref"
                        ]
                    },
                    "minCitations": {
                        "title": "Minimum citation count (0 = no filter)",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Drop papers with fewer citations than this. Set to 0 to keep all.",
                        "default": 0
                    },
                    "openAccessOnly": {
                        "title": "Only papers with open access",
                        "type": "boolean",
                        "description": "When true, only include papers with a free open-access PDF link.",
                        "default": false
                    },
                    "sortBy": {
                        "title": "Sort results by",
                        "enum": [
                            "relevance",
                            "citations",
                            "date"
                        ],
                        "type": "string",
                        "description": "Ranking order for the final deduped list.",
                        "default": "relevance"
                    },
                    "outputFormat": {
                        "title": "Additional output formats",
                        "type": "array",
                        "description": "JSON dataset is always produced. These are additional exports written to the run's Key-Value Store.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "bibtex",
                                "ris",
                                "markdown",
                                "csv"
                            ]
                        },
                        "default": [
                            "bibtex",
                            "markdown"
                        ]
                    },
                    "contactEmail": {
                        "title": "Your email (optional — enables polite-pool API access)",
                        "type": "string",
                        "description": "Optional. When provided, we include it in OpenAlex and Crossref requests so you get priority routing and higher rate limits. Leave blank to run anonymously. Never stored, never shared."
                    },
                    "semanticScholarApiKey": {
                        "title": "Semantic Scholar API key (optional)",
                        "type": "string",
                        "description": "Optional. Semantic Scholar's anonymous pool is heavily rate-limited. If you have a free API key from https://www.semanticscholar.org/product/api#api-key-form, paste it here to bypass the shared pool. Leave blank to run without it."
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
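
The optional input fields in the schema above (`minCitations`, `openAccessOnly`, `sortBy`, `outputFormat`, `contactEmail`, `semanticScholarApiKey`) can be checked locally before a run is started. The following is a minimal sketch, not part of the Actor: `build_filter_options` is a hypothetical helper name, the enum values and defaults are copied from the schema, and fields defined earlier in the schema are deliberately left out.

```python
from typing import Any, Dict, Optional, Tuple

# Enum values copied from the input schema above.
SORT_BY = {"relevance", "citations", "date"}
OUTPUT_FORMATS = {"bibtex", "ris", "markdown", "csv"}


def build_filter_options(
    min_citations: int = 0,
    open_access_only: bool = False,
    sort_by: str = "relevance",
    output_format: Tuple[str, ...] = ("bibtex", "markdown"),
    contact_email: Optional[str] = None,
    semantic_scholar_api_key: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and assemble the optional input fields."""
    # The schema declares minCitations as an integer with minimum 0.
    if isinstance(min_citations, bool) or not isinstance(min_citations, int):
        raise ValueError("minCitations must be an integer")
    if min_citations < 0:
        raise ValueError("minCitations must be >= 0")
    if sort_by not in SORT_BY:
        raise ValueError(f"sortBy must be one of {sorted(SORT_BY)}")
    unknown = [fmt for fmt in output_format if fmt not in OUTPUT_FORMATS]
    if unknown:
        raise ValueError(f"unsupported outputFormat values: {unknown}")

    options: Dict[str, Any] = {
        "minCitations": min_citations,
        "openAccessOnly": open_access_only,
        "sortBy": sort_by,
        "outputFormat": list(output_format),
    }
    # Both string fields are optional; omit them entirely when left blank.
    if contact_email:
        options["contactEmail"] = contact_email
    if semantic_scholar_api_key:
        options["semanticScholarApiKey"] = semantic_scholar_api_key
    return options
```

The returned dictionary is meant to be merged with the remaining input fields before it is passed to the client of your choice. Validating locally surfaces a mistyped `sortBy` or `outputFormat` value before a run is started.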

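The `runsResponseSchema` above wraps the run object in a top-level `data` key. As a rough sketch of how a caller might pull out the fields needed to locate results, the hypothetical `summarize_run` function below reads only property names that appear in that schema. The sample payload is invented for illustration: the status and cost reuse the schema's example values, and the IDs are placeholders.

```python
from typing import Any, Dict


def summarize_run(response: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pick the commonly needed fields from a run response."""
    # The run object sits under "data"; tolerate a missing or null value.
    data = response.get("data") or {}
    return {
        "run_id": data.get("id"),
        "status": data.get("status"),
        # The JSON results are written to the run's default dataset.
        "dataset_id": data.get("defaultDatasetId"),
        # Additional exports are written to the run's key-value store.
        "key_value_store_id": data.get("defaultKeyValueStoreId"),
        "cost_usd": data.get("usageTotalUsd"),
    }


# Illustrative payload only; the IDs below are placeholders, not real values.
sample_response = {
    "data": {
        "id": "RUN_ID_PLACEHOLDER",
        "status": "READY",
        "defaultDatasetId": "DATASET_ID_PLACEHOLDER",
        "defaultKeyValueStoreId": "KV_STORE_ID_PLACEHOLDER",
        "usageTotalUsd": 0.00005,
    }
}

summary = summarize_run(sample_response)
```

Using `.get` throughout keeps the helper from raising on partial responses, since the schema marks none of these properties as required.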