# Semantic Scholar Scraper (`solidcode/semanticscholar-scraper`) Actor

\[💰 $6 / 1K] Extract academic papers, abstracts, citations, references, authors, and open-access PDF links from Semantic Scholar's 200M+ database. Search by keyword, paper ID/DOI/URL, or author. Filter by year, field, and citations. No API key.

- **URL**: https://apify.com/solidcode/semanticscholar-scraper.md
- **Developed by:** [SolidCode](https://apify.com/solidcode) (community)
- **Categories:** Developer tools, Automation, AI
- **Stats:** 2 total users, 1 monthly users, 100.0% runs succeeded, 0 bookmarks
- **User rating**: No ratings yet

## Pricing

from $6.00 / 1,000 results

This Actor is paid per event. You are not charged for Apify platform usage; you pay only a fixed price for specific events.
Because this Actor supports Apify Store discounts, the price drops as your subscription plan tier rises.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# MacOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Semantic Scholar Scraper

Pull academic papers, author profiles, and full citation graphs from Semantic Scholar's 200M+ paper corpus — complete with abstracts, DOIs, arXiv IDs, h-index metrics, citation counts, and direct open-access PDF links. Search by keyword, fetch an exact paper by DOI or arXiv ID, or look up an author profile in one run. Built for researchers, systematic-review teams, and data scientists who need a clean, structured scholarly dataset across every discipline without stitching together the public API one page at a time.

### Why This Scraper?

- **Three ways in, one dataset** — keyword search across titles and abstracts, direct fetch by Semantic Scholar paper ID / DOI / arXiv ID / CorpusId / PMID / URL, and author lookup by ID or profile URL. Mix all three in a single run.
- **200M+ papers across 23 fields of study** — filter to any combination of Computer Science, Medicine, Biology, Physics, Economics, Mathematics, Law, Linguistics, and 15 more — exact filters, not fuzzy "suggestions".
- **12 publication-type filters** — narrow to peer-reviewed `JournalArticle`, `Review`, `MetaAnalysis`, `ClinicalTrial`, `Conference`, `Dataset`, `Book`, and more for systematic-review-grade precision.
- **Citation + reference graph expansion** — opt in to pull every paper that cites a work, or every paper it references, as separate rows — capped per paper so even "Attention Is All You Need" stays bounded.
- **Author profiles with h-index** — name, affiliations, paper count, total citations, h-index, and homepage as first-class records, plus an opt-in full publication list per author.
- **Identifier-rich rows** — every paper carries its DOI, arXiv ID, native paper ID, author IDs, influential-citation count, and canonical URL, so you can join against PubMed, Crossref, or arXiv downstream.
- **Direct open-access PDF links** — `openAccessPdfUrl` and an `isOpenAccess` flag surface free full text on every eligible paper, with an open-access-only filter to keep just the downloadable ones.
- **High-impact filtering** — minimum-citation-count, year-range, and sort-by-citations-or-date controls let you surface the most-cited or most-recent work in a field instantly.
- **No API key, no sign-up** — go from a keyword or a DOI to a structured dataset of up to 10,000 papers per query.

### Use Cases

**Literature Reviews & Systematic Reviews**
- Assemble a complete, deduplicated reading list for a new topic in minutes
- Filter to `Review` and `MetaAnalysis` types for evidence-synthesis projects
- Restrict to open-access PDFs to build a downloadable full-text corpus

**Research Trend Analysis**
- Track publication volume in a field across a year range
- Surface the most-cited papers of the last two years with citation sorting
- Detect emerging sub-fields from a burst of recent open-access work

**Citation Network Mapping**
- Expand a seminal paper's citing-paper graph to find follow-up research
- Pull a paper's reference list to trace its intellectual lineage
- Build directed citation edges between papers for bibliometric graphs

**Competitive Research Intelligence**
- Monitor what a lab or institution is publishing by author ID
- Benchmark researcher output with h-index, paper count, and total citations
- Quantify a topic's influence with influential-citation counts

**Academic Lead Generation**
- Find domain experts to quote, interview, or recruit via author profiles
- Pull affiliations and homepages for outreach to corresponding researchers
- Identify rising authors by citation growth in a specific field

**Dataset Building for Machine Learning**
- Harvest titles + abstracts at scale for NLP and recommendation models
- Build labeled corpora filtered by field of study and publication type
- Collect open-access PDF links for full-text mining pipelines

### Getting Started

#### Basic Keyword Search

The simplest run — one topic, 100 papers:

```json
{
    "searchQueries": ["large language models"],
    "maxResults": 100
}
````

#### Filtered Search (Year + Field + Open Access)

Narrow to recent, high-impact, open-access computer science work and sort by citations:

```json
{
    "searchQueries": ["retrieval augmented generation"],
    "yearFrom": 2023,
    "yearTo": 2025,
    "fieldsOfStudy": ["Computer Science"],
    "publicationTypes": ["JournalArticle", "Conference"],
    "openAccessOnly": true,
    "minCitationCount": 25,
    "sortBy": "citationCount",
    "maxResults": 200
}
```

#### Direct Fetch with Citation + Reference Graph

Fetch exact papers by DOI and arXiv ID, then pull who cites them and what they reference:

```json
{
    "paperIds": [
        "10.1038/nature14539",
        "arXiv:1706.03762",
        "https://www.semanticscholar.org/paper/204e3073870fae3d05bcbc2f6a8e263d9b72e776"
    ],
    "includeCitations": true,
    "includeReferences": true,
    "maxCitationsPerPaper": 50
}
```

#### Author Profile Lookup

Pull author profiles by ID or URL, with their full publication lists:

```json
{
    "authorIds": ["1741101", "https://www.semanticscholar.org/author/2061296"],
    "includeAuthorPapers": true,
    "maxResults": 200
}
```

To find an author ID, open any Semantic Scholar author page and copy the number after `/author/` in the URL.
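
If you collect author links programmatically, the same "number after `/author/`" rule can be sketched in a few lines of Python. This is a minimal sketch, not part of the Actor: `extract_author_id` is a hypothetical helper, and the optional name-slug segment in the pattern is an assumption about how some profile URLs are shaped.

```python
import re
from typing import Optional

# Captures the numeric ID that ends the path after /author/.
# The optional "<name-slug>/" segment is an assumption about URL shapes.
_AUTHOR_URL = re.compile(r"/author/(?:[^/?#]+/)?(\d+)(?:[/?#]|$)")


def extract_author_id(value: str) -> Optional[str]:
    """Return the numeric author ID from a bare ID or a profile URL, else None."""
    value = value.strip()
    if value.isdigit():
        # Already a bare numeric ID.
        return value
    match = _AUTHOR_URL.search(value)
    return match.group(1) if match else None


print(extract_author_id("https://www.semanticscholar.org/author/2061296"))
```

The `authorIds` input accepts full profile URLs as well, so this step is only useful when you want to normalize IDs on your side, for example to deduplicate a list before a run.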

### Input Reference

#### Search & Input

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `searchQueries` | string\[] | `["large language models"]` | Keywords searched across paper titles and abstracts. Each query produces its own result set. |
| `paperIds` | string\[] | `[]` | Fetch exact papers by Semantic Scholar paper ID, DOI, arXiv ID, CorpusId, PMID, or paper URL. One record per paper. |
| `authorIds` | string\[] | `[]` | Author IDs (numeric) or full profile URLs. Returns an author-profile record with name, affiliations, h-index, and citation count. |
| `maxResults` | integer | `100` | Maximum papers per search query — an exact cap on what you are charged. Set to `0` for all available results (capped at 10,000 per query). |

#### Filters

Filters apply to search queries only, not to directly-fetched papers or authors.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `yearFrom` | integer | null | Only include papers published in this year or later (1900–2100). |
| `yearTo` | integer | null | Only include papers published in this year or earlier (1900–2100). |
| `fieldsOfStudy` | string\[] | `[]` | Restrict to one or more of 23 research fields (Computer Science, Medicine, Biology, Physics, Economics, and more). |
| `publicationTypes` | string\[] | `[]` | Restrict to one or more of 12 types: Review, JournalArticle, CaseReport, ClinicalTrial, Conference, Dataset, Editorial, LettersAndComments, MetaAnalysis, News, Study, Book. |
| `openAccessOnly` | boolean | `false` | Only return papers with a free, downloadable open-access PDF. |
| `minCitationCount` | integer | null | Only return papers cited at least this many times — ideal for surfacing high-impact work. |
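| `sortBy` | string | `"relevance"` | Result order. The UI options are Relevance (Semantic Scholar's default order, not a true relevance ranking), Most cited first (by citation count), and Most recent first (by publication date). In JSON input, pass the corresponding value, such as `"relevance"` or `"citationCount"`. |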
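
Before starting a paid run, it can help to check an input object against the documented ranges. The sketch below is a local pre-flight check under stated assumptions: `validate_input` is a hypothetical helper, the `yearFrom` later than `yearTo` check is a sanity check added here rather than a documented Actor rule, and an omitted `searchQueries` key is left alone because the Actor may apply its own default.

```python
from typing import Any, Dict, List

YEAR_MIN, YEAR_MAX = 1900, 2100   # documented range for yearFrom / yearTo
MAX_RESULTS_LIMIT = 10_000        # documented per-query ceiling
FILTER_KEYS = ("yearFrom", "yearTo", "fieldsOfStudy", "publicationTypes",
               "openAccessOnly", "minCitationCount")


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems: List[str] = []
    for key in ("yearFrom", "yearTo"):
        year = run_input.get(key)
        if year is not None and not (YEAR_MIN <= year <= YEAR_MAX):
            problems.append(f"{key}={year} is outside {YEAR_MIN}-{YEAR_MAX}")
    year_from, year_to = run_input.get("yearFrom"), run_input.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        # Assumption: an inverted range is a mistake, not an intended query.
        problems.append("yearFrom is later than yearTo")
    max_results = run_input.get("maxResults", 100)
    if not (0 <= max_results <= MAX_RESULTS_LIMIT):
        problems.append(f"maxResults={max_results} is outside 0-{MAX_RESULTS_LIMIT}")
    queries = run_input.get("searchQueries")
    has_filters = any(run_input.get(key) for key in FILTER_KEYS)
    if has_filters and queries is not None and len(queries) == 0:
        # Filters apply to search queries only, so they would have no effect.
        problems.append("filters are set but searchQueries is empty")
    return problems


print(validate_input({"searchQueries": ["rag"], "yearFrom": 2025, "yearTo": 2023}))
```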

#### Output Options

The citation, reference, and author-paper expansions each add one row per child item, which multiplies your result count and cost — leave them off unless you need the full graph.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `includeAbstracts` | boolean | `true` | Include the abstract text for each paper. Disable to shrink the dataset. |
| `includeReferences` | boolean | `false` | For each paper, also output the papers it cites (its reference list) as separate records. |
| `includeCitations` | boolean | `false` | For each paper, also output the papers that cite it as separate records. |
| `maxCitationsPerPaper` | integer | `50` | Caps how many citing/referenced papers are fetched per source paper (1–1000) when expansion is on. |
| `includeAuthorPapers` | boolean | `false` | When you provide author IDs, also output each author's publications as separate paper records. |

### Output

Every row carries a `recordType` field — `paper` or `author` — so you can filter cleanly downstream. The dataset ships with two ready-made views: **Papers** and **Author profiles**.

#### Paper (`recordType: "paper"`)

```json
{
    "recordType": "paper",
    "paperId": "204e3073870fae3d05bcbc2f6a8e263d9b72e776",
    "title": "Attention Is All You Need",
    "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
    "authors": [
        {"authorId": "40348417", "name": "Ashish Vaswani"},
        {"authorId": "1846258", "name": "Noam Shazeer"}
    ],
    "year": 2017,
    "publicationDate": "2017-06-12",
    "venue": "Neural Information Processing Systems",
    "publicationTypes": ["JournalArticle", "Conference"],
    "fieldsOfStudy": ["Computer Science"],
    "citationCount": 102543,
    "referenceCount": 41,
    "influentialCitationCount": 12876,
    "doi": "10.48550/arXiv.1706.03762",
    "arxivId": "1706.03762",
    "isOpenAccess": true,
    "openAccessPdfUrl": "https://arxiv.org/pdf/1706.03762.pdf",
    "url": "https://www.semanticscholar.org/paper/204e3073870fae3d05bcbc2f6a8e263d9b72e776",
    "sourceQuery": "transformer architecture",
    "parentPaperId": null,
    "parentAuthorId": null,
    "relation": null,
    "scrapedAt": "2026-06-02T10:30:00.000Z"
}
```

##### Core Fields

| Field | Type | Description |
|-------|------|-------------|
| `recordType` | string | Always `"paper"` |
| `title` | string | Paper title |
| `abstract` | string | Abstract text (`null` when abstracts are off or unavailable) |
| `authors` | object\[] | Authors, each `{authorId, name}` |
| `year` | number | Publication year |
| `publicationDate` | string | ISO publication date when available |
| `venue` | string | Journal or conference name |
| `publicationTypes` | string\[] | E.g. `["JournalArticle"]` |
| `fieldsOfStudy` | string\[] | E.g. `["Computer Science"]` |

##### Identifiers

| Field | Type | Description |
|-------|------|-------------|
| `paperId` | string | Native 40-character Semantic Scholar paper ID |
| `doi` | string | DOI when available |
| `arxivId` | string | arXiv ID when available |
| `url` | string | Canonical Semantic Scholar paper page |

##### Metrics

| Field | Type | Description |
|-------|------|-------------|
| `citationCount` | number | Times this paper has been cited |
| `referenceCount` | number | Number of references in this paper |
| `influentialCitationCount` | number | Semantic Scholar's "influential" citation count |

##### Open Access & Lineage

| Field | Type | Description |
|-------|------|-------------|
| `isOpenAccess` | boolean | Whether a free open-access PDF exists |
| `openAccessPdfUrl` | string | Direct PDF link when open access |
| `sourceQuery` | string | The search query that produced this row (`null` for direct fetches) |
| `parentPaperId` | string | Source paper ID on citation/reference child rows (`null` on primary rows) |
| `parentAuthorId` | string | Source author ID on author-publication child rows (`null` on primary rows) |
| `relation` | string | `"citation"`, `"reference"`, or `"authorPaper"` on child rows (`null` on primary rows) |
| `scrapedAt` | string | ISO 8601 timestamp |
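
The lineage fields are enough to rebuild directed citation edges from child rows. The sketch below assumes rows shaped like the sample above; `citation_edges` is a hypothetical helper, and each edge is written as `(citing paper, cited paper)`.

```python
from typing import Any, Dict, Iterable, List, Tuple


def citation_edges(rows: Iterable[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Build (citing, cited) pairs from citation/reference child rows."""
    edges: List[Tuple[str, str]] = []
    for row in rows:
        if row.get("recordType") != "paper":
            continue
        parent, child = row.get("parentPaperId"), row.get("paperId")
        if not parent or not child:
            continue  # primary rows carry no parent
        if row.get("relation") == "citation":
            edges.append((child, parent))   # the child row cites the source paper
        elif row.get("relation") == "reference":
            edges.append((parent, child))   # the source paper cites the child row
    return edges


sample = [
    {"recordType": "paper", "paperId": "A", "parentPaperId": None, "relation": None},
    {"recordType": "paper", "paperId": "B", "parentPaperId": "A", "relation": "citation"},
    {"recordType": "paper", "paperId": "C", "parentPaperId": "A", "relation": "reference"},
]
print(citation_edges(sample))
```

The resulting pairs can be loaded into any graph library as a directed edge list.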

#### Author Profile (`recordType: "author"`)

```json
{
    "recordType": "author",
    "authorId": "1741101",
    "name": "Geoffrey E. Hinton",
    "affiliations": ["University of Toronto", "Google"],
    "homepage": "https://www.cs.toronto.edu/~hinton/",
    "paperCount": 412,
    "citationCount": 631204,
    "hIndex": 178,
    "url": "https://www.semanticscholar.org/author/1741101",
    "scrapedAt": "2026-06-02T10:30:00.000Z"
}
```

| Field | Type | Description |
|-------|------|-------------|
| `recordType` | string | Always `"author"` |
| `authorId` | string | Numeric Semantic Scholar author ID |
| `name` | string | Author display name |
| `affiliations` | string\[] | Listed affiliations |
| `homepage` | string | Homepage URL when available |
| `paperCount` | number | Number of papers attributed to the author |
| `citationCount` | number | Total citations across all papers |
| `hIndex` | number | Author h-index |
| `url` | string | Canonical Semantic Scholar author page |
| `scrapedAt` | string | ISO 8601 timestamp |

When `includeAuthorPapers` is on, each author's publications are also emitted as `paper` rows alongside the profile, so an author's full body of work lands in the Papers view ready to filter and sort.
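
If you would rather keep each author's publications next to the profile, the rows can be regrouped after export. This is a minimal sketch that assumes author-publication rows carry `relation: "authorPaper"` and `parentAuthorId` as described in the field tables; `attach_author_papers` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def attach_author_papers(rows: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Map authorId -> {"profile": author row or None, "papers": [paper rows]}."""
    grouped: Dict[str, Dict[str, Any]] = defaultdict(
        lambda: {"profile": None, "papers": []}
    )
    for row in rows:
        if row.get("recordType") == "author" and row.get("authorId"):
            grouped[row["authorId"]]["profile"] = row
        elif row.get("relation") == "authorPaper" and row.get("parentAuthorId"):
            grouped[row["parentAuthorId"]]["papers"].append(row)
    return dict(grouped)


rows = [
    {"recordType": "author", "authorId": "1741101", "name": "Geoffrey E. Hinton"},
    {"recordType": "paper", "paperId": "p1", "parentAuthorId": "1741101", "relation": "authorPaper"},
]
for author_id, entry in attach_author_papers(rows).items():
    print(author_id, len(entry["papers"]))
```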

### Tips for Best Results

- **Fetch by DOI or arXiv ID for guaranteed exact matches.** Keyword search is fuzzy; a DOI like `10.1038/nature14539` or `arXiv:1706.03762` resolves to exactly one paper, every time — perfect for verifying a known reference.
- **Narrow broad topics with filters.** A bare query like `"machine learning"` returns a flood. Add a `yearFrom`, a `fieldsOfStudy` value, and a `minCitationCount` to surface a tight, high-signal set.
- **Use `minCitationCount` for impact triage.** Set it to 50 or 100 to skip preprints and low-impact work when you only want established, well-cited literature.
- **Filter to `Review` and `MetaAnalysis` for evidence synthesis.** These publication types are the backbone of systematic reviews and save hours of manual screening.
- **Turn off abstracts on large harvests.** Setting `includeAbstracts: false` shrinks every row and speeds up runs when you only need metadata and metrics.
- **Keep `maxResults` modest when expanding the citation graph.** `includeCitations` and `includeReferences` multiply rows per paper — pair them with a small `maxResults` (5–20) and a sensible `maxCitationsPerPaper` to keep runs predictable.
- **Sort by `"Most recent first"` for monitoring.** Re-run a saved query on a schedule with date sorting to catch new publications in your field as they land.
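
For the scheduled-monitoring tip, the comparison against an earlier run can be done on `paperId`. The sketch below is one possible approach, not a feature of the Actor: `new_papers` is a hypothetical helper, and how you store the IDs from the previous run is left to you.

```python
from typing import Any, Dict, Iterable, List, Set


def new_papers(previous_ids: Set[str], rows: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return paper rows whose paperId was not seen before, without duplicates."""
    seen = set(previous_ids)  # copy so the caller's set is not modified
    fresh: List[Dict[str, Any]] = []
    for row in rows:
        paper_id = row.get("paperId")
        if row.get("recordType") == "paper" and paper_id and paper_id not in seen:
            seen.add(paper_id)
            fresh.append(row)
    return fresh


latest = [
    {"recordType": "paper", "paperId": "p1"},
    {"recordType": "paper", "paperId": "p2"},
]
print([row["paperId"] for row in new_papers({"p1"}, latest)])
```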

### Pricing

**From $6 per 1,000 results** — the lowest-cost way to pull discipline-spanning academic data with citation graphs and author metrics bundled in. Bronze, Silver, and Gold subscribers pay progressively less; the table below shows total cost at each discount tier.

| Results | No discount | Bronze | Silver | Gold |
|---------|-------------|--------|--------|------|
| 100 | $0.72 | $0.68 | $0.64 | $0.60 |
| 1,000 | $7.20 | $6.80 | $6.40 | $6.00 |
| 10,000 | $72.00 | $68.00 | $64.00 | $60.00 |
| 100,000 | $720.00 | $680.00 | $640.00 | $600.00 |

No compute or time-based charges — you pay per result, plus a small fixed per-run start fee. A "result" is any row in the output dataset: a paper, an author profile, or a citing/referenced/author-paper row from the opt-in graph expansions (so enabling those expansions increases your result count). Platform fees depend on your Apify plan.
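
The table reduces to a per-1,000 rate for each tier, so a budget estimate is simple arithmetic. In this sketch the rates are read off the table above; `estimate_cost` is a hypothetical helper, and `start_fee` is a placeholder parameter because the per-run start fee amount is not stated here.

```python
from decimal import Decimal

# Price per 1,000 results, taken from the pricing table above.
PRICE_PER_1000 = {
    "none": Decimal("7.20"),
    "bronze": Decimal("6.80"),
    "silver": Decimal("6.40"),
    "gold": Decimal("6.00"),
}


def estimate_cost(results: int, tier: str = "none",
                  start_fee: Decimal = Decimal("0")) -> Decimal:
    """Estimate run cost in USD; start_fee is an assumption you must supply."""
    if results < 0:
        raise ValueError("results must be non-negative")
    cost = PRICE_PER_1000[tier] * results / 1000 + start_fee
    return cost.quantize(Decimal("0.01"))


print(estimate_cost(2500, "silver"))
```

Remember that rows from citation, reference, and author-paper expansions count as results, so pass the total expected row count rather than the number of primary papers.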

### Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

- **Zapier** / **Make** / **n8n** — Workflow automation
- **Google Sheets** — Direct spreadsheet export
- **Slack** / **Email** — Notifications on new results
- **Webhooks** — Trigger custom APIs on run completion
- **Apify API** — Full programmatic access

### Legal & Ethical Use

This Actor is designed for legitimate academic research, bibliometrics, literature review, and market intelligence. Users are responsible for complying with applicable laws and Semantic Scholar's terms of service, including making requests at a reasonable rate and respecting content usage rules for any papers or PDFs linked from the dataset. Do not use extracted data for spam, harassment, or any illegal purpose.

# Actor input Schema

## `searchQueries` (type: `array`):

Keywords to search across paper titles and abstracts (e.g., 'large language models' or 'CRISPR gene editing'). Each query produces its own set of paper results. Leave empty if you only want to fetch specific papers or authors.

## `paperIds` (type: `array`):

Fetch specific papers directly. Accepts Semantic Scholar paper IDs (40-char hex), DOIs (e.g., '10.1038/nature14539'), arXiv IDs (e.g., 'arXiv:1706.03762'), or full Semantic Scholar paper URLs. One record is returned per paper.

## `authorIds` (type: `array`):

Semantic Scholar author IDs (numeric, e.g., '1741101') or full author profile URLs. Returns an author profile record (name, affiliation, h-index, citation count). Enable 'Include Author Papers' below to also pull each author's publications.

## `maxResults` (type: `integer`):

Maximum number of papers to return per search query. This is an exact cap — you are charged for at most this many results per query. Set to 0 for all available results (capped at 10,000 per query).

## `yearFrom` (type: `integer`):

Only include papers published in this year or later. Leave empty for no lower bound.

## `yearTo` (type: `integer`):

Only include papers published in this year or earlier. Leave empty for no upper bound.

## `fieldsOfStudy` (type: `array`):

Restrict results to one or more research fields. Leave empty to include all fields.

## `publicationTypes` (type: `array`):

Restrict results to one or more publication types (e.g., only peer-reviewed journal articles). Leave empty to include all types.

## `openAccessOnly` (type: `boolean`):

Only return papers that have a free, downloadable open-access PDF.

## `minCitationCount` (type: `integer`):

Only return papers cited at least this many times. Leave empty for no minimum. Useful for surfacing high-impact work.

## `sortBy` (type: `string`):

How to order paper search results. 'Relevance' returns Semantic Scholar's default result order (it is not a true relevance ranking — bulk search has no relevance score), 'Most cited first' sorts by citation count, and 'Most recent first' sorts by publication date.

## `includeAbstracts` (type: `boolean`):

Include the abstract text for each paper. Disable to reduce dataset size.

## `includeReferences` (type: `boolean`):

For each paper, also output the papers it cites (its reference list) as separate records. WARNING: a single paper can have hundreds of references — this can multiply your total result count and cost.

## `includeCitations` (type: `boolean`):

For each paper, also output the papers that cite it as separate records. WARNING: highly-cited papers can have tens of thousands of citing papers — this can dramatically multiply your total result count and cost. Use the cap below to bound it.

## `maxCitationsPerPaper` (type: `integer`):

When 'Include Citing Papers' or 'Include References' is on, this caps how many child papers are fetched per source paper. Default 50; higher values linearly increase runtime and cost.

## `includeAuthorPapers` (type: `boolean`):

When you provide author IDs/URLs, also output each author's publications as separate paper records. WARNING: prolific authors can have thousands of papers — this can multiply your total result count and cost.

## Actor input object example

```json
{
  "searchQueries": [
    "large language models"
  ],
  "paperIds": [],
  "authorIds": [],
  "maxResults": 100,
  "fieldsOfStudy": [],
  "publicationTypes": [],
  "openAccessOnly": false,
  "sortBy": "relevance",
  "includeAbstracts": true,
  "includeReferences": false,
  "includeCitations": false,
  "maxCitationsPerPaper": 50,
  "includeAuthorPapers": false
}
```

# Actor output Schema

## `papers` (type: `string`):

Table of papers with title, authors, year, venue, citations, and PDF link.

## `authors` (type: `string`):

Table of author profiles with name, affiliation, h-index, and citation counts.

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQueries": [
        "large language models"
    ],
    "paperIds": [],
    "authorIds": [],
    "maxResults": 100,
    "fieldsOfStudy": [],
    "publicationTypes": [],
    "openAccessOnly": false,
    "sortBy": "relevance",
    "includeAbstracts": true,
    "includeReferences": false,
    "includeCitations": false,
    "maxCitationsPerPaper": 50,
    "includeAuthorPapers": false
};

// Run the Actor and wait for it to finish
const run = await client.actor("solidcode/semanticscholar-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "searchQueries": ["large language models"],
    "paperIds": [],
    "authorIds": [],
    "maxResults": 100,
    "fieldsOfStudy": [],
    "publicationTypes": [],
    "openAccessOnly": False,
    "sortBy": "relevance",
    "includeAbstracts": True,
    "includeReferences": False,
    "includeCitations": False,
    "maxCitationsPerPaper": 50,
    "includeAuthorPapers": False,
}

# Run the Actor and wait for it to finish
run = client.actor("solidcode/semanticscholar-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
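
Since a run can mix paper and author rows, you may want to separate them as you read the dataset. The sketch below wraps the same client calls used in the example above (`actor(...).call(run_input=...)` and `dataset(...).iterate_items()`); `run_and_split` is a hypothetical helper, and the client is passed in so the function can be exercised with a stub and no network access.

```python
from typing import Any, Dict, List

ACTOR_ID = "solidcode/semanticscholar-scraper"


def run_and_split(client: Any, run_input: Dict[str, Any]) -> Dict[str, List[dict]]:
    """Run the Actor, then bucket dataset rows by their recordType."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        # Guard against a missing run object.
        raise RuntimeError("Actor run did not return a run object")
    buckets: Dict[str, List[dict]] = {"paper": [], "author": []}
    for item in client.dataset(run["defaultDatasetId"]).iterate_items():
        buckets.setdefault(item.get("recordType"), []).append(item)
    return buckets
```

A typical call would look like `run_and_split(ApifyClient("<YOUR_API_TOKEN>"), run_input)`, after which `buckets["paper"]` and `buckets["author"]` correspond to the two dataset views.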

## CLI example

```bash
echo '{
  "searchQueries": [
    "large language models"
  ],
  "paperIds": [],
  "authorIds": [],
  "maxResults": 100,
  "fieldsOfStudy": [],
  "publicationTypes": [],
  "openAccessOnly": false,
  "sortBy": "relevance",
  "includeAbstracts": true,
  "includeReferences": false,
  "includeCitations": false,
  "maxCitationsPerPaper": 50,
  "includeAuthorPapers": false
}' |
apify call solidcode/semanticscholar-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=solidcode/semanticscholar-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Semantic Scholar Scraper",
        "description": "[💰 $6 / 1K] Extract academic papers, abstracts, citations, references, authors, and open-access PDF links from Semantic Scholar's 200M+ database. Search by keyword, paper ID/DOI/URL, or author. Filter by year, field, and citations. No API key.",
        "version": "1.0",
        "x-build-id": "zGQzgkvOuoM9F66Uk"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/solidcode~semanticscholar-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-solidcode-semanticscholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/solidcode~semanticscholar-scraper/runs": {
            "post": {
                "operationId": "runs-sync-solidcode-semanticscholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/solidcode~semanticscholar-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-solidcode-semanticscholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQueries": {
                        "title": "Search Queries",
                        "type": "array",
                        "description": "Keywords to search across paper titles and abstracts (e.g., 'large language models' or 'CRISPR gene editing'). Each query produces its own set of paper results. Leave empty if you only want to fetch specific papers or authors.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "paperIds": {
                        "title": "Paper IDs, DOIs, or URLs",
                        "type": "array",
                        "description": "Fetch specific papers directly. Accepts Semantic Scholar paper IDs (40-char hex), DOIs (e.g., '10.1038/nature14539'), arXiv IDs (e.g., 'arXiv:1706.03762'), or full Semantic Scholar paper URLs. One record is returned per paper.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "authorIds": {
                        "title": "Author IDs or Profile URLs",
                        "type": "array",
                        "description": "Semantic Scholar author IDs (numeric, e.g., '1741101') or full author profile URLs. Returns an author profile record (name, affiliation, h-index, citation count). Enable 'Include Author Papers' below to also pull each author's publications.",
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxResults": {
                        "title": "Maximum Results per Query",
                        "minimum": 0,
                        "maximum": 10000,
                        "type": "integer",
                        "description": "Maximum number of papers to return per search query. This is an exact cap — you are charged for at most this many results per query. Set to 0 for all available results (capped at 10,000 per query).",
                        "default": 100
                    },
                    "yearFrom": {
                        "title": "Year From",
                        "minimum": 1900,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Only include papers published in this year or later. Leave empty for no lower bound."
                    },
                    "yearTo": {
                        "title": "Year To",
                        "minimum": 1900,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Only include papers published in this year or earlier. Leave empty for no upper bound."
                    },
                    "fieldsOfStudy": {
                        "title": "Fields of Study",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Restrict results to one or more research fields. Leave empty to include all fields.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "Computer Science",
                                "Medicine",
                                "Chemistry",
                                "Biology",
                                "Materials Science",
                                "Physics",
                                "Geology",
                                "Psychology",
                                "Art",
                                "History",
                                "Geography",
                                "Sociology",
                                "Business",
                                "Political Science",
                                "Economics",
                                "Philosophy",
                                "Mathematics",
                                "Engineering",
                                "Environmental Science",
                                "Agricultural and Food Sciences",
                                "Education",
                                "Law",
                                "Linguistics"
                            ],
                            "enumTitles": [
                                "Computer Science",
                                "Medicine",
                                "Chemistry",
                                "Biology",
                                "Materials Science",
                                "Physics",
                                "Geology",
                                "Psychology",
                                "Art",
                                "History",
                                "Geography",
                                "Sociology",
                                "Business",
                                "Political Science",
                                "Economics",
                                "Philosophy",
                                "Mathematics",
                                "Engineering",
                                "Environmental Science",
                                "Agricultural and Food Sciences",
                                "Education",
                                "Law",
                                "Linguistics"
                            ]
                        }
                    },
                    "publicationTypes": {
                        "title": "Publication Types",
                        "uniqueItems": true,
                        "type": "array",
                        "description": "Restrict results to one or more publication types (e.g., only peer-reviewed journal articles). Leave empty to include all types.",
                        "items": {
                            "type": "string",
                            "enum": [
                                "Review",
                                "JournalArticle",
                                "CaseReport",
                                "ClinicalTrial",
                                "Conference",
                                "Dataset",
                                "Editorial",
                                "LettersAndComments",
                                "MetaAnalysis",
                                "News",
                                "Study",
                                "Book"
                            ],
                            "enumTitles": [
                                "Review",
                                "Journal Article",
                                "Case Report",
                                "Clinical Trial",
                                "Conference",
                                "Dataset",
                                "Editorial",
                                "Letters & Comments",
                                "Meta-Analysis",
                                "News",
                                "Study",
                                "Book"
                            ]
                        }
                    },
                    "openAccessOnly": {
                        "title": "Open Access PDFs Only",
                        "type": "boolean",
                        "description": "Only return papers that have a free, downloadable open-access PDF.",
                        "default": false
                    },
                    "minCitationCount": {
                        "title": "Minimum Citation Count",
                        "minimum": 0,
                        "type": "integer",
                        "description": "Only return papers cited at least this many times. Leave empty for no minimum. Useful for surfacing high-impact work."
                    },
                    "sortBy": {
                        "title": "Sort By",
                        "enum": [
                            "relevance",
                            "citationCount",
                            "publicationDate"
                        ],
                        "type": "string",
                        "description": "How to order paper search results. 'relevance' returns Semantic Scholar's default result order (not a true relevance ranking, since bulk search has no relevance score), 'citationCount' sorts most cited first, and 'publicationDate' sorts most recent first.",
                        "default": "relevance"
                    },
                    "includeAbstracts": {
                        "title": "Include Abstracts",
                        "type": "boolean",
                        "description": "Include the abstract text for each paper. Disable to reduce dataset size.",
                        "default": true
                    },
                    "includeReferences": {
                        "title": "Include References",
                        "type": "boolean",
                        "description": "For each paper, also output the papers it cites (its reference list) as separate records. WARNING: a single paper can have hundreds of references — this can multiply your total result count and cost.",
                        "default": false
                    },
                    "includeCitations": {
                        "title": "Include Citing Papers",
                        "type": "boolean",
                        "description": "For each paper, also output the papers that cite it as separate records. WARNING: highly-cited papers can have tens of thousands of citing papers — this can dramatically multiply your total result count and cost. Use the cap below to bound it.",
                        "default": false
                    },
                    "maxCitationsPerPaper": {
                        "title": "Max Citing / Referenced Papers per Paper",
                        "minimum": 1,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "When 'Include Citing Papers' or 'Include References' is on, this caps how many child papers are fetched per source paper. Default 50; higher values linearly increase runtime and cost.",
                        "default": 50
                    },
                    "includeAuthorPapers": {
                        "title": "Include Author Papers",
                        "type": "boolean",
                        "description": "When you provide author IDs/URLs, also output each author's publications as separate paper records. WARNING: prolific authors can have thousands of papers — this can multiply your total result count and cost.",
                        "default": false
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
