# Google Scholar Scraper: Papers, Authors, Citations, BibTeX (`scrapemint/google-scholar-scraper`) Actor

Search Google Scholar at scale. Pulls paper metadata, author affiliations, h-index, cited by counts, citing paper lists, BibTeX, and PDF links. One row per paper. Pay per row.

- **URL**: https://apify.com/scrapemint/google-scholar-scraper.md
- **Developed by:** [Kennedy Mutisya](https://apify.com/scrapemint) (community)
- **Categories:** Other
- **Stats:** 1 total users, 0 monthly users, 100.0% runs succeeded, NaN bookmarks
- **User rating**: No ratings yet

## Pricing

Pay per usage

This Actor is paid per platform usage. The Actor itself is free to use; you pay only for the Apify platform usage it consumes, which gets cheaper on higher subscription plans.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-usage

## What's an Apify Actor?

Actors are software tools running on the Apify platform, for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action which can take anything from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server which can be used as a website, API, or an MCP server.
Actors are written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see Apify documentation as [Markdown index](https://docs.apify.com/llms.txt) and [Markdown full-text](https://docs.apify.com/llms-full.txt).


# README

## Google Scholar Scraper: Papers, Authors, Citations, BibTeX

Scrape Google Scholar at scale. Pulls paper metadata (title, authors, year, venue, snippet), author profile data (affiliation, h-index, i10-index, total citations), citing paper lists, full BibTeX exports, all-versions clusters, and PDF links. One row per paper. Pay per row.

**Built for** academic researchers running literature reviews, PhD students chasing prior work, patent attorneys hunting prior art, bibliometricians measuring institutional output, science journalists tracing claims, AI teams building research copilots and training corpora, librarians enriching catalogs, and grant writers finding precedent.

**Keywords this actor ranks for:** google scholar api, google scholar scraper, scholar search api, academic paper scraper, citation count scraper, h-index lookup, prior art search, bibliometrics api, literature review automation, paper metadata extractor, BibTeX scraper, citing papers list, scholar author profile, research paper api.

---

### Why this actor

| Other Scholar tools | **This actor** |
|---|---|
| SerpAPI Google Scholar engine: $75 / month for 5K searches | Pay per row scraped. No monthly minimum. |
| Semantic Scholar API: free but covers a smaller corpus | Walks the live Google Scholar index, broader coverage |
| OpenAlex: free but uses Crossref + MAG snapshots, lags behind | Live page parse, fresh citation counts |
| scholarly Python lib: breaks on Scholar HTML changes, no proxy | Maintained selectors plus residential proxy out of the box |
| One result format (paper or author) | Mixed seed types in one run: queries, author URLs, cluster IDs, paper URLs |
| No author enrichment | Optional fetchAuthorProfiles flag adds h-index, i10, affiliation per row |
| No citing papers | Optional fetchCitedBy flag pulls the citing paper list per source paper |
| No BibTeX | Optional fetchBibtex flag attaches the BibTeX export per row |

---

### How it works

```mermaid
flowchart LR
    A[Queries<br/>or Author URLs<br/>or Cluster IDs<br/>or Paper URLs] --> B[Seed router]
    B --> C[Search pages<br/>scholar?q=...]
    B --> D[Author pages<br/>citations?user=...]
    B --> E[Cluster pages<br/>scholar?cluster=...]
    C --> F[Parse result blocks<br/>div.gs_r.gs_or.gs_scl]
    D --> G[Parse profile + papers table]
    E --> F
    F --> H{Enrichment toggles?}
    H -->|fetchAuthorProfiles| I[Queue author URL]
    H -->|fetchCitedBy| J[Queue cites=cluster]
    H -->|fetchBibtex| K[Open cite modal,<br/>follow BibTeX link]
    H -->|fetchVersions| L[Queue cluster=cluster]
    I --> G
    J --> M[Walk citing papers]
    F --> N[(One row per paper)]
    G --> N
    M --> N
````

Scholar is fingerprinted aggressively against datacenter IPs. The actor runs Playwright with bundled Chromium, defaults to Apify residential proxy, and paces requests with `navigationDelayMs` so the session looks like a careful human reader rather than a burst client.

***

### What you get per row

```mermaid
flowchart LR
    R[Paper row] --> R1[Identity<br/>title scholarClusterId url]
    R --> R2[Authors<br/>parsed names + profile links]
    R --> R3[Year + venue<br/>+ publisher]
    R --> R4[Snippet<br/>first ~250 chars]
    R --> R5[Citations<br/>citedByCount + citedByUrl]
    R --> R6[Versions<br/>versionCount + versionsUrl]
    R --> R7[PDF<br/>pdfUrl + pdfLabel]
    R --> R8[Optional<br/>bibtex string]
    R --> R9[Optional<br/>authorProfileLinks enriched]
```

Cluster ID is Scholar's stable identifier for a paper across reprints, preprints, and repository copies. Use it to dedupe across runs (built in via `dedupe: true`) and to fetch the citing paper list.
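
If you merge rows from several runs yourself (for example with `dedupe: false`), the same keying can be applied client side. The sketch below is a minimal illustration, assuming rows shaped like the sample output with a `scholarClusterId` field; `dedupe_by_cluster` is a hypothetical helper, not part of the actor.

```python
from typing import Iterable


def dedupe_by_cluster(rows: Iterable[dict]) -> list[dict]:
    """Keep the first row seen for each scholarClusterId.

    Rows without a cluster ID are kept as-is, since there is nothing to key on.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for row in rows:
        cluster_id = row.get("scholarClusterId")
        if cluster_id is None:
            unique.append(row)
            continue
        if cluster_id in seen:
            continue  # a reprint, preprint, or repository copy of a paper already kept
        seen.add(cluster_id)
        unique.append(row)
    return unique


# Illustrative rows only; titles and IDs mirror the sample output format.
rows = [
    {"title": "Attention Is All You Need", "scholarClusterId": "2960712678066186980"},
    {"title": "Attention is all you need (preprint)", "scholarClusterId": "2960712678066186980"},
    {"title": "Row without an ID"},
]
deduped = dedupe_by_cluster(rows)
print([r["title"] for r in deduped])
```

Keeping the first occurrence is a design choice; if you want the freshest citation count instead, sort the rows by `scrapedAt` in descending order before calling the helper.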

***

### Quick start

**Literature review on a topic, last 3 years**

```json
{
  "queries": ["graph neural network drug discovery"],
  "yearFrom": 2023,
  "sortBy": "relevance",
  "maxPapers": 100,
  "maxPagesPerQuery": 10
}
```

**One author's full publication record**

```json
{
  "authorUrls": [
    "https://scholar.google.com/citations?user=JicYPdAAAAAJ"
  ]
}
```

**High citation papers with citing list, ready for impact analysis**

```json
{
  "queries": ["transformer language model"],
  "yearFrom": 2017,
  "yearTo": 2020,
  "fetchCitedBy": true,
  "minCitationsForCitedBy": 1000,
  "maxCitedByPapers": 50,
  "maxPapers": 25
}
```

**Prior art sweep with patents included**

```json
{
  "queries": ["lithium iron phosphate cathode coating"],
  "includePatents": true,
  "yearFrom": 2010,
  "fetchBibtex": true,
  "maxPapers": 200
}
```

**Build a BibTeX library from a topic**

```json
{
  "queries": ["retrieval augmented generation"],
  "yearFrom": 2020,
  "fetchBibtex": true,
  "maxPapers": 50
}
```
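
Once a run like the one above finishes, the `bibtex` strings can be joined into a single `.bib` document. This is a minimal sketch, assuming each row carries an optional `bibtex` string as in the sample output; `rows_to_bib` is a hypothetical helper and the rows are made-up placeholders.

```python
def rows_to_bib(rows: list[dict]) -> str:
    """Join the non-empty `bibtex` strings into one .bib document."""
    entries = []
    for row in rows:
        entry = (row.get("bibtex") or "").strip()
        if entry:
            entries.append(entry)
    # Blank line between entries, single trailing newline when there is content.
    return "\n\n".join(entries) + ("\n" if entries else "")


# Placeholder rows for illustration; real rows come from the run's dataset.
rows = [
    {"title": "Paper A", "bibtex": "@article{a2021,\n  title={Paper A},\n  year={2021}\n}"},
    {"title": "Paper B"},  # no BibTeX attached to this row
    {"title": "Paper C", "bibtex": "@article{c2022,\n  title={Paper C},\n  year={2022}\n}"},
]
bib_text = rows_to_bib(rows)
print(bib_text)
```

To save the result, write `bib_text` to a file, for example with `pathlib.Path("library.bib").write_text(bib_text, encoding="utf-8")`.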

**All Scholar versions of a single paper (preprint + published + repository copies)**

```json
{
  "clusterIds": ["17784817748666649498"]
}
```

***

### Sample output

```json
{
  "title": "Attention Is All You Need",
  "url": "https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa.html",
  "scholarClusterId": "2960712678066186980",
  "authors": ["A Vaswani", "N Shazeer", "N Parmar", "J Uszkoreit", "L Jones"],
  "authorProfileLinks": [
    { "name": "A Vaswani", "url": "https://scholar.google.com/citations?user=oR9V4YkAAAAJ" }
  ],
  "year": 2017,
  "venue": "Advances in neural information processing systems",
  "publisher": "papers.nips.cc",
  "snippet": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "citedByCount": 142318,
  "citedByUrl": "https://scholar.google.com/scholar?cites=2960712678066186980",
  "versionCount": 38,
  "versionsUrl": "https://scholar.google.com/scholar?cluster=2960712678066186980",
  "relatedUrl": "https://scholar.google.com/scholar?q=related:abc/scholar",
  "pdfUrl": "https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf",
  "pdfLabel": "[PDF] neurips.cc",
  "bibtex": "@inproceedings{vaswani2017attention,\n  title={Attention is all you need},\n  author={Vaswani, Ashish and ...},\n  booktitle={Advances in Neural Information Processing Systems},\n  year={2017}\n}",
  "scrapedAt": "2026-04-29T11:30:00.000Z"
}
```

Author rows ship with `type: "author"` and the full profile + papers table:

```json
{
  "type": "author",
  "name": "Geoffrey Hinton",
  "affiliation": "Emeritus Prof. Computer Science, University of Toronto",
  "verifiedEmailDomain": "cs.toronto.edu",
  "homepage": "http://www.cs.toronto.edu/~hinton",
  "interests": ["machine learning", "psychology", "artificial intelligence", "cognitive science"],
  "stats": {
    "totalCitations": 802145,
    "citationsSince5Years": 412338,
    "hIndex": 174,
    "hIndexSince5Years": 134,
    "i10Index": 470,
    "i10IndexSince5Years": 350
  },
  "papersCount": 451,
  "papers": [
    { "title": "Deep learning", "authors": "Y LeCun, Y Bengio, G Hinton", "venue": "Nature", "year": 2015, "citedBy": 89243 }
  ]
}
```

***

### Who uses this

| Role | Use case |
|---|---|
| Academic researcher | Build a literature review feed for a thesis or grant proposal. Track new citations on key papers daily. |
| PhD student | Find prior work on your method. Pull author h-index to gauge a venue's signal. |
| Patent attorney | Prior art sweep across journals + conferences + patents. Export BibTeX into the prior art docket. |
| Bibliometrician | Measure institutional or country level output. Walk every author profile under one institution. |
| AI / LLM team | Build research copilot training data. Pull citing papers to construct citation graphs. |
| Science journalist | Trace a viral claim back to the primary source. Verify how cited it actually is. |
| Librarian | Enrich an institutional repository with venue + citation counts on every paper. |
| Grant writer | Cite the seminal works in your field with accurate counts. Find precedent across funders. |
| Reference manager | Replace SerpAPI's Scholar engine. Same data, no monthly minimum. |

***

### Input reference

| Field | Type | What it does |
|---|---|---|
| `queries` | string\[] | Free text Scholar queries. Supports operators: "exact", author:Hinton, intitle:transformer. |
| `authorUrls` | string\[] | Direct Scholar citations profile URLs. Returns the author's full publication record. |
| `clusterIds` | string\[] | Scholar cluster IDs. Use to fetch all versions of one paper. |
| `paperUrls` | string\[] | Direct Scholar result URLs to enrich. Useful when you already have a list. |
| `yearFrom` / `yearTo` | integer | Publication year window. 0 means no bound. |
| `sortBy` | enum | relevance (default) or date (newest first). |
| `language` | enum | Scholar interface language. Affects venue parsing. |
| `includePatents` | boolean | Include patent results. Off by default. |
| `includeCaseLaw` | boolean | Include legal case law. Off by default. |
| `fetchAuthorProfiles` | boolean | Per paper, fetch each author's profile (h-index, affiliation). One extra request per unique author. |
| `fetchCitedBy` | boolean | Per paper above the citation threshold, walk the citing papers list. |
| `minCitationsForCitedBy` | integer | Threshold for triggering cited by fetch. Avoids wasting requests on low cited papers. |
| `maxCitedByPapers` | integer | Cap on how many citing papers to collect per source paper. |
| `fetchBibtex` | boolean | Pull BibTeX export per paper. |
| `fetchVersions` | boolean | Pull every Scholar cluster version (preprint, published, repository copies). |
| `maxPapers` | integer | Hard cap on rows per run. 0 means unlimited. |
| `maxPagesPerQuery` | integer | Pages of 10 results to walk per query. Scholar caps at 100 pages (1,000 results). |
| `dedupe` | boolean | Skip cluster IDs from previous runs. |
| `navigationDelayMs` | integer | Pause between page loads. 4000 to 8000 ms is the safe band. |
| `concurrency` | integer | Parallel browser pages. Keep at 1 to 2 unless you have a residential pool. |
| `proxyConfiguration` | object | Apify proxy. Residential strongly recommended. |
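
The paging and pacing fields together imply a rough floor on how long a run spends waiting. The sketch below estimates that for query seeds only. It assumes every search page is full, one page load per search page, one extra load per paper when `fetchBibtex` is on, one `navigationDelayMs` pause per load, and `concurrency` of 1; author profile, cited-by, and version fetches are ignored. `estimate_search_run` is a hypothetical helper, and the fallback values are taken from the input object example.

```python
import math

RESULTS_PER_PAGE = 10  # "Pages of 10 results per query"


def estimate_search_run(cfg: dict) -> dict:
    """Rough estimate of papers, page loads, and delay time for query seeds."""
    n_queries = len(cfg.get("queries", []))
    max_pages = cfg.get("maxPagesPerQuery", 10)
    max_papers = cfg.get("maxPapers", 100)

    # Upper bound on rows: every page full, then the maxPapers cap (0 = unlimited).
    papers = n_queries * max_pages * RESULTS_PER_PAGE
    if max_papers > 0:
        papers = min(papers, max_papers)

    search_pages = math.ceil(papers / RESULTS_PER_PAGE)
    bibtex_requests = papers if cfg.get("fetchBibtex", False) else 0
    page_loads = search_pages + bibtex_requests

    # Assumption: one navigationDelayMs pause per page load, run serially.
    delay_seconds = page_loads * cfg.get("navigationDelayMs", 4500) / 1000
    return {"papers": papers, "pageLoads": page_loads, "delaySeconds": delay_seconds}


estimate = estimate_search_run({
    "queries": ["large language model alignment"],
    "fetchBibtex": True,
    "maxPapers": 50,
    "maxPagesPerQuery": 5,
})
print(estimate)
```

Treat the number as a planning aid rather than a guarantee: page render time, retries, and CAPTCHA recovery all add to it.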

***

### API call

```bash
curl -X POST \
  "https://api.apify.com/v2/acts/scrapemint~google-scholar-scraper/runs?token=YOUR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "queries": ["large language model alignment"],
    "yearFrom": 2022,
    "fetchAuthorProfiles": true,
    "fetchBibtex": true,
    "maxPapers": 50,
    "maxPagesPerQuery": 5
  }'
```
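
The same request can be assembled with Python's standard library when installing `apify-client` is not an option. This is a sketch under stated assumptions: the Actor ID uses the `user~name` form from the OpenAPI paths further down, the token travels as a query parameter as in the curl call, and `build_run_request` is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "scrapemint~google-scholar-scraper"  # "user~name" form used in the API paths


def build_run_request(token: str, actor_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a run."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_ID}/runs?{query}"
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "YOUR_TOKEN",  # placeholder; substitute your own Apify API token
    {"queries": ["large language model alignment"], "yearFrom": 2022, "maxPapers": 50},
)
print(request.get_method(), request.full_url)
```

Pass the returned object to `urllib.request.urlopen(request)` to actually start the run. Keeping construction separate from sending makes the URL and payload easy to inspect or log first.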

***

### Pricing

The first few rows per run are free so you can validate the schema before paying. After that, one charge per paper row regardless of how many enrichment fields you turn on. Author profile rows count as one row each. BibTeX, citing papers, and version fetches are included at no extra per row charge.

***

### FAQ

#### Why does this need a residential proxy?

Google Scholar fingerprints datacenter IP ranges aggressively. As few as five queries from a datacenter IP can trigger a CAPTCHA. The actor defaults to Apify residential proxy, which rotates per request and matches a real user fingerprint.

#### What is a cluster ID?

Scholar groups every version of a paper (preprint on arXiv, published version, university repository copy) under one cluster ID. The actor exposes it as `scholarClusterId` so you can dedupe across runs and fetch versions or citations on demand.

#### Can I get the full citation graph?

Yes, in two passes. First pass: search your topic with `fetchCitedBy: true`. Each paper ships with a `citingPapers[]` list. Second pass: feed those citing paper cluster IDs back in as `clusterIds` to walk one more level deep. Two passes give you a complete one hop neighborhood for ~50 seed papers.
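
The hand-off between the two passes can be sketched as follows. The shape of each `citingPapers[]` entry is not documented here, so the code assumes each entry exposes a `scholarClusterId` field like a regular paper row; `second_pass_cluster_ids` is a hypothetical helper and the IDs are placeholders.

```python
from typing import Iterable


def second_pass_cluster_ids(rows: Iterable[dict]) -> list[str]:
    """Collect unique cluster IDs of citing papers, skipping first-pass seeds."""
    rows = list(rows)
    seed_ids = {r["scholarClusterId"] for r in rows if r.get("scholarClusterId")}
    next_ids: list[str] = []
    for row in rows:
        for citing in row.get("citingPapers") or []:
            cid = citing.get("scholarClusterId")  # assumed field name on citing entries
            if cid and cid not in seed_ids and cid not in next_ids:
                next_ids.append(cid)
    return next_ids


# Placeholder first-pass rows; real ones come from a run with fetchCitedBy enabled.
first_pass = [
    {"scholarClusterId": "111", "citingPapers": [{"scholarClusterId": "222"}, {"scholarClusterId": "333"}]},
    {"scholarClusterId": "222", "citingPapers": [{"scholarClusterId": "333"}, {"title": "no id"}]},
]
second_pass = {"clusterIds": second_pass_cluster_ids(first_pass), "fetchCitedBy": True}
print(second_pass)
```

Papers already scraped in the first pass are skipped so the second run does not spend rows on them again.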

#### Does it respect Scholar's rate limits?

The default `navigationDelayMs` of 4500 paces requests at roughly the speed of an attentive human reader. Scholar will still throttle aggressive concurrency. Keep `concurrency` at 1 or 2 unless you have a wide residential proxy pool.

#### How is this different from SerpAPI's Scholar engine?

SerpAPI charges $75 / month for 5,000 searches and ships a flattened result schema. This actor charges per row scraped (no monthly floor), exposes the full result block including cluster ID, version count, and PDF labels, and lets you mix queries with author profiles and cluster fetches in one run.

#### How is this different from Semantic Scholar API?

Semantic Scholar's free API is excellent but covers Semantic Scholar's own indexed corpus, which is smaller than Google Scholar's. Use Semantic Scholar for breadth in CS and biomedical fields; use this actor when you need the long tail Scholar covers (humanities, social sciences, regional venues, working papers).

#### Will it find papers behind a paywall?

The result row always includes Scholar's metadata (title, authors, citation count, abstract snippet) regardless of access. The `pdfUrl` field is populated only when Scholar finds a free hosted copy (preprint server, repository, author page). For the actual PDF text, use Apify's Website Content Crawler against the `pdfUrl`.

#### Can I track citation changes over time?

Yes. Schedule the actor on a daily cron with the same query and `dedupe: false`. Each row carries `scrapedAt`. Diff `citedByCount` between snapshots to track citation velocity.
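
The diff itself is a few lines of arithmetic. The sketch below assumes two snapshots of rows shaped like the sample output (`scholarClusterId`, `citedByCount`, `scrapedAt`); `citation_velocity` is a hypothetical helper and the counts are illustrative, not real measurements.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse a scrapedAt value such as "2026-04-29T11:30:00.000Z"."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def citation_velocity(old_rows: list[dict], new_rows: list[dict]) -> dict[str, float]:
    """Citations gained per day for each cluster ID present in both snapshots."""
    old_by_id = {r["scholarClusterId"]: r for r in old_rows}
    velocity: dict[str, float] = {}
    for new in new_rows:
        old = old_by_id.get(new["scholarClusterId"])
        if old is None:
            continue  # paper was not in the earlier snapshot
        elapsed = parse_ts(new["scrapedAt"]) - parse_ts(old["scrapedAt"])
        days = elapsed.total_seconds() / 86400
        if days <= 0:
            continue  # snapshots out of order or identical timestamps
        velocity[new["scholarClusterId"]] = (new["citedByCount"] - old["citedByCount"]) / days
    return velocity


# Illustrative snapshots taken two days apart.
earlier = [{"scholarClusterId": "2960712678066186980", "citedByCount": 142318,
            "scrapedAt": "2026-04-27T11:30:00.000Z"}]
later = [{"scholarClusterId": "2960712678066186980", "citedByCount": 142418,
          "scrapedAt": "2026-04-29T11:30:00.000Z"}]
velocity = citation_velocity(earlier, later)
print(velocity)
```

Dividing by elapsed days rather than assuming exactly one day per snapshot keeps the figure meaningful when a scheduled run is skipped or delayed.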

#### Does fetchAuthorProfiles work for every author?

Only authors who have set up a Scholar profile have a profile link. The actor follows links found on the result block. Authors without a profile ship as a name string in the `authors` array with no profile URL.
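
To see which authors on a row can be enriched, compare `authors` against `authorProfileLinks`. This is a small sketch, assuming the `name` in each link matches the string in `authors` exactly, as it does in the sample output; `split_authors` is a hypothetical helper.

```python
def split_authors(row: dict) -> tuple[list[str], list[str]]:
    """Return (authors with a Scholar profile link, authors without one)."""
    linked = {link["name"] for link in row.get("authorProfileLinks") or []}
    authors = row.get("authors") or []
    with_profile = [name for name in authors if name in linked]
    without_profile = [name for name in authors if name not in linked]
    return with_profile, without_profile


# Fields copied from the sample output above.
row = {
    "authors": ["A Vaswani", "N Shazeer", "N Parmar", "J Uszkoreit", "L Jones"],
    "authorProfileLinks": [
        {"name": "A Vaswani", "url": "https://scholar.google.com/citations?user=oR9V4YkAAAAJ"}
    ],
}
with_profile, without_profile = split_authors(row)
print(with_profile, without_profile)
```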

#### Will I get blocked?

The actor avoids the most common detection signals (datacenter IPs, missing user agent, no delays). Scholar still occasionally throws a CAPTCHA. The actor logs and retries with a fresh proxy session. If you see repeated CAPTCHA errors, raise `navigationDelayMs` to 8000 and drop `concurrency` to 1.
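
The fallback settings can be applied to an existing input before a rerun. This is a minimal sketch; `harden_input` is a hypothetical helper, and the only values it relies on are the 8000 ms delay and the concurrency of 1 recommended above.

```python
def harden_input(cfg: dict) -> dict:
    """Return a copy of the input with the conservative anti-CAPTCHA settings."""
    hardened = dict(cfg)  # shallow copy so the original input stays untouched
    # Raise the delay to at least 8000 ms; keep a larger value if one is already set.
    hardened["navigationDelayMs"] = max(cfg.get("navigationDelayMs", 4500), 8000)
    hardened["concurrency"] = 1
    return hardened


original = {"queries": ["graph neural network drug discovery"],
            "navigationDelayMs": 4500, "concurrency": 2}
retry_input = harden_input(original)
print(retry_input)
```

Using `max` means an input that is already slower than 8000 ms is not sped up by accident.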

***

### Related actors

- **SEC 8-K Event Tracker**. Same temporal shape applied to corporate disclosures.
- **SEC Form 4 Insider Tracker**. Daily insider trades from the same SEC EDGAR pipeline.
- **GitHub Issue Monitor**. Triage filter applied to open source repos. Pairs with Scholar to map paper to code.
- **Website Content Crawler**. Pipe `pdfUrl` from each Scholar row into the crawler for full text extraction.
- **HN Lead Monitor**. Catch new mentions of any paper or author on Hacker News.
- **Reddit Lead Monitor**. Same applied to Reddit, useful for tracking social discussion of a paper.

# Actor input Schema

## `queries` (type: `array`):

Free text queries. Supports Scholar operators: "exact phrase", author:Hinton, intitle:transformer. Example: \["large language model", "author:LeCun convolutional"].

## `authorUrls` (type: `array`):

Google Scholar citations profile URLs. Example: https://scholar.google.com/citations?user=JicYPdAAAAAJ. Returns the author's full publication list.

## `clusterIds` (type: `array`):

Scholar cluster IDs to fetch all versions of one paper, or its citing papers. Example: 17784817748666649498.

## `paperUrls` (type: `array`):

Scholar result URLs to enrich with full metadata (BibTeX, citing papers, related). Useful when you already have a list of papers and need citations.

## `yearFrom` (type: `integer`):

Lower bound on publication year. 0 means no lower bound.

## `yearTo` (type: `integer`):

Upper bound on publication year. 0 means no upper bound.

## `sortBy` (type: `string`):

Relevance is Scholar's default. Date sorts newest first.

## `language` (type: `string`):

Scholar interface language. Affects venue parsing and citation labels.

## `includePatents` (type: `boolean`):

Include patent results. Off by default to keep results to peer reviewed papers.

## `includeCaseLaw` (type: `boolean`):

Include legal case law results. Off by default.

## `fetchAuthorProfiles` (type: `boolean`):

For every paper with at least one author profile linked, fetch the author's affiliation, h-index, i10-index, and total citations. Adds one extra request per unique author.

## `fetchCitedBy` (type: `boolean`):

For papers with N or more citations, fetch the list of papers that cite this work. Adds citingPapers\[] to the row.

## `minCitationsForCitedBy` (type: `integer`):

Only fetch citing papers when the paper has at least this many citations. Avoids wasting requests on low cited papers.

## `maxCitedByPapers` (type: `integer`):

Cap on how many citing papers to fetch per source paper. Scholar returns 10 per page.

## `fetchBibtex` (type: `boolean`):

Pull the BibTeX export for every paper. Adds bibtex (string) to the row. Adds one extra request per paper.

## `fetchVersions` (type: `boolean`):

For each paper, fetch every Scholar cluster version (preprint, published, repository copies). Adds versions\[] to the row.

## `maxPapers` (type: `integer`):

Hard cap on paper rows pushed per run. 0 means unlimited (Scholar caps at ~1000 results per query).

## `maxPagesPerQuery` (type: `integer`):

Pages of 10 results to walk per query. Scholar caps at 100 pages (1000 results) per query and aggressively rate limits past page 30.

## `dedupe` (type: `boolean`):

Skip cluster IDs already pushed in previous runs. Keyed on Scholar cluster ID. Turn off to refresh stale rows.

## `navigationDelayMs` (type: `integer`):

Pause between page loads. Scholar throttles fast scraping. 4000 to 8000 ms is the safe range.

## `concurrency` (type: `integer`):

Parallel browser pages. Keep at 1 to 2 unless you have a residential proxy pool.

## `proxyConfiguration` (type: `object`):

Apify proxy. Scholar aggressively blocks datacenter IPs. RESIDENTIAL strongly recommended.

## Actor input object example

```json
{
  "queries": [],
  "authorUrls": [],
  "clusterIds": [],
  "paperUrls": [],
  "yearFrom": 0,
  "yearTo": 0,
  "sortBy": "relevance",
  "language": "en",
  "includePatents": false,
  "includeCaseLaw": false,
  "fetchAuthorProfiles": false,
  "fetchCitedBy": false,
  "minCitationsForCitedBy": 50,
  "maxCitedByPapers": 20,
  "fetchBibtex": false,
  "fetchVersions": false,
  "maxPapers": 100,
  "maxPagesPerQuery": 10,
  "dedupe": true,
  "navigationDelayMs": 4500,
  "concurrency": 1,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}
```

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "proxyConfiguration": {
        "useApifyProxy": true,
        "apifyProxyGroups": [
            "RESIDENTIAL"
        ]
    }
};

// Run the Actor and wait for it to finish
const run = await client.actor("scrapemint/google-scholar-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"],
    },
}

# Run the Actor and wait for it to finish
run = client.actor("scrapemint/google-scholar-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```

## CLI example

```bash
echo '{
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": [
      "RESIDENTIAL"
    ]
  }
}' |
apify call scrapemint/google-scholar-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=scrapemint/google-scholar-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Google Scholar Scraper: Papers, Authors, Citations, BibTeX",
        "description": "Search Google Scholar at scale. Pulls paper metadata, author affiliations, h-index, cited by counts, citing paper lists, BibTeX, and PDF links. One row per paper. Pay per row.",
        "version": "0.1",
        "x-build-id": "y0pJJgwuPmr7p3ORV"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/scrapemint~google-scholar-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-scrapemint-google-scholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/scrapemint~google-scholar-scraper/runs": {
            "post": {
                "operationId": "runs-sync-scrapemint-google-scholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/scrapemint~google-scholar-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-scrapemint-google-scholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "queries": {
                        "title": "Search queries",
                        "type": "array",
                        "description": "Free text queries. Supports Scholar operators: \"exact phrase\", author:Hinton, intitle:transformer. Example: [\"large language model\", \"author:LeCun convolutional\"].",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "authorUrls": {
                        "title": "Author profile URLs",
                        "type": "array",
                        "description": "Google Scholar citations profile URLs. Example: https://scholar.google.com/citations?user=JicYPdAAAAAJ. Returns the author's full publication list.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "clusterIds": {
                        "title": "Cluster IDs",
                        "type": "array",
                        "description": "Scholar cluster IDs to fetch all versions of one paper, or its citing papers. Example: 17784817748666649498.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "paperUrls": {
                        "title": "Direct paper URLs",
                        "type": "array",
                        "description": "Scholar result URLs to enrich with full metadata (BibTeX, citing papers, related). Useful when you already have a list of papers and need citations.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "yearFrom": {
                        "title": "Published from year",
                        "minimum": 0,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Lower bound on publication year. 0 means no lower bound.",
                        "default": 0
                    },
                    "yearTo": {
                        "title": "Published until year",
                        "minimum": 0,
                        "maximum": 2100,
                        "type": "integer",
                        "description": "Upper bound on publication year. 0 means no upper bound.",
                        "default": 0
                    },
                    "sortBy": {
                        "title": "Sort order",
                        "enum": [
                            "relevance",
                            "date"
                        ],
                        "type": "string",
                        "description": "Relevance is Scholar's default. Date sorts newest first.",
                        "default": "relevance"
                    },
                    "language": {
                        "title": "Language",
                        "enum": [
                            "en",
                            "es",
                            "de",
                            "fr",
                            "it",
                            "pt-BR",
                            "ja",
                            "zh-CN"
                        ],
                        "type": "string",
                        "description": "Scholar interface language. Affects venue parsing and citation labels.",
                        "default": "en"
                    },
                    "includePatents": {
                        "title": "Include patents",
                        "type": "boolean",
                        "description": "Include patent results. Off by default to limit results to peer-reviewed papers.",
                        "default": false
                    },
                    "includeCaseLaw": {
                        "title": "Include case law",
                        "type": "boolean",
                        "description": "Include legal case law results. Off by default.",
                        "default": false
                    },
                    "fetchAuthorProfiles": {
                        "title": "Enrich with author profiles",
                        "type": "boolean",
                        "description": "For every paper with at least one author profile linked, fetch the author's affiliation, h-index, i10-index, and total citations. Adds one extra request per unique author.",
                        "default": false
                    },
                    "fetchCitedBy": {
                        "title": "Fetch citing papers",
                        "type": "boolean",
                        "description": "For papers with at least minCitationsForCitedBy citations, fetch the list of papers that cite them. Adds citingPapers[] to the row.",
                        "default": false
                    },
                    "minCitationsForCitedBy": {
                        "title": "Minimum citations to fetch cited by",
                        "minimum": 0,
                        "maximum": 1000000,
                        "type": "integer",
                        "description": "Only fetch citing papers when the paper has at least this many citations. Avoids wasting requests on rarely cited papers.",
                        "default": 50
                    },
                    "maxCitedByPapers": {
                        "title": "Max citing papers per source paper",
                        "minimum": 0,
                        "maximum": 1000,
                        "type": "integer",
                        "description": "Cap on how many citing papers to fetch per source paper. Scholar returns 10 per page.",
                        "default": 20
                    },
                    "fetchBibtex": {
                        "title": "Fetch BibTeX",
                        "type": "boolean",
                        "description": "Pull the BibTeX export for every paper. Adds bibtex (string) to the row. Adds one extra request per paper.",
                        "default": false
                    },
                    "fetchVersions": {
                        "title": "Fetch all versions",
                        "type": "boolean",
                        "description": "For each paper, fetch every Scholar cluster version (preprint, published, repository copies). Adds versions[] to the row.",
                        "default": false
                    },
                    "maxPapers": {
                        "title": "Total maximum papers",
                        "minimum": 0,
                        "maximum": 50000,
                        "type": "integer",
                        "description": "Hard cap on paper rows pushed per run. 0 means unlimited (Scholar caps at ~1000 results per query).",
                        "default": 100
                    },
                    "maxPagesPerQuery": {
                        "title": "Max pages per query",
                        "minimum": 1,
                        "maximum": 100,
                        "type": "integer",
                        "description": "Number of result pages (10 results each) to walk per query. Scholar caps at 100 pages (1000 results) per query and aggressively rate limits past page 30.",
                        "default": 10
                    },
                    "dedupe": {
                        "title": "Deduplicate across runs",
                        "type": "boolean",
                        "description": "Skip cluster IDs already pushed in previous runs. Keyed on Scholar cluster ID. Turn off to refresh stale rows.",
                        "default": true
                    },
                    "navigationDelayMs": {
                        "title": "Delay between navigations (ms)",
                        "minimum": 0,
                        "maximum": 60000,
                        "type": "integer",
                        "description": "Pause between page loads. Scholar throttles fast scraping. 4000 to 8000 ms is the safe range.",
                        "default": 4500
                    },
                    "concurrency": {
                        "title": "Concurrency",
                        "minimum": 1,
                        "maximum": 8,
                        "type": "integer",
                        "description": "Parallel browser pages. Keep at 1 to 2 unless you have a residential proxy pool.",
                        "default": 1
                    },
                    "proxyConfiguration": {
                        "title": "Proxy configuration",
                        "type": "object",
                        "description": "Apify proxy. Scholar aggressively blocks datacenter IPs. RESIDENTIAL strongly recommended.",
                        "default": {
                            "useApifyProxy": true,
                            "apifyProxyGroups": [
                                "RESIDENTIAL"
                            ]
                        }
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
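
The numeric bounds and enums in the input schema above can be checked locally before a run is started, which avoids paying for a run that the platform would reject. The following is a minimal sketch, not part of the Actor: `validate_input` and `cited_by_pages` are hypothetical helper names, the bounds are copied by hand from the schema, and the `yearFrom <= yearTo` check is an assumption (the schema itself does not enforce it). The page estimate relies only on the schema's statement that Scholar returns 10 citing papers per page; the Actor's real request count may differ.

```python
import math

# (minimum, maximum) pairs copied from the input schema above.
BOUNDS = {
    "yearFrom": (0, 2100),
    "yearTo": (0, 2100),
    "minCitationsForCitedBy": (0, 1_000_000),
    "maxCitedByPapers": (0, 1000),
    "maxPapers": (0, 50_000),
    "maxPagesPerQuery": (1, 100),
    "navigationDelayMs": (0, 60_000),
    "concurrency": (1, 8),
}

# Allowed values copied from the schema's enum lists.
ENUMS = {
    "sortBy": {"relevance", "date"},
    "language": {"en", "es", "de", "fr", "it", "pt-BR", "ja", "zh-CN"},
}


def validate_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    for key, (lo, hi) in BOUNDS.items():
        if key not in run_input:
            continue  # the Actor falls back to the schema default
        value = run_input[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not lo <= value <= hi:
            errors.append(f"{key} must be an integer in [{lo}, {hi}], got {value!r}")
    for key, allowed in ENUMS.items():
        if key in run_input and run_input[key] not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}, got {run_input[key]!r}")
    # Assumption: an inverted year range is a mistake. 0 means "no bound".
    year_from = run_input.get("yearFrom", 0)
    year_to = run_input.get("yearTo", 0)
    if isinstance(year_from, int) and isinstance(year_to, int):
        if year_from and year_to and year_from > year_to:
            errors.append("yearFrom is later than yearTo")
    return errors


def cited_by_pages(max_cited_by_papers: int, page_size: int = 10) -> int:
    """Estimate cited-by pages per source paper at 10 results per page."""
    return math.ceil(max_cited_by_papers / page_size)


if __name__ == "__main__":
    candidate = {"yearFrom": 2015, "yearTo": 2024, "sortBy": "date", "concurrency": 9}
    for problem in validate_input(candidate):
        print(problem)
    print("cited-by pages per paper:", cited_by_pages(20))
```

If the returned list is empty, the dictionary can be passed as the run input through whichever client you use. Keys that are left out are skipped on purpose, since the schema supplies defaults for all of them.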
