# Google Scholar Scraper — Academic Papers & Citations (`muhammadafzal/google-scholar-scraper`) Actor

Extract academic paper titles, authors, abstracts, citation counts, publication details, and PDF links from Google Scholar. Fast, reliable, no browser overhead. Search by keyword, topic, or author name. MCP-optimized for AI agents.

- **URL**: https://apify.com/muhammadafzal/google-scholar-scraper.md
- **Developed by:** [Muhammad Afzal](https://apify.com/muhammadafzal) (community)
- **Categories:** AI, MCP servers
- **Stats:** 1 total user, 0 monthly users, 0.0% runs succeeded
- **User rating**: No ratings yet

## Pricing

from $5.00 / 1,000 results

This Actor is paid per event and usage. You are charged both the fixed price for specific events and for Apify platform usage.

Learn more: https://docs.apify.com/platform/actors/running/actors-in-store#pay-per-event

## What's an Apify Actor?

Actors are software tools running on the Apify platform for all kinds of web data extraction and automation use cases.
In Batch mode, an Actor accepts a well-defined JSON input, performs an action that can take anywhere from a few seconds to a few hours,
and optionally produces a well-defined JSON output, datasets with results, or files in a key-value store.
In Standby mode, an Actor provides a web server that can be used as a website, API, or MCP server.
The word "Actor" is always written with a capital "A".

## How to integrate an Actor?

If asked about integration, you help developers integrate Actors into their projects.
You adapt to their stack and deliver integrations that are safe, well-documented, and production-ready.
The best way to integrate Actors is as follows.

In JavaScript/TypeScript projects, use the official [JavaScript/TypeScript client](https://docs.apify.com/api/client/js.md):

```bash
npm install apify-client
```

In Python projects, use the official [Python client library](https://docs.apify.com/api/client/python.md):

```bash
pip install apify-client
```

In shell scripts, use the [Apify CLI](https://docs.apify.com/cli/docs.md):

```bash
# macOS / Linux
curl -fsSL https://apify.com/install-cli.sh | bash
# Windows (PowerShell)
irm https://apify.com/install-cli.ps1 | iex
```

In AI frameworks, you might use the [Apify MCP server](https://docs.apify.com/platform/integrations/mcp.md).

If your project is in a different language, use the [REST API](https://docs.apify.com/api/v2.md).

For usage examples, see the [API](#api) section below.

For more details, see the Apify documentation as a [Markdown index](https://docs.apify.com/llms.txt) and as [Markdown full text](https://docs.apify.com/llms-full.txt).


# README

## Google Scholar Scraper — Extract Academic Papers, Citations & Research Data

A powerful Google Scholar scraper that extracts academic paper metadata including titles, authors, abstracts, citation counts, publication details, and PDF links. Search by keyword, topic, or author name. Ideal for literature reviews, bibliometric analysis, and research data collection.

### Features

- **Lightning fast results** — 50 papers scraped in ~10 seconds
- **Rich academic metadata** — titles, authors, abstracts, citation counts, publication venues, PDF links, publication years
- **Citation tracking** — extract citation counts for impact analysis and research benchmarking
- **Year filtering** — narrow results by publication year range to focus on recent or historical research
- **Multi-query support** — search multiple keywords or topics in a single run
- **Author search** — find papers by specific researchers (e.g., "Geoffrey Hinton", "Yann LeCun")
- **Automatic pagination** — fetches up to 500 results per query with intelligent page handling
- **Structured JSON output** — clean, well-formatted data ready for analysis, databases, or AI pipelines

### Use Cases

- **Literature reviews** — collect papers systematically for academic research and systematic reviews
- **Bibliometric analysis** — measure research impact, track citation trends, analyze collaboration networks
- **Competitor intelligence** — monitor competitor research output and publication patterns
- **Grant writing** — find related work, citation context, and research gaps for proposals
- **AI & machine learning** — feed structured academic data into LLMs for summarization, classification, or knowledge graphs
- **Content creation** — generate research-backed articles, newsletters, and educational materials

### Input

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `searchQueries` | string[] | `["machine learning"]` | Keywords or topics to search on Google Scholar |
| `authorUrls` | string[] | `[]` | Author names to search, e.g. "Geoffrey Hinton" (despite the field name, plain names rather than profile URLs) |
| `maxResults` | integer | `50` | Max papers per query (1–500) |
| `yearLow` | integer | `2000` | Minimum publication year for filtering |
| `yearHigh` | integer | `2026` | Maximum publication year for filtering |
| `sortBy` | string | `"relevance"` | Sort by `"relevance"` or `"date"` (newest first) |
| `articlesOnly` | boolean | `true` | Exclude patents and citation-only entries from results |
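The limits in this table can be checked on the client side before you start a paid run. The following is a minimal sketch. `validate_input` is a hypothetical helper that is not part of the Actor or the Apify client. The numeric limits are copied from the input schema documented in this README. Treating an input with both lists empty as a problem is an assumption.

```python
from typing import Any

# Limits copied from the input schema documented in this README; they may change in later builds.
MAX_RESULTS_RANGE = (1, 500)
YEAR_RANGE = (1900, 2026)
SORT_OPTIONS = ("relevance", "date")


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_str_list(value: Any) -> bool:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def validate_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems in run_input; an empty list means none were found.

    Hypothetical helper: omitted fields are treated as the documented defaults.
    """
    problems: list[str] = []

    queries = run_input.get("searchQueries", ["machine learning"])
    authors = run_input.get("authorUrls", [])
    if not _is_str_list(queries):
        problems.append("searchQueries must be a list of strings")
    if not _is_str_list(authors):
        problems.append("authorUrls must be a list of author-name strings")
    # Assumption: an input with nothing to search is treated as a mistake.
    if queries == [] and authors == []:
        problems.append("provide at least one search query or author name")

    max_results = run_input.get("maxResults", 50)
    low, high = MAX_RESULTS_RANGE
    if not (_is_int(max_results) and low <= max_results <= high):
        problems.append(f"maxResults must be an integer from {low} to {high}")

    year_low = run_input.get("yearLow", 2000)
    year_high = run_input.get("yearHigh", 2026)
    for name, value in (("yearLow", year_low), ("yearHigh", year_high)):
        if not (_is_int(value) and YEAR_RANGE[0] <= value <= YEAR_RANGE[1]):
            problems.append(f"{name} must be an integer from {YEAR_RANGE[0]} to {YEAR_RANGE[1]}")
    if _is_int(year_low) and _is_int(year_high) and year_low > year_high:
        problems.append("yearLow must not be greater than yearHigh")

    if run_input.get("sortBy", "relevance") not in SORT_OPTIONS:
        problems.append("sortBy must be 'relevance' or 'date'")

    return problems


print(validate_input({"searchQueries": ["reinforcement learning"], "maxResults": 10}))  # → []
for problem in validate_input({"maxResults": 1000, "yearLow": 2025, "yearHigh": 2020}):
    print(problem)
```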

### Output

Each paper record includes 13 fields of structured metadata:

| Field | Type | Description |
|-------|------|-------------|
| `title` | string | Full academic paper title |
| `authors` | string[] | Author names parsed from publication metadata |
| `publicationInfo` | string | Journal, venue, year, and publisher details |
| `abstract` | string | Paper abstract or snippet from Google Scholar |
| `citationCount` | integer | Number of citations (from Google Scholar) |
| `paperUrl` | string | Direct link to the paper or landing page |
| `pdfUrl` | string\|null | Direct PDF download link when available |
| `sourceType` | string | Source type: `HTML`, `PDF`, or `BOOK` |
| `year` | integer | Publication year extracted from metadata |
| `citationsUrl` | string\|null | Link to papers citing this paper |
| `relatedUrl` | string\|null | Link to related articles on Google Scholar |
| `scrapedAt` | string | ISO 8601 timestamp of when data was scraped |
| `searchQuery` | string | The original search query that produced this result |
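If you consume the dataset from Python, a typed wrapper makes the 13 documented fields explicit. This is a sketch that assumes the field names in the table above. `ScholarPaper` and `from_item` are hypothetical names. The sketch also makes some assumptions the README does not state: it treats `year` as possibly missing (`Optional`), and it substitutes empty values for any other absent field.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ScholarPaper:
    """Typed view of one dataset item, following the documented output fields."""

    title: str
    authors: list[str]
    publication_info: str
    abstract: str
    citation_count: int
    paper_url: str
    pdf_url: Optional[str]
    source_type: str
    year: Optional[int]
    citations_url: Optional[str]
    related_url: Optional[str]
    scraped_at: str
    search_query: str

    @classmethod
    def from_item(cls, item: dict[str, Any]) -> "ScholarPaper":
        # .get() with fallbacks tolerates records where a field is absent (an assumption,
        # since the README does not say which fields are always present).
        return cls(
            title=item.get("title", ""),
            authors=list(item.get("authors") or []),
            publication_info=item.get("publicationInfo", ""),
            abstract=item.get("abstract", ""),
            citation_count=int(item.get("citationCount") or 0),
            paper_url=item.get("paperUrl", ""),
            pdf_url=item.get("pdfUrl"),
            source_type=item.get("sourceType", ""),
            year=item.get("year"),
            citations_url=item.get("citationsUrl"),
            related_url=item.get("relatedUrl"),
            scraped_at=item.get("scrapedAt", ""),
            search_query=item.get("searchQuery", ""),
        )


paper = ScholarPaper.from_item({
    "title": "Deep learning",
    "authors": ["Y LeCun", "Y Bengio", "G Hinton"],
    "citationCount": 86734,
    "pdfUrl": None,
    "year": 2015,
    "searchQuery": "deep learning",
})
print(paper.title, paper.citation_count, paper.pdf_url is None)
```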

### Example Usage

#### Search by Topic

```json
{
    "searchQueries": ["deep learning cancer detection", "transformer architecture"],
    "maxResults": 100,
    "yearLow": 2020,
    "yearHigh": 2025,
    "sortBy": "relevance"
}
````

#### Search by Author Name

```json
{
    "authorUrls": ["Geoffrey Hinton", "Yann LeCun"],
    "maxResults": 50,
    "yearLow": 2015
}
```

#### Quick Test Run

```json
{
    "searchQueries": ["reinforcement learning"],
    "maxResults": 10
}
```

### Pricing

This Actor uses a **pay-per-result pricing model at $0.005 per paper scraped** ($5.00 per 1,000 results).

| Results | Result charge (excluding platform usage) |
|---------|------|
| 10 papers | $0.05 |
| 50 papers | $0.25 |
| 100 papers | $0.50 |
| 500 papers | $2.50 |

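Fast, reliable, and cost-effective academic data extraction. No additional infrastructure or third-party API keys are required; you only need an Apify API token. Apify platform usage is billed on top of the per-result price.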
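The table multiplies the per-result price by the number of results. The sketch below estimates the result charge before a run. It also computes an upper bound for a multi-query run. `result_charge_usd` and `max_result_charge_usd` are hypothetical helpers. The bound assumes that each author name, like each search query, can return up to `maxResults` papers, which the README does not state for `authorUrls`. Platform usage is not included. `Decimal` avoids binary floating-point rounding in money amounts.

```python
from decimal import Decimal

# Result price stated in this README: $5.00 per 1,000 results, i.e. $0.005 per paper.
PRICE_PER_RESULT_USD = Decimal("0.005")


def result_charge_usd(num_results: int) -> Decimal:
    """Result charge only; Apify platform usage is billed separately."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return PRICE_PER_RESULT_USD * num_results


def max_result_charge_usd(num_search_queries: int, num_authors: int, max_results: int) -> Decimal:
    """Upper bound, assuming every query and every author name may return max_results papers."""
    return result_charge_usd((num_search_queries + num_authors) * max_results)


print(result_charge_usd(100))            # 0.500
print(max_result_charge_usd(2, 0, 100))  # 1.000
```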

### Example Output

```json
{
    "title": "Deep learning",
    "authors": ["Y LeCun", "Y Bengio", "G Hinton"],
    "publicationInfo": "Nature, 2015 - nature.com",
    "abstract": "Deep learning allows computational models that are composed of multiple processing layers...",
    "citationCount": 86734,
    "paperUrl": "https://www.nature.com/articles/nature14539",
    "pdfUrl": null,
    "sourceType": "HTML",
    "year": 2015,
    "citationsUrl": null,
    "relatedUrl": null,
    "scrapedAt": "2026-05-03T08:53:26.141Z",
    "searchQuery": "deep learning"
}
```
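Each record carries the `searchQuery` that produced it. As a result, a run with several `searchQueries` may return the same paper once for every query that found it. The sketch below merges such duplicates and ranks the papers by `citationCount`. It keys records on `paperUrl` and falls back to a normalized title, which is an assumption because the Actor does not document a stable paper ID. `merge_and_rank` and the added `matchedQueries` field are hypothetical. The sample records are illustrative and are not real Actor output.

```python
from typing import Any


def _paper_key(item: dict[str, Any]) -> str:
    # Prefer the paper URL; fall back to a case- and whitespace-normalized title.
    url = (item.get("paperUrl") or "").strip()
    if url:
        return url
    return " ".join((item.get("title") or "").lower().split())


def merge_and_rank(items: list[dict[str, Any]], top_n: int = 10) -> list[dict[str, Any]]:
    """Deduplicate records across queries and return the top_n most-cited papers.

    Each merged record gets a `matchedQueries` list, a field added here (not produced by the Actor).
    """
    merged: dict[str, dict[str, Any]] = {}
    for index, item in enumerate(items):
        # Records with neither URL nor title are kept separate rather than merged together.
        key = _paper_key(item) or f"__unkeyed_{index}"
        if key not in merged:
            merged[key] = {**item, "matchedQueries": []}
        query = item.get("searchQuery")
        if query and query not in merged[key]["matchedQueries"]:
            merged[key]["matchedQueries"].append(query)
    ranked = sorted(merged.values(), key=lambda p: p.get("citationCount") or 0, reverse=True)
    return ranked[:top_n]


# Illustrative records shaped like the example output (not real Actor output).
sample = [
    {"title": "Deep learning", "paperUrl": "https://www.nature.com/articles/nature14539",
     "citationCount": 86734, "searchQuery": "deep learning"},
    {"title": "Deep learning", "paperUrl": "https://www.nature.com/articles/nature14539",
     "citationCount": 86734, "searchQuery": "neural networks"},
    {"title": "A less-cited paper", "paperUrl": "https://example.org/paper",
     "citationCount": 12, "searchQuery": "neural networks"},
]
for paper in merge_and_rank(sample):
    print(paper["citationCount"], paper["title"], paper["matchedQueries"])
```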

### Why Use This Google Scholar Scraper?

- **No setup required** — works out of the box with zero configuration
- **No browser or proxy needed** — pure API-based extraction is faster and more reliable
- **Consistent structured data** — every record follows the same schema for easy processing
- **Built for scale** — handle hundreds of queries with automatic rate limiting and retries
- **AI-ready output** — clean JSON format perfect for feeding into LLMs, RAG pipelines, or data warehouses

# Actor input schema

## `searchQueries` (type: `array`):

Enter one or more keywords, topics, or research areas to search on Google Scholar (e.g., 'machine learning', 'cancer immunotherapy'). Each query returns up to maxResults papers. Use authorUrls to search by author name instead.

## `authorUrls` (type: `array`):

Enter author names to find their most cited and relevant papers on Google Scholar (e.g., 'Geoffrey Hinton', 'Yann LeCun'). Returns papers authored by or citing these researchers.

## `maxResults` (type: `integer`):

Maximum number of papers to extract per search query. Default value is 50. Set lower for faster test runs, higher for comprehensive searches (up to 500 per query).

## `yearLow` (type: `integer`):

Only include papers published in or after this year. Set to 2020 to focus on recent research, or leave as 2000 for a wider range.

## `yearHigh` (type: `integer`):

Only include papers published in or before this year. For example, set it to 2025 to exclude papers published in 2026.

## `sortBy` (type: `string`):

Sort order for search results. Use 'relevance' for most relevant papers first. Use 'date' for newest papers first.

## `articlesOnly` (type: `boolean`):

When enabled, only actual articles are included; patents and citation-only entries are excluded from results.

## Actor input object example

```json
{
  "searchQueries": [
    "cancer research"
  ],
  "authorUrls": [],
  "maxResults": 50,
  "yearLow": 2000,
  "yearHigh": 2026,
  "sortBy": "relevance",
  "articlesOnly": true
}
```
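To record exactly which parameters a run used (for example, alongside an exported dataset), you can expand a partial input with the schema defaults. This sketch assumes the Actor fills omitted fields from the defaults in its input schema. `ACTOR_DEFAULTS` and `effective_input` are hypothetical names. The default values are copied from this README.

```python
import copy
import json
from typing import Any

# Defaults copied from the input schema in this README (they may change in later builds).
ACTOR_DEFAULTS: dict[str, Any] = {
    "searchQueries": ["machine learning"],
    "authorUrls": [],
    "maxResults": 50,
    "yearLow": 2000,
    "yearHigh": 2026,
    "sortBy": "relevance",
    "articlesOnly": True,
}


def effective_input(overrides: dict[str, Any]) -> dict[str, Any]:
    """Return the full input, assuming omitted fields fall back to the schema defaults."""
    unknown = set(overrides) - set(ACTOR_DEFAULTS)
    if unknown:
        # Catch typos such as "maxResult" before they are silently ignored.
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    # Deep copies keep the shared defaults from being mutated by callers.
    merged = copy.deepcopy(ACTOR_DEFAULTS)
    merged.update(copy.deepcopy(overrides))
    return merged


print(json.dumps(effective_input({"searchQueries": ["cancer research"]}), indent=2))
```

With only `searchQueries` overridden, this prints the same object as the input example above.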

# API

You can run this Actor programmatically using our API. Below are code examples in JavaScript, Python, and CLI, as well as the OpenAPI specification and MCP server setup.

## JavaScript example

```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your Apify API token
// Replace the '<YOUR_API_TOKEN>' with your token
const client = new ApifyClient({
    token: '<YOUR_API_TOKEN>',
});

// Prepare Actor input
const input = {
    "searchQueries": [
        "cancer research"
    ]
};

// Run the Actor and wait for it to finish
const run = await client.actor("muhammadafzal/google-scholar-scraper").call(input);

// Fetch and print Actor results from the run's dataset (if any)
console.log('Results from dataset');
console.log(`💾 Check your data here: https://console.apify.com/storage/datasets/${run.defaultDatasetId}`);
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

// 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/js/docs

```

## Python example

```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
# Replace '<YOUR_API_TOKEN>' with your token.
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "searchQueries": ["cancer research"] }

# Run the Actor and wait for it to finish
run = client.actor("muhammadafzal/google-scholar-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start

```
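The Python example above hard-codes the token and prints whatever the run produced. The more defensive sketch below reads the token from an environment variable and checks the run status before reading the dataset. `run_scholar_search` and `main` are hypothetical names, and `APIFY_TOKEN` is an assumed variable name. The client calls (`actor().call()` and `dataset().iterate_items()`) are the same ones the example above uses.

```python
import os
from typing import Any, Iterator

ACTOR_ID = "muhammadafzal/google-scholar-scraper"


def run_scholar_search(client: Any, run_input: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Run the Actor, wait for it to finish, and return an iterator over its dataset items.

    `client` is an apify_client.ApifyClient, or any object with the same actor()/dataset() methods.
    Raises RuntimeError if the run did not finish with status SUCCEEDED.
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    # call() is typed to return None in some cases, so check before using the run.
    if run is None:
        raise RuntimeError("Actor run did not return run details")
    status = run.get("status")
    if status != "SUCCEEDED":
        raise RuntimeError(f"Actor run {run.get('id')} finished with status {status}")
    return client.dataset(run["defaultDatasetId"]).iterate_items()


def main() -> None:
    from apify_client import ApifyClient

    # Read the token from the environment instead of hard-coding it in source code.
    client = ApifyClient(os.environ["APIFY_TOKEN"])
    papers = run_scholar_search(client, {"searchQueries": ["cancer research"], "maxResults": 10})
    for paper in papers:
        print(paper.get("citationCount"), paper.get("title"))
```

Call `main()` from your script's entry point. Because the client is passed in, you can unit-test `run_scholar_search` with a fake client object and no network access.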

## CLI example

```bash
echo '{
  "searchQueries": [
    "cancer research"
  ]
}' |
apify call muhammadafzal/google-scholar-scraper --silent --output-dataset

```

## MCP server setup

```json
{
    "mcpServers": {
        "apify": {
            "command": "npx",
            "args": [
                "mcp-remote",
                "https://mcp.apify.com/?tools=muhammadafzal/google-scholar-scraper",
                "--header",
                "Authorization: Bearer <YOUR_API_TOKEN>"
            ]
        }
    }
}

```

## OpenAPI specification

```json
{
    "openapi": "3.0.1",
    "info": {
        "title": "Google Scholar Scraper — Academic Papers & Citations",
        "description": "Extract academic paper titles, authors, abstracts, citation counts, publication details, and PDF links from Google Scholar. Fast, reliable, no browser overhead. Search by keyword, topic, or author name. MCP-optimized for AI agents.",
        "version": "1.0",
        "x-build-id": "xgORI9WGm4aV5TlcY"
    },
    "servers": [
        {
            "url": "https://api.apify.com/v2"
        }
    ],
    "paths": {
        "/acts/muhammadafzal~google-scholar-scraper/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-muhammadafzal-google-scholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        },
        "/acts/muhammadafzal~google-scholar-scraper/runs": {
            "post": {
                "operationId": "runs-sync-muhammadafzal-google-scholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor and returns information about the initiated run in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK",
                        "content": {
                            "application/json": {
                                "schema": {
                                    "$ref": "#/components/schemas/runsResponseSchema"
                                }
                            }
                        }
                    }
                }
            }
        },
        "/acts/muhammadafzal~google-scholar-scraper/run-sync": {
            "post": {
                "operationId": "run-sync-muhammadafzal-google-scholar-scraper",
                "x-openai-isConsequential": false,
                "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
                "tags": [
                    "Run Actor"
                ],
                "requestBody": {
                    "required": true,
                    "content": {
                        "application/json": {
                            "schema": {
                                "$ref": "#/components/schemas/inputSchema"
                            }
                        }
                    }
                },
                "parameters": [
                    {
                        "name": "token",
                        "in": "query",
                        "required": true,
                        "schema": {
                            "type": "string"
                        },
                        "description": "Enter your Apify token here"
                    }
                ],
                "responses": {
                    "200": {
                        "description": "OK"
                    }
                }
            }
        }
    },
    "components": {
        "schemas": {
            "inputSchema": {
                "type": "object",
                "properties": {
                    "searchQueries": {
                        "title": "Search Queries",
                        "type": "array",
                        "description": "Enter one or more keywords, topics, or research areas to search on Google Scholar (e.g., 'machine learning', 'cancer immunotherapy'). Each query returns up to maxResults papers. Use authorUrls to search by author name instead.",
                        "default": [
                            "machine learning"
                        ],
                        "items": {
                            "type": "string"
                        }
                    },
                    "authorUrls": {
                        "title": "Author Names",
                        "type": "array",
                        "description": "Enter author names to find their most cited and relevant papers on Google Scholar (e.g., 'Geoffrey Hinton', 'Yann LeCun'). Returns papers authored by or citing these researchers.",
                        "default": [],
                        "items": {
                            "type": "string"
                        }
                    },
                    "maxResults": {
                        "title": "Max Results",
                        "minimum": 1,
                        "maximum": 500,
                        "type": "integer",
                        "description": "Maximum number of papers to extract per search query. Default value is 50. Set lower for faster test runs, higher for comprehensive searches (up to 500 per query).",
                        "default": 50
                    },
                    "yearLow": {
                        "title": "Minimum Year",
                        "minimum": 1900,
                        "maximum": 2026,
                        "type": "integer",
                        "description": "Only include papers published in or after this year. Set to 2020 to focus on recent research, or leave as 2000 for a wider range.",
                        "default": 2000
                    },
                    "yearHigh": {
                        "title": "Maximum Year",
                        "minimum": 1900,
                        "maximum": 2026,
                        "type": "integer",
                        "description": "Only include papers published in or before this year. Set to 2025 to exclude upcoming papers.",
                        "default": 2026
                    },
                    "sortBy": {
                        "title": "Sort By",
                        "enum": [
                            "relevance",
                            "date"
                        ],
                        "type": "string",
                        "description": "Sort order for search results. Use 'relevance' for most relevant papers first. Use 'date' for newest papers first.",
                        "default": "relevance"
                    },
                    "articlesOnly": {
                        "title": "Articles Only",
                        "type": "boolean",
                        "description": "When enabled, only includes actual articles — excludes patents and citations from results.",
                        "default": true
                    }
                }
            },
            "runsResponseSchema": {
                "type": "object",
                "properties": {
                    "data": {
                        "type": "object",
                        "properties": {
                            "id": {
                                "type": "string"
                            },
                            "actId": {
                                "type": "string"
                            },
                            "userId": {
                                "type": "string"
                            },
                            "startedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "finishedAt": {
                                "type": "string",
                                "format": "date-time",
                                "example": "2025-01-08T00:00:00.000Z"
                            },
                            "status": {
                                "type": "string",
                                "example": "READY"
                            },
                            "meta": {
                                "type": "object",
                                "properties": {
                                    "origin": {
                                        "type": "string",
                                        "example": "API"
                                    },
                                    "userAgent": {
                                        "type": "string"
                                    }
                                }
                            },
                            "stats": {
                                "type": "object",
                                "properties": {
                                    "inputBodyLen": {
                                        "type": "integer",
                                        "example": 2000
                                    },
                                    "rebootCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "restartCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "resurrectCount": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "computeUnits": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "options": {
                                "type": "object",
                                "properties": {
                                    "build": {
                                        "type": "string",
                                        "example": "latest"
                                    },
                                    "timeoutSecs": {
                                        "type": "integer",
                                        "example": 300
                                    },
                                    "memoryMbytes": {
                                        "type": "integer",
                                        "example": 1024
                                    },
                                    "diskMbytes": {
                                        "type": "integer",
                                        "example": 2048
                                    }
                                }
                            },
                            "buildId": {
                                "type": "string"
                            },
                            "defaultKeyValueStoreId": {
                                "type": "string"
                            },
                            "defaultDatasetId": {
                                "type": "string"
                            },
                            "defaultRequestQueueId": {
                                "type": "string"
                            },
                            "buildNumber": {
                                "type": "string",
                                "example": "1.0.0"
                            },
                            "containerUrl": {
                                "type": "string"
                            },
                            "usage": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "integer",
                                        "example": 1
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            },
                            "usageTotalUsd": {
                                "type": "number",
                                "example": 0.00005
                            },
                            "usageUsd": {
                                "type": "object",
                                "properties": {
                                    "ACTOR_COMPUTE_UNITS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATASET_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "KEY_VALUE_STORE_WRITES": {
                                        "type": "number",
                                        "example": 0.00005
                                    },
                                    "KEY_VALUE_STORE_LISTS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_READS": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "REQUEST_QUEUE_WRITES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_INTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "DATA_TRANSFER_EXTERNAL_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                                        "type": "integer",
                                        "example": 0
                                    },
                                    "PROXY_SERPS": {
                                        "type": "integer",
                                        "example": 0
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
```
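For projects that cannot add the client library, the `run-sync-get-dataset-items` endpoint in the specification above can be called with the Python standard library alone. This is a sketch. The URL, the `~` form of the Actor ID, and the `token` query parameter are taken from the spec. `build_run_sync_request` and `fetch_items` are hypothetical helper names. A synchronous endpoint keeps the HTTP connection open for the whole run, so for large `maxResults` values the asynchronous `/runs` endpoint or one of the client libraries is usually a better fit.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://api.apify.com/v2"
# The "/" in the Actor ID is written as "~" in API paths, as in the spec above.
ACTOR_PATH_ID = "muhammadafzal~google-scholar-scraper"


def build_run_sync_request(token: str, run_input: dict[str, Any]) -> urllib.request.Request:
    """Build a POST to the run-sync-get-dataset-items endpoint without sending it."""
    query = urllib.parse.urlencode({"token": token})
    url = f"{API_BASE}/acts/{ACTOR_PATH_ID}/run-sync-get-dataset-items?{query}"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_items(token: str, run_input: dict[str, Any], timeout_s: float = 300.0) -> list[dict[str, Any]]:
    """Send the request and decode the dataset items (a JSON array) from the response body."""
    request = build_run_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the request easy to inspect or log (minus the token) before any paid run starts.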
