Google Scholar Scraper

Pricing

Pay per event

Search Google Scholar and extract academic papers: titles, authors, citations, abstracts, year, PDF links, and publication details. Pure HTTP, fast and cheap.

Developer

Stas Persiianenko

Maintained by Community

Search Google Scholar and extract academic papers with full metadata. Returns titles, authors, citation counts, abstracts, PDF links, publication sources, and more.

Pure HTTP scraper, no browser needed. Fast, cheap, and reliable.

What does Google Scholar Scraper do?

Google Scholar Scraper searches Google Scholar and extracts structured data from academic search results. For each paper, it returns the title, authors, publication year, citation count, abstract snippet, PDF link, and publication source.

  • Search by query: supports Google Scholar syntax, including exact phrases, author search, exclusions, and OR/AND operators
  • Year filtering: restrict results to a specific date range
  • Sort by relevance or date: newest first or most relevant
  • PDF link extraction: direct links to PDF/full-text when available
  • Citation data: citation counts and links to citing papers
  • Direct URL support: paste any Google Scholar search URL to scrape it
  • Pagination: automatically follows pages up to your maxResults limit

Who is Google Scholar Scraper for?

  • Academic researchers building literature reviews and meta-analyses
  • R&D teams tracking publications in their domain
  • Bibliometricians analyzing citation networks and research trends
  • AI/ML engineers curating training datasets from scientific papers
  • Science journalists finding expert sources and trending research
  • Patent analysts monitoring prior art and competitor research
  • Librarians building curated reading lists and research guides

Why use Google Scholar Scraper?

  • No browser overhead: pure HTTP requests mean fast execution and low cost
  • Automated pagination: scrapes hundreds of results across multiple pages
  • Structured output: clean JSON with all metadata fields ready for analysis
  • Pay per result: you pay only for the papers actually extracted, plus a small fee per run start
  • PDF links included: direct access to full-text documents when available
  • Citation tracking: citation counts help identify influential papers

Data extraction fields

Field          Description
title          Paper title
url            Link to the paper
authors        Author names
year           Publication year
source         Journal, conference, or publisher
citationCount  Number of citations
snippet        Abstract or snippet from Google Scholar
pdfUrl         Direct link to PDF/full-text (when available)
type           Result type: PDF, BOOK, HTML, or CITATION
citedByUrl     Google Scholar link to papers that cite this one
relatedUrl     Google Scholar link to related articles
versionCount   Number of versions available
clusterId      Google Scholar cluster ID
query          Search query used
scrapedAt      ISO 8601 timestamp of when the data was scraped

How much does it cost to scrape Google Scholar?

This actor uses pay-per-event pricing:

Event          Price
Run started    $0.01
Paper scraped  $0.003 per paper

Example costs:

  • 50 papers ≈ $0.16
  • 100 papers ≈ $0.31
  • 1,000 papers ≈ $3.01
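
The example costs above follow from one run-start fee plus the per-paper price. A minimal sketch of that arithmetic, assuming a single run and no other platform charges (the function name `estimate_cost` is hypothetical, not part of the actor):

```python
from decimal import Decimal

# Prices copied from the pricing table above.
RUN_START_FEE = Decimal("0.01")
PRICE_PER_PAPER = Decimal("0.003")


def estimate_cost(papers: int, runs: int = 1) -> Decimal:
    """Estimate the charge in USD for `papers` results spread over `runs` runs."""
    if papers < 0 or runs < 1:
        raise ValueError("papers must be >= 0 and runs must be >= 1")
    # One start fee per run, plus the per-paper price for every result.
    return RUN_START_FEE * runs + PRICE_PER_PAPER * papers


if __name__ == "__main__":
    for n in (50, 100, 1000):
        print(f"{n} papers: ${estimate_cost(n):.2f}")
```

Decimal is used instead of float so that sub-cent prices add up without rounding drift.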

How to scrape Google Scholar papers

  1. Go to Google Scholar Scraper on Apify Store.
  2. Enter one or more search queries in the queries field (e.g., machine learning, author:"Yann LeCun").
  3. Optionally set year range with yearFrom and yearTo to filter by publication date.
  4. Choose sort order: relevance (default) or date (newest first).
  5. Set maxResults to control how many papers to extract per query.
  6. Click Start and wait for the run to complete.
  7. Download your data as JSON, CSV, or Excel from the Dataset tab.

Input parameters

Parameter          Type      Description                                 Default
queries            string[]  Search queries (supports Scholar syntax)    (none)
urls               string[]  Direct Google Scholar search URLs           (none)
yearFrom           integer   Only papers published this year or later    (none)
yearTo             integer   Only papers published this year or earlier  (none)
sortBy             string    Sort: relevance or date                     relevance
includePatents     boolean   Include patent results                      true
includeCitations   boolean   Include citation entries                    true
maxResults         integer   Max papers per query (1 to 1000)            100
maxRequestRetries  integer   Retry attempts for failed requests          3
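
When inputs are generated by a script, it can help to check them against the table above before starting a run. The sketch below is a hypothetical client-side helper (`build_input` is not part of the actor, and the actor may validate differently); it covers only the query-based parameters and uses the documented names, ranges, and defaults.

```python
from typing import Any, Optional


def build_input(
    queries: list[str],
    max_results: int = 100,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
    sort_by: str = "relevance",
) -> dict[str, Any]:
    """Build an actor input dict using the parameter names from the table."""
    if not queries:
        raise ValueError("provide at least one query")
    if not 1 <= max_results <= 1000:
        raise ValueError("maxResults must be between 1 and 1000")
    if sort_by not in ("relevance", "date"):
        raise ValueError("sortBy must be 'relevance' or 'date'")
    if year_from is not None and year_to is not None and year_from > year_to:
        raise ValueError("yearFrom must not be later than yearTo")

    run_input: dict[str, Any] = {
        "queries": list(queries),
        "sortBy": sort_by,
        "maxResults": max_results,
    }
    # Optional year filters are left out entirely when unset.
    if year_from is not None:
        run_input["yearFrom"] = year_from
    if year_to is not None:
        run_input["yearTo"] = year_to
    return run_input
```

The returned dict can be passed as the run input in the API examples further down.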

Input examples

Basic search:

{
  "queries": ["machine learning"],
  "maxResults": 50
}

Filtered search with year range:

{
  "queries": ["transformer neural network"],
  "yearFrom": 2020,
  "yearTo": 2025,
  "sortBy": "date",
  "maxResults": 100
}

Author search:

{
  "queries": ["author:\"Yann LeCun\" deep learning"],
  "maxResults": 30
}

Multiple queries:

{
  "queries": [
    "\"large language models\" safety",
    "reinforcement learning robotics",
    "graph neural networks"
  ],
  "yearFrom": 2023,
  "maxResults": 50
}

Direct Google Scholar URL:

{
  "urls": ["https://scholar.google.com/scholar?q=CRISPR+gene+editing&as_ylo=2022"],
  "maxResults": 100
}
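
A direct URL like the one above can also be assembled in code. This is a minimal sketch using only the Python standard library; `q` and `as_ylo` appear in the example URL, while `as_yhi` (upper year bound) is an assumption based on Google Scholar's own URLs and is not documented on this page. The helper name `scholar_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def scholar_url(
    query: str,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
) -> str:
    """Build a Google Scholar search URL suitable for the `urls` input."""
    params = {"q": query}
    if year_from is not None:
        params["as_ylo"] = str(year_from)  # lower year bound, as in the example
    if year_to is not None:
        params["as_yhi"] = str(year_to)  # assumed upper year bound parameter
    # urlencode escapes spaces as '+', matching the example URL.
    return f"{SCHOLAR_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(scholar_url("CRISPR gene editing", year_from=2022))
```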

Output example

Each paper is returned as a JSON object:

{
  "title": "Attention Is All You Need",
  "url": "https://arxiv.org/abs/1706.03762",
  "authors": "A Vaswani, N Shazeer, N Parmar...",
  "year": 2017,
  "source": "Advances in neural information processing systems",
  "citationCount": 125000,
  "snippet": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "pdfUrl": "https://arxiv.org/pdf/1706.03762",
  "type": "PDF",
  "citedByUrl": "https://scholar.google.com/scholar?cites=...",
  "relatedUrl": "https://scholar.google.com/scholar?q=related:...",
  "versionCount": 15,
  "clusterId": "1234567890",
  "query": "transformer neural network",
  "scrapedAt": "2026-03-22T10:00:00.000Z"
}
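
Records in this shape are easy to post-process once downloaded. As an illustration only, the sketch below ranks papers by average citations per year, a hypothetical metric that is not computed by the actor. It assumes that `year` or `citationCount` may be missing or null for some results, which this page does not state.

```python
from typing import Any


def citations_per_year(paper: dict[str, Any], current_year: int) -> float:
    """Average citations per year since publication (illustrative metric)."""
    year = paper.get("year")
    count = paper.get("citationCount") or 0
    if not year:
        return 0.0  # no publication year, so the paper cannot be ranked
    # Count the publication year itself and never divide by zero.
    age = max(current_year - int(year) + 1, 1)
    return count / age


def top_papers(items: list[dict[str, Any]], current_year: int, limit: int = 10) -> list[dict[str, Any]]:
    """Return up to `limit` papers, highest citations per year first."""
    ranked = sorted(items, key=lambda p: citations_per_year(p, current_year), reverse=True)
    return ranked[:limit]
```

Dividing by age reduces the advantage that older papers have when sorting by raw citation count.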

Google Scholar search syntax

Syntax          Example                   Description
"exact phrase"  "deep learning"           Exact phrase match
author:"Name"   author:"Yann LeCun"       Search by author
-term           machine learning -survey  Exclude a term
OR              "CNN" OR "convolutional"  Match either term
intitle:        intitle:transformer       Term must appear in title
source:         source:Nature             Restrict to a journal
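
The operators in the table can be combined in one query string. The helper below is a hypothetical convenience function (`build_query` and its parameters are not part of the actor); it only concatenates the documented operators and does not check whether Google Scholar honors every combination.

```python
from typing import Iterable, Optional


def build_query(
    terms: str = "",
    phrase: Optional[str] = None,
    author: Optional[str] = None,
    intitle: Optional[str] = None,
    source: Optional[str] = None,
    exclude: Iterable[str] = (),
) -> str:
    """Compose a Google Scholar query from the operators in the table above."""
    parts: list[str] = []
    if author:
        parts.append(f'author:"{author}"')
    if intitle:
        parts.append(f"intitle:{intitle}")
    if source:
        parts.append(f"source:{source}")
    if phrase:
        parts.append(f'"{phrase}"')  # exact phrase match
    if terms:
        parts.append(terms)
    parts.extend(f"-{term}" for term in exclude)  # excluded terms go last
    return " ".join(parts)


if __name__ == "__main__":
    print(build_query("deep learning", author="Yann LeCun"))
```

Each resulting string can be used as one entry in the queries input.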

Tips

  • Use exact phrases for precise results: "attention is all you need" finds that specific paper
  • Author search works well: author:"Geoffrey Hinton" finds papers by that author
  • Year filtering is useful for finding recent work in fast-moving fields
  • Sort by date to find the newest papers on a topic
  • Citation count is a rough proxy for a paper's influence; it tends to favor older papers
  • Direct URLs let you use Google Scholar's Advanced Search UI to build complex queries, then paste the URL here
  • Rate limits: Google Scholar may rate-limit heavy usage. The scraper includes automatic delays between pages. For large-scale scraping, use Apify Proxy.

Integrations

Connect Google Scholar Scraper with any cloud service or web app using integrations on the Apify platform. You can connect with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. You can also use webhooks to trigger actions when a run finishes or fails.

How to use Google Scholar Scraper with the API

Node.js

import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/google-scholar-scraper').call({
    queries: ['machine learning'],
    yearFrom: 2023,
    maxResults: 50,
});

// Read the scraped papers from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((paper) => {
    console.log(`${paper.title} (${paper.year}), cited by ${paper.citationCount}`);
});

Python

from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("automation-lab/google-scholar-scraper").call(run_input={
    "queries": ["machine learning"],
    "yearFrom": 2023,
    "maxResults": 50,
})

# Iterate over the scraped papers in the run's default dataset.
for paper in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{paper['title']} ({paper['year']}), cited by {paper['citationCount']}")

cURL

curl "https://api.apify.com/v2/acts/automation-lab~google-scholar-scraper/runs" \
  -X POST \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"queries": ["machine learning"], "yearFrom": 2023, "maxResults": 50}'

Use with AI agents via MCP

Google Scholar Scraper is available as a tool for AI assistants via the Model Context Protocol (MCP).

Setup for Claude Code

claude mcp add --transport http apify "https://mcp.apify.com"

Setup for Claude Desktop, Cursor, or VS Code

{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}

Example prompts

  • "Search Google Scholar for papers about large language models from 2024"
  • "Find the most cited papers on CRISPR gene editing"
  • "Get papers by author Geoffrey Hinton on deep learning"

Learn more in the Apify MCP documentation.

Google Scholar displays publicly available academic metadata: titles, authors, abstracts, and citation counts. This scraper extracts only publicly visible information that Google Scholar itself aggregates from academic publishers. It does not bypass authentication, access paywalled content, or download full papers.

You should review Google's Terms of Service and ensure your use case complies with applicable laws. Use reasonable request rates and respect robots.txt guidelines.

Legality

In hiQ Labs v. LinkedIn, the US Court of Appeals for the Ninth Circuit held that scraping publicly accessible data is unlikely to violate the Computer Fraud and Abuse Act, so scraping public data is generally considered lawful in the United States. This actor only accesses publicly available information and does not require authentication. Always review and comply with the target website's Terms of Service before scraping. For personal data, ensure compliance with GDPR, CCPA, and other applicable privacy regulations.

FAQ

Q: How many papers can I scrape at once?
A: Up to 1,000 papers per query. Google Scholar paginates at 10 results per page, so 1,000 papers requires fetching 100 pages.

Q: Does it return full paper text?
A: No. The scraper returns metadata (title, authors, abstract snippet, citation count) and PDF links when available. Use the pdfUrl field to access full papers.

Q: Can I search by author only?
A: Yes. Use author:"Name" syntax in your query, e.g., author:"Yann LeCun".

Q: Why am I getting fewer results than expected?
A: Google Scholar may return fewer results for very specific queries. Also, rate limiting may kick in for large scrapes. Try broader search terms or reduce maxResults.

Q: Can I filter by journal or conference?
A: Yes. Use Google Scholar's source: operator in your query, e.g., source:Nature deep learning.
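
The page arithmetic from the first answer (10 results per page, so 1,000 papers means 100 pages) can be sketched as follows. Treating each offset as the value of a `start` URL parameter is an assumption about how Google Scholar pages are addressed; this page does not describe how the scraper paginates internally, and `page_offsets` is a hypothetical helper.

```python
import math

RESULTS_PER_PAGE = 10  # page size stated in the FAQ above


def page_offsets(max_results: int) -> list[int]:
    """Return the result offset of every page needed to cover `max_results`."""
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    pages = math.ceil(max_results / RESULTS_PER_PAGE)
    # Assumed to map to a `start` query parameter: 0, 10, 20, ...
    return [page * RESULTS_PER_PAGE for page in range(pages)]


if __name__ == "__main__":
    print(len(page_offsets(1000)), "pages needed for 1,000 papers")
```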