Google Scholar Scraper
Pricing: Pay per event
Search Google Scholar and extract academic papers: titles, authors, citations, abstracts, year, PDF links, and publication details. Pure HTTP, fast and cheap.
Developer: Stas Persiianenko
Search Google Scholar and extract academic papers with full metadata. Returns titles, authors, citation counts, abstracts, PDF links, publication sources, and more.
Pure HTTP scraper, no browser needed: fast, cheap, and reliable.
What does Google Scholar Scraper do?
Google Scholar Scraper searches Google Scholar and extracts structured data from academic search results. For each paper, it returns the title, authors, publication year, citation count, abstract snippet, PDF link, and publication source.
- Search by query: supports Google Scholar syntax (exact phrases, author search, exclusions, OR/AND operators)
- Year filtering: restrict results to a specific date range
- Sort by relevance or date: newest first or most relevant
- PDF link extraction: direct links to PDF/full-text when available
- Citation data: citation counts and links to citing papers
- Direct URL support: paste any Google Scholar search URL to scrape it
- Pagination: automatically follows pages up to your `maxResults` limit
Who is Google Scholar Scraper for?
- Academic researchers building literature reviews and meta-analyses
- R&D teams tracking publications in their domain
- Bibliometricians analyzing citation networks and research trends
- AI/ML engineers curating training datasets from scientific papers
- Science journalists finding expert sources and trending research
- Patent analysts monitoring prior art and competitor research
- Librarians building curated reading lists and research guides
Why use Google Scholar Scraper?
- No browser overhead: pure HTTP requests mean fast execution and low cost
- Automated pagination: scrapes up to 1,000 results per query across multiple pages
- Structured output: clean JSON with all metadata fields ready for analysis
- Pay per event: a $0.01 start fee per run plus $0.003 for each paper you actually extract
- PDF links included: direct access to full-text documents when available
- Citation tracking: citation counts help identify influential papers
Data extraction fields
| Field | Description |
|---|---|
| `title` | Paper title |
| `url` | Link to the paper |
| `authors` | Author names |
| `year` | Publication year |
| `source` | Journal, conference, or publisher |
| `citationCount` | Number of citations |
| `snippet` | Abstract or snippet from Google Scholar |
| `pdfUrl` | Direct link to PDF/full-text (when available) |
| `type` | Result type: `PDF`, `BOOK`, `HTML`, or `CITATION` |
| `citedByUrl` | Google Scholar link to papers that cite this one |
| `relatedUrl` | Google Scholar link to related articles |
| `versionCount` | Number of versions available |
| `clusterId` | Google Scholar cluster ID |
| `query` | Search query used |
| `scrapedAt` | ISO 8601 timestamp of when the data was scraped |
How much does it cost to scrape Google Scholar?
This actor uses pay-per-event pricing:
| Event | Price |
|---|---|
| Run started | $0.01 |
| Paper scraped | $0.003 per paper |
Example costs:
- 50 papers: $0.16
- 100 papers: $0.31
- 1,000 papers: $3.01
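The example costs follow from one run-start event plus one event per paper, i.e. cost = $0.01 + $0.003 × papers. A minimal sketch of that arithmetic, assuming exactly the listed prices and one run (the function name `estimate_cost` is illustrative, not part of the actor):

```python
from decimal import Decimal

# Event prices from the pricing table, in USD.
RUN_STARTED = Decimal("0.01")
PAPER_SCRAPED = Decimal("0.003")


def estimate_cost(papers: int, runs: int = 1) -> Decimal:
    """Estimate the event charges for `runs` runs that extract `papers` papers in total."""
    if papers < 0 or runs < 0:
        raise ValueError("papers and runs must be non-negative")
    # One "Run started" event per run, plus one "Paper scraped" event per paper.
    return RUN_STARTED * runs + PAPER_SCRAPED * papers


for n in (50, 100, 1000):
    print(f"{n} papers: ${estimate_cost(n):.2f}")
```

`Decimal` keeps the sums exact; the two-decimal formatting is for display only and is not a statement about how charges are rounded.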
How to scrape Google Scholar papers
- Go to Google Scholar Scraper on Apify Store.
- Enter one or more search queries in the `queries` field (e.g., `machine learning`, `author:"Yann LeCun"`).
- Optionally set a year range with `yearFrom` and `yearTo` to filter by publication date.
- Choose the sort order: `relevance` (default) or `date` (newest first).
- Set `maxResults` to control how many papers to extract per query.
- Click Start and wait for the run to complete.
- Download your data as JSON, CSV, or Excel from the Dataset tab.
Input parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `queries` | string[] | Search queries (supports Scholar syntax) | none |
| `urls` | string[] | Direct Google Scholar search URLs | none |
| `yearFrom` | integer | Only papers published this year or later | none |
| `yearTo` | integer | Only papers published this year or earlier | none |
| `sortBy` | string | Sort order: `relevance` or `date` | `relevance` |
| `includePatents` | boolean | Include patent results | `true` |
| `includeCitations` | boolean | Include citation entries | `true` |
| `maxResults` | integer | Max papers per query (1 to 1,000) | `100` |
| `maxRequestRetries` | integer | Retry attempts for failed requests | `3` |
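Before starting a run, it can help to check an input against the constraints in the parameter table (the 1 to 1,000 range for `maxResults` and the two `sortBy` values). The sketch below is a hypothetical client-side check; requiring at least one of `queries` or `urls`, and rejecting `yearFrom` later than `yearTo`, are assumptions rather than documented rules, and the actor's own validation may differ.

```python
from typing import Any, Dict, List

SORT_OPTIONS = {"relevance", "date"}  # values listed for sortBy


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return problems found in a run input; an empty list means none were found.

    Hypothetical client-side check based on the parameter table; the actor's
    own validation may be stricter or looser.
    """
    problems: List[str] = []

    # Assumption: at least one query or URL is needed for a useful run.
    if not run_input.get("queries") and not run_input.get("urls"):
        problems.append("provide at least one entry in 'queries' or 'urls'")

    # Documented range for maxResults is 1 to 1000 (default 100); bools are rejected.
    max_results = run_input.get("maxResults", 100)
    if (isinstance(max_results, bool) or not isinstance(max_results, int)
            or not 1 <= max_results <= 1000):
        problems.append("'maxResults' must be an integer from 1 to 1000")

    # sortBy defaults to relevance.
    if run_input.get("sortBy", "relevance") not in SORT_OPTIONS:
        problems.append("'sortBy' must be 'relevance' or 'date'")

    # Assumption: an inverted year range is a mistake.
    year_from, year_to = run_input.get("yearFrom"), run_input.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        problems.append("'yearFrom' must not be later than 'yearTo'")

    return problems


print(validate_input({"queries": ["machine learning"], "maxResults": 50}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.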
Input examples
Basic search:

```json
{
  "queries": ["machine learning"],
  "maxResults": 50
}
```

Filtered search with year range:

```json
{
  "queries": ["transformer neural network"],
  "yearFrom": 2020,
  "yearTo": 2025,
  "sortBy": "date",
  "maxResults": 100
}
```

Author search:

```json
{
  "queries": ["author:\"Yann LeCun\" deep learning"],
  "maxResults": 30
}
```

Multiple queries:

```json
{
  "queries": [
    "\"large language models\" safety",
    "reinforcement learning robotics",
    "graph neural networks"
  ],
  "yearFrom": 2023,
  "maxResults": 50
}
```

Direct Google Scholar URL:

```json
{
  "urls": ["https://scholar.google.com/scholar?q=CRISPR+gene+editing&as_ylo=2022"],
  "maxResults": 100
}
```
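The `urls` input accepts any Scholar search URL, so such URLs can also be generated in code. A minimal sketch with a hypothetical helper `build_scholar_url`: the `q` and `as_ylo` parameters come from the direct-URL example, while `as_yhi` as the upper-year parameter is an assumption about Scholar's URL scheme.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlencode

SCHOLAR_SEARCH = "https://scholar.google.com/scholar"


def build_scholar_url(query: str, year_from: Optional[int] = None,
                      year_to: Optional[int] = None) -> str:
    """Build a Google Scholar search URL suitable for the `urls` input (hypothetical helper)."""
    params: Dict[str, Union[str, int]] = {"q": query}
    if year_from is not None:
        params["as_ylo"] = year_from  # lower year bound, as in the example URL
    if year_to is not None:
        params["as_yhi"] = year_to    # assumed name of the upper-bound parameter
    # urlencode quotes values with quote_plus by default, so spaces become "+".
    return f"{SCHOLAR_SEARCH}?{urlencode(params)}"


# Prints the same URL as the direct-URL example.
print(build_scholar_url("CRISPR gene editing", year_from=2022))
```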
Output example
Each paper is returned as a JSON object:
```json
{
  "title": "Attention Is All You Need",
  "url": "https://arxiv.org/abs/1706.03762",
  "authors": "A Vaswani, N Shazeer, N Parmar...",
  "year": 2017,
  "source": "Advances in neural information processing systems",
  "citationCount": 125000,
  "snippet": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "pdfUrl": "https://arxiv.org/pdf/1706.03762",
  "type": "PDF",
  "citedByUrl": "https://scholar.google.com/scholar?cites=...",
  "relatedUrl": "https://scholar.google.com/scholar?q=related:...",
  "versionCount": 15,
  "clusterId": "1234567890",
  "query": "transformer neural network",
  "scrapedAt": "2026-03-22T10:00:00.000Z"
}
```
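Downstream code usually needs only a few of these fields. The sketch below parses a JSON array of dataset items (for example, a JSON export from the Dataset tab) into a small typed record and ranks papers by `citationCount`. The `Paper` class, `parse_papers`, and `most_cited` are hypothetical names, and treating a missing count as zero is an assumption.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Paper:
    """A subset of the output fields (mirrors title, year, citationCount, pdfUrl)."""
    title: str
    year: Optional[int]
    citation_count: int
    pdf_url: Optional[str]


def parse_papers(raw_json: str) -> List[Paper]:
    """Parse a JSON array of dataset items, such as a JSON export of the dataset."""
    papers = []
    for item in json.loads(raw_json):
        papers.append(Paper(
            title=item.get("title", ""),
            year=item.get("year"),
            # Assumption: a missing or null count is treated as zero citations.
            citation_count=item.get("citationCount") or 0,
            pdf_url=item.get("pdfUrl"),
        ))
    return papers


def most_cited(papers: List[Paper], n: int = 10) -> List[Paper]:
    """Return up to n papers, highest citation count first."""
    return sorted(papers, key=lambda p: p.citation_count, reverse=True)[:n]
```

Because `sorted` is stable, papers with equal citation counts keep their original dataset order.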
Google Scholar search syntax
| Syntax | Example | Description |
|---|---|---|
| `"exact phrase"` | `"deep learning"` | Exact phrase match |
| `author:"Name"` | `author:"Yann LeCun"` | Search by author |
| `-term` | `machine learning -survey` | Exclude a term |
| `OR` | `"CNN" OR "convolutional"` | Match either term |
| `intitle:` | `intitle:transformer` | Term must appear in title |
| `source:` | `source:Nature` | Restrict to a journal |
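When queries are generated programmatically, the operators in the syntax table can be combined into one query string. A minimal sketch with a hypothetical helper `build_query`; it only concatenates the operators shown in the table and does no escaping, so values containing quotes are passed through unchanged.

```python
from typing import Iterable, Optional


def build_query(terms: str = "", phrase: Optional[str] = None,
                author: Optional[str] = None, source: Optional[str] = None,
                exclude: Iterable[str] = ()) -> str:
    """Compose a Scholar query string from the operators in the syntax table."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')         # exact phrase match
    if author:
        parts.append(f'author:"{author}"')  # search by author
    if source:
        parts.append(f"source:{source}")    # restrict to a journal
    if terms:
        parts.append(terms)                 # plain search terms
    parts.extend(f"-{term}" for term in exclude)  # excluded terms
    return " ".join(parts)


print(build_query("deep learning", author="Yann LeCun"))  # author:"Yann LeCun" deep learning
```

The output strings match the query examples used elsewhere on this page, so they can be passed directly in the `queries` array.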
Tips
- Use exact phrases for precise results: `"attention is all you need"` finds that specific paper.
- Author search works well: `author:"Geoffrey Hinton"` finds papers by that author.
- Year filtering is useful for finding recent work in fast-moving fields.
- Sort by date to find the newest papers on a topic.
- Citation count is a useful but rough proxy for a paper's influence; older papers have had more time to accumulate citations.
- Direct URLs let you build complex queries in Google Scholar's Advanced Search UI, then paste the resulting URL here.
- Rate limits: Google Scholar may rate-limit heavy usage. The scraper adds automatic delays between pages; for large-scale scraping, use Apify Proxy.
Integrations
Connect Google Scholar Scraper with any cloud service or web app using integrations on the Apify platform. You can connect with Make, Zapier, Slack, Airbyte, GitHub, Google Sheets, Google Drive, and more. You can also use webhooks to trigger actions when a run finishes or fails.
How to use Google Scholar Scraper with the API
Node.js
```javascript
import { ApifyClient } from 'apify-client';

// Authenticate with your Apify API token.
const client = new ApifyClient({ token: 'YOUR_API_TOKEN' });

// Start the actor and wait for the run to finish.
const run = await client.actor('automation-lab/google-scholar-scraper').call({
    queries: ['machine learning'],
    yearFrom: 2023,
    maxResults: 50,
});

// Read the scraped papers from the run's default dataset.
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((paper) => {
    console.log(`${paper.title} (${paper.year}) - Cited by ${paper.citationCount}`);
});
```
Python
```python
from apify_client import ApifyClient

# Authenticate with your Apify API token.
client = ApifyClient("YOUR_API_TOKEN")

# Start the actor and wait for the run to finish.
run = client.actor("automation-lab/google-scholar-scraper").call(
    run_input={
        "queries": ["machine learning"],
        "yearFrom": 2023,
        "maxResults": 50,
    }
)

# Iterate over the scraped papers in the run's default dataset.
for paper in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{paper['title']} ({paper['year']}) - Cited by {paper['citationCount']}")
```
cURL
```bash
# Start a run asynchronously; the response describes the newly created run.
curl "https://api.apify.com/v2/acts/automation-lab~google-scholar-scraper/runs" \
  -X POST \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"queries": ["machine learning"], "yearFrom": 2023, "maxResults": 50}'
```
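The cURL call above starts a run and returns right away. For small jobs, the Apify API also offers a `run-sync-get-dataset-items` endpoint that waits for the run and returns the dataset items in the response body. A standard-library Python sketch; `build_sync_request` and `fetch_papers` are hypothetical helper names, and the 300-second client timeout is an arbitrary choice. Long runs may outlast a synchronous call, in which case the asynchronous `/runs` endpoint is the safer option.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
ACTOR_ID = "automation-lab~google-scholar-scraper"


def build_sync_request(token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a request that runs the actor and returns its dataset items."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_papers(token: str, run_input: dict, timeout: float = 300) -> list:
    """Send the request and decode the JSON array of papers in the response body."""
    request = build_sync_request(token, run_input)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the URL, headers, and body easy to inspect before any network call is made.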
Use with AI agents via MCP
Google Scholar Scraper is available as a tool for AI assistants via the Model Context Protocol (MCP).
Setup for Claude Code
```bash
claude mcp add --transport http apify "https://mcp.apify.com"
```
Setup for Claude Desktop, Cursor, or VS Code
```json
{
  "mcpServers": {
    "apify": {
      "url": "https://mcp.apify.com"
    }
  }
}
```
Example prompts
- "Search Google Scholar for papers about large language models from 2024"
- "Find the most cited papers on CRISPR gene editing"
- "Get papers by author Geoffrey Hinton on deep learning"
Learn more in the Apify MCP documentation.
Is it legal to scrape Google Scholar?
Google Scholar displays publicly available academic metadata: titles, authors, abstracts, and citation counts. This scraper extracts only publicly visible information that Google Scholar itself aggregates from academic publishers. It does not bypass authentication, access paywalled content, or download full papers.
You should review Google's Terms of Service and ensure your use case complies with applicable laws. Use reasonable request rates and respect robots.txt guidelines.
Legality
Scraping publicly available data is generally considered lawful; in hiQ Labs v. LinkedIn, the US Court of Appeals for the Ninth Circuit held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act. This actor only accesses publicly available information and does not require authentication. Always review and comply with the target website's Terms of Service before scraping. For personal data, ensure compliance with GDPR, CCPA, and other applicable privacy regulations.
FAQ
Q: How many papers can I scrape at once?
A: Up to 1,000 papers per query. Google Scholar paginates at 10 results per page, so collecting 1,000 papers means fetching 100 pages.
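The page arithmetic in that answer is a ceiling division by the page size. A minimal sketch, assuming each page is addressed by a zero-based result offset (as with Scholar's `start` URL parameter); `page_offsets` is a hypothetical helper, not the actor's code.

```python
import math
from typing import List

RESULTS_PER_PAGE = 10     # Google Scholar page size, as stated in the FAQ
MAX_RESULTS_LIMIT = 1000  # per-query cap from the input table


def page_offsets(max_results: int) -> List[int]:
    """Return the zero-based result offset of each page needed for max_results papers."""
    if not 1 <= max_results <= MAX_RESULTS_LIMIT:
        raise ValueError(f"maxResults must be between 1 and {MAX_RESULTS_LIMIT}")
    pages = math.ceil(max_results / RESULTS_PER_PAGE)
    return [page * RESULTS_PER_PAGE for page in range(pages)]


print(len(page_offsets(1000)))  # 100
```

A partial last page still costs a full request: `maxResults` of 25 needs three pages, not two and a half.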
Q: Does it return full paper text?
A: No. The scraper returns metadata (title, authors, abstract snippet, citation count) and PDF links when available. Use the pdfUrl field to access full papers.
Q: Can I search by author only?
A: Yes. Use author:"Name" syntax in your query, e.g., author:"Yann LeCun".
Q: Why am I getting fewer results than expected?
A: Google Scholar may return fewer results for very specific queries. Also, rate limiting may kick in for large scrapes. Try broader search terms or reduce maxResults.
Q: Can I filter by journal or conference?
A: Yes. Use Google Scholar's source: operator in your query, e.g., source:Nature deep learning.
Related scrapers
- ArXiv Scraper โ scrape preprint papers from arXiv
- Crossref Scraper โ extract scholarly article metadata via CrossRef
- OpenAlex Scraper โ extract academic paper metadata, citations, and author data
- ClinicalTrials Scraper โ extract clinical trial data from ClinicalTrials.gov
- Google Search Scraper โ scrape Google Search results