Google Scholar Scraper — Papers, Citations & Author Profiles
Pricing
from $5.00 / 1,000 results returned
Scrape Google Scholar across 6 modes: paper search, citation export (BibTeX/APA/MLA/Chicago), author profiles (h-index, i10-index), publication lists, citation history, and co-author networks. MCP-ready. Hybrid Camoufox + SerpApi managed/BYOK fallback for high reliability.
Developer
Khadin Akbar
Maintained by Community
Scrape Google Scholar at scale across six modes in one Actor — paper search, citation-format export, author profiles, publication lists, citation history, and co-author networks. Built MCP-ready for AI agents and powered by a hybrid Camoufox + SerpApi fallback engine so your runs keep returning data even when Google Scholar throws a CAPTCHA. Try it with the default input (a search for large language models) and download results as JSON, CSV, Excel, or HTML.
Runs on the Apify platform — API access, scheduling, integrations, residential proxy rotation, and run monitoring all included.
What does Google Scholar Scraper do?
Google Scholar has no official API and blocks scrapers aggressively. This Actor solves both problems. It extracts structured bibliographic data — titles, authors, publication venues, years, citation counts, PDF links, h-index, i10-index, citation histories, and co-author graphs — and returns one clean, flat JSON record per result. Pick a mode for the job:
| Mode | What you get |
|---|---|
| `search` | Papers matching a keyword query, with filters for year range, document type, patents, case law, and review-only |
| `cite` | Citation strings in APA, MLA, Chicago, Harvard, Vancouver plus BibTeX, EndNote, RefMan, RefWorks export links |
| `author_profile` | One author's metrics: affiliation, interests, h-index, i10-index, total & recent citations |
| `author_articles` | An author's full publication list, paginated and sortable |
| `author_citation` | An author's year-by-year citation history (great for tracking growth) |
| `author_co_authors` | An author's co-author network for mapping research communities |
Why use this Google Scholar Scraper?
- Reliability first. Google Scholar's CAPTCHA wall breaks most scrapers (competitors sit at 84–98% success). This Actor tries a stealth Camoufox browser first, then transparently falls back to a managed SerpApi path — so you get data, not empty runs.
- Bring your own key, pay less. Supply your own SerpApi key (BYOK) and the reliable path runs at the standard per-result price with no managed-fallback premium.
- Six tools in one. No need to wire up four separate scrapers for search, citations, authors, and co-authors.
- Agent-native. Narrow inputs, flat structured JSON output, and clear cost signals make it a clean tool call for Claude, ChatGPT, or any MCP client.
- Built for literature reviews, bibliometrics, and RAG. Feed structured scholarly data straight into knowledge graphs, vector stores, or analytics notebooks.
How to use Google Scholar Scraper
- Open the Input tab.
- Choose a Mode (defaults to paper search).
- Fill the field that mode needs: `queries` for search, `resultIds` for cite, `authorIds` for the author modes.
- (Optional) Set `maxResults`, a year range, or your own `serpApiKey`.
- Click Start and watch results stream into the dataset.
- Download as JSON, CSV, Excel, or HTML, or pull via the Apify API.
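For the "pull via the Apify API" route, the sketch below builds the HTTP request for Apify's synchronous run endpoint using only the Python standard library. Treat it as an illustration under assumptions: `ACTOR_ID` is a hypothetical placeholder (this page does not state the Actor's ID), the token is a dummy value, and the helper names are invented for this example.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the real ID shown on the Actor's page.
ACTOR_ID = "username~google-scholar-scraper"
API_BASE = "https://api.apify.com/v2"


def build_run_request(actor_id: str, token: str, run_input: dict) -> urllib.request.Request:
    """Build (but do not send) a POST that runs the Actor and returns dataset items."""
    url = f"{API_BASE}/acts/{actor_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(run_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_actor(actor_id: str, token: str, run_input: dict) -> list:
    """Send the request and decode the JSON array of result records."""
    request = build_run_request(actor_id, token, run_input)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the request does not touch the network; run_actor() does.
request = build_run_request(
    ACTOR_ID,
    "YOUR_APIFY_TOKEN",
    {"mode": "search", "queries": ["large language models"]},
)
```

Calling `run_actor(...)` with a real token and Actor ID would start a billable run, so it is left uncalled here.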
Input
The only field you usually set is Mode plus its matching field. Example — search for recent transformer papers:
{"mode": "search","queries": ["transformer architecture"],"yearFrom": 2020,"maxResults": 50}
Look up an author's profile:
{ "mode": "author_profile", "authorIds": ["LSsXyncAAAAJ"] }
Export citation formats for a paper (needs a SerpApi key):
{ "mode": "cite", "resultIds": ["TY8gM2sAAAAJ"], "serpApiKey": "your-key" }
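When generating inputs programmatically, a small helper can keep the mode and its matching field in sync. This is a minimal sketch: the mode and field names come from this README, while `REQUIRED_FIELD` and `build_input` are hypothetical names made up for the example, not part of the Actor.

```python
# Which input field each mode needs, as described in this README.
REQUIRED_FIELD = {
    "search": "queries",
    "cite": "resultIds",
    "author_profile": "authorIds",
    "author_articles": "authorIds",
    "author_citation": "authorIds",
    "author_co_authors": "authorIds",
}


def build_input(mode: str, values: list[str], **options) -> dict:
    """Return an input dict for the given mode, with optional extra fields."""
    if mode not in REQUIRED_FIELD:
        raise ValueError(f"unknown mode: {mode!r}")
    if not values:
        raise ValueError(f"mode {mode!r} needs at least one {REQUIRED_FIELD[mode]!r} value")
    run_input = {"mode": mode, REQUIRED_FIELD[mode]: list(values)}
    run_input.update(options)  # e.g. maxResults, yearFrom, serpApiKey
    return run_input


search_input = build_input(
    "search", ["transformer architecture"], yearFrom=2020, maxResults=50
)
profile_input = build_input("author_profile", ["LSsXyncAAAAJ"])
```

The helper only checks that the mode exists and that its field is non-empty; it does not validate option names, which the Actor itself would do.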
Output
Each result is one flat JSON record. A search paper looks like:
{"mode": "search","query": "transformer architecture","position": 1,"title": "Attention is all you need","resultId": "u-CT435A0vkJ","link": "https://proceedings.neurips.cc/paper/2017/...","snippet": "The dominant sequence transduction models...","authors": [{ "name": "A Vaswani", "authorId": "...", "profileUrl": "..." }],"publicationInfo": "A Vaswani, N Shazeer… - Advances in neural…, 2017","year": 2017,"citedByCount": 145203,"citedByLink": "https://scholar.google.com/scholar?cites=...","versionsCount": 53,"pdfUrl": "https://proceedings.neurips.cc/...pdf","pdfFormat": "PDF","source": "serpapi","scrapedAt": "2026-05-30T12:00:00.000Z"}
You can download the dataset in various formats such as JSON, HTML, CSV, or Excel.
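Because every record is flat, post-processing a downloaded JSON export takes only a few lines. The sketch below assumes search-mode records shaped like the example above; `summarize` and `top_cited` are hypothetical helper names, and the sample records are trimmed to the fields the helpers read (the second one is invented test data).

```python
import json

# Trimmed sample data for illustration only.
raw = """[
  {"title": "Attention is all you need",
   "authors": [{"name": "A Vaswani"}],
   "year": 2017, "citedByCount": 145203},
  {"title": "An example paper with no author data",
   "authors": [], "year": 2021, "citedByCount": 12}
]"""
records = json.loads(raw)


def summarize(record: dict) -> str:
    """One-line summary: first author, year, title, citation count."""
    authors = record.get("authors") or []
    first_author = authors[0]["name"] if authors else "Unknown author"
    year = record.get("year", "n.d.")
    cited = record.get("citedByCount") or 0
    return f"{first_author} ({year}): {record['title']} [cited by {cited}]"


def top_cited(items: list, limit: int = 10) -> list:
    """Return up to `limit` records, most-cited first."""
    return sorted(items, key=lambda r: r.get("citedByCount") or 0, reverse=True)[:limit]


for record in top_cited(records, limit=5):
    print(summarize(record))
```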
Data fields
| Field | Description |
|---|---|
| `mode` | The operation that produced the record |
| `title` | Paper title |
| `authors` | Array of `{ name, authorId, profileUrl }` |
| `year` | Publication year |
| `citedByCount` | Number of citing papers |
| `publicationInfo` | Venue / journal / publisher summary |
| `link`, `pdfUrl` | Article URL and direct PDF link |
| `resultId` | Paper/cluster ID; feed into `cite` mode |
| `name`, `authorId`, `hIndex`, `i10Index`, `citationsTotal` | Author-mode fields |
| `citationsByYear` | Year-by-year citation history (`author_citation`) |
| `coAuthorName`, `coAuthorId` | Co-author edges (`author_co_authors`) |
| `citations`, `exportLinks` | Citation strings + export links (`cite`) |
| `source` | `camoufox` (direct) or `serpapi` (fallback) |
| `scrapedAt` | ISO 8601 timestamp |
How much does it cost to scrape Google Scholar?
This Actor uses pay-per-event pricing plus optional usage-based billing:
- $0.00005 per actor start
- $0.005 per result (paper, citation export, citation-history record, or co-author), capped by your `maxResults`
- $0.01 per author profile (`author_profile` mode only)
A 50-paper search costs about $0.25. The free Apify tier covers small jobs. Bring your own SerpApi key to avoid any managed-fallback overhead on large runs.
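The event prices above can be turned into a quick estimate before launching a run. This sketch covers only the three listed events; it assumes author profiles are billed separately from per-result events, and it ignores platform usage and any managed-fallback premium, neither of which is quantified here. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# Event prices as listed in this README (USD).
START_FEE = Decimal("0.00005")
PER_RESULT = Decimal("0.005")
PER_AUTHOR_PROFILE = Decimal("0.01")


def estimate_cost(results: int = 0, author_profiles: int = 0) -> Decimal:
    """Estimate pay-per-event charges for a single run."""
    return START_FEE + PER_RESULT * results + PER_AUTHOR_PROFILE * author_profiles


print(estimate_cost(results=50))          # the 50-paper search from the text
print(estimate_cost(author_profiles=20))  # 20 author profiles
```

`Decimal` keeps the tiny start fee from being lost to floating-point rounding when many runs are summed.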
Tips and advanced options
- For big or time-sensitive jobs, set `forceSerpApi: true` with a `serpApiKey` to skip the direct-scrape attempt and go straight to the reliable path.
- Chain modes: run `search`, grab `resultId` and `authors[].authorId` from the output, then feed those into `cite` and `author_profile`.
- Filter tightly with `yearFrom`/`yearTo`, `reviewArticlesOnly`, and `sortByDate` to keep result counts (and cost) down.
- Residential proxies are the default and strongly recommended, because Google Scholar blocks datacenter IPs instantly.
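The mode-chaining tip can be sketched as a pure function that turns `search` output into inputs for two follow-up runs. The field names are taken from this README; `follow_up_inputs` and the sample records are hypothetical, and the sketch assumes records without an `authorId` should simply be skipped.

```python
def follow_up_inputs(search_records: list) -> tuple:
    """Collect unique resultIds and authorIds, preserving first-seen order."""
    result_ids: list = []
    author_ids: list = []
    for record in search_records:
        result_id = record.get("resultId")
        if result_id and result_id not in result_ids:
            result_ids.append(result_id)
        for author in record.get("authors") or []:
            author_id = author.get("authorId")
            if author_id and author_id not in author_ids:
                author_ids.append(author_id)
    cite_input = {"mode": "cite", "resultIds": result_ids}
    profile_input = {"mode": "author_profile", "authorIds": author_ids}
    return cite_input, profile_input


# Made-up records containing only the fields this helper reads.
sample = [
    {"resultId": "r1", "authors": [{"name": "A", "authorId": "a1"}, {"name": "B"}]},
    {"resultId": "r2", "authors": [{"name": "A", "authorId": "a1"},
                                   {"name": "C", "authorId": "a2"}]},
]
cite_input, profile_input = follow_up_inputs(sample)
```

As noted in the Input section, `cite` mode needs a SerpApi key, so add `serpApiKey` to `cite_input` before running it.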
FAQ, disclaimer, and support
Is scraping Google Scholar legal? This Actor collects only publicly available bibliographic metadata for research, bibliometric, and indexing use. You are responsible for complying with Google's Terms of Service and applicable laws in your jurisdiction. Do not use it to violate copyright or republish protected content.
Why do some runs use the SerpApi source? Google Scholar serves CAPTCHAs to datacenter and residential traffic alike. When the direct Camoufox scrape is blocked, the Actor falls back to a managed SerpApi path so your run still returns data. Provide your own key for the cheapest reliable path.
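To see how often a run fell back, you can tally the `source` field of the returned records. A minimal sketch, assuming each record carries `source` as documented in the data fields table; `source_breakdown` is a hypothetical helper and the sample records are invented.

```python
from collections import Counter


def source_breakdown(records: list) -> Counter:
    """Count records per source ('camoufox' direct vs 'serpapi' fallback)."""
    return Counter(record.get("source", "unknown") for record in records)


# Made-up records for illustration.
sample = [{"source": "serpapi"}, {"source": "camoufox"}, {"source": "serpapi"}]
breakdown = source_breakdown(sample)
fallback_share = breakdown["serpapi"] / len(sample)
print(breakdown, f"fallback share: {fallback_share:.0%}")
```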
Found a bug or need a field added? Open an issue on the Actor's Issues tab. Custom scraping solutions are available on request.
Related actors
- Google Patents Scraper — patents, citations & assignee/inventor portfolios for IP context alongside Scholar.
- Goodreads Scraper — books, reviews & authors for non-academic bibliographic and biography work.
- Google SERP Scraper — Google search results when you need broader web coverage beyond Scholar.
- Google Trends Scraper — interest trends for research topics you find in Scholar.
- SEC EDGAR Scraper — 10-K, 8-K & 13F filings to anchor academic findings against issuer disclosures.