Google Scholar Scraper
Pricing
from $2.00 / 1,000 results
[$2.0 / 1K] Extract academic papers, author profiles, h-index, i10-index, citation counts, abstracts, and PDF links from Google Scholar. Batch search queries and author IDs, filter by year range, and sort by relevance or date.
Developer
SolidCode
Pull academic papers, author profiles, and citation data from Google Scholar at scale, complete with h-index, i10-index, citation counts, BibTeX entries, and formatted MLA/APA/Chicago/Harvard/Vancouver citations. Built for researchers, analysts, and content teams who need a clean, structured academic dataset without wrestling with Scholar's HTML one page at a time.
Why This Scraper?
- Papers, authors, and citations in one actor: search Scholar by keywords, pull complete author profiles by ID, and follow the "Cited by" graph. One run, one dataset.
- Batch everything: many queries and many author IDs in a single invocation. Pay once for the setup and get all your results in one place.
- Up to 1,000 papers per query: reaches Google Scholar's own upper bound, with smooth pagination and no duplicates.
- Year range and date-sorted results: narrow to a publication window or sort most-recent-first to surface the latest literature.
- BibTeX and formatted citations on demand: enrich every paper with a ready-to-paste BibTeX entry and MLA, APA, Chicago, Harvard, and Vancouver citation strings.
- 20 languages and 45 countries: localize results with language and country controls for regional coverage.
- No API key, no sign-up: Google Scholar has no public API, so this is the fastest path from a keyword to a clean academic dataset.
Use Cases
Academic Research & Literature Reviews
- Build a ranked reading list for a new research topic in minutes
- Track the citation graph of a seminal paper to find follow-up work
- Discover adjacent researchers via the "Cited by" chain
Competitive & Industry Intelligence
- Monitor what research labs or university groups are publishing on a topic
- Benchmark academic output of competing institutions by author ID
- Detect emerging sub-fields from a burst of recent publications
Grant Writing & Funding Prep
- Assemble a bibliography of prior work to justify a new grant proposal
- Quantify a lab's impact with total citations, h-index, and i10-index
- Identify gaps in the literature to frame a novel research question
Bibliometrics & Research Analytics
- Build citation-count time series for meta-analysis or scientometrics
- Analyze author productivity trends across years
- Map co-author networks from author profile data
SEO & Content Research
- Back marketing claims with peer-reviewed sources
- Find credible experts to quote or interview for long-form content
- Surface studies that competitors cite to match their evidence depth
Education & Curriculum Design
- Compile course reading lists from the most-cited papers in a field
- Discover open-access PDF versions of academic texts
- Track which textbook chapters or papers are cited in recent syllabi
Getting Started
Basic Keyword Search
The simplest possible run, with one topic and 50 papers:
{"searchQueries": ["quantum error correction"],"maxResults": 50}
Filtered Search (Year + Language + Country)
Narrow to papers from 2023 to 2025, sort newest first, and localize results for a German-language audience in Germany:
{"searchQueries": ["large language models healthcare"],"yearFrom": 2023,"yearTo": 2025,"sortBy": "date","language": "de","country": "de","maxResults": 100}
Author Profile Lookup
Pull a complete profile by Scholar author ID: metrics, research interests, co-authors, and the full publication list. Paste either the ID or the full URL:
{"authorIds": ["JicYPdAAAAAJ","https://scholar.google.com/citations?user=5KJrNtoAAAAJ&hl=en"],"maxResults": 200}
To find an author ID, open any Google Scholar author page and copy the value of the user= parameter in the URL, stopping before the next & (for the URL above, that is 5KJrNtoAAAAJ).
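The actor already accepts either form, but if you are collecting profile links from a spreadsheet you may want to normalize them to bare IDs first (for deduplication, say). A minimal sketch using Python's standard urllib.parse, assuming the ID always travels in the user query parameter as in the example URL; this is not the actor's own parser:

```python
from urllib.parse import parse_qs, urlparse


def extract_author_id(value: str) -> str:
    """Return a Scholar author ID from a bare ID or a profile URL.

    Assumption: profile URLs carry the ID in the `user` query parameter,
    e.g. https://scholar.google.com/citations?user=5KJrNtoAAAAJ&hl=en
    """
    value = value.strip()
    if "://" not in value:
        # Already a bare ID such as "JicYPdAAAAAJ".
        return value
    params = parse_qs(urlparse(value).query)
    ids = params.get("user")
    if not ids:
        raise ValueError(f"no user= parameter in {value!r}")
    return ids[0]


print(extract_author_id("https://scholar.google.com/citations?user=5KJrNtoAAAAJ&hl=en"))
# → 5KJrNtoAAAAJ
print(extract_author_id("JicYPdAAAAAJ"))
# → JicYPdAAAAAJ
```

parse_qs stops each value at the next &, so trailing parameters such as hl=en are ignored.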
Combined Search + Citation Graph
Fetch papers for a query and an author profile in the same run, then follow each paper's "Cited by" link:
{"searchQueries": ["attention is all you need"],"authorIds": ["JicYPdAAAAAJ"],"includeCitations": true,"maxCitationsPerPaper": 50,"includeAbstracts": true,"maxResults": 10}
Bibliography Export (BibTeX + Formatted Citations)
Pull a topic's top papers with ready-to-paste BibTeX entries and pre-formatted citations:
{"searchQueries": ["bert language model"],"includeBibtex": true,"maxResults": 20}
Input Reference
What to Scrape
| Parameter | Type | Default | Description |
|---|---|---|---|
searchQueries | string[] | ["machine learning healthcare"] | Keywords to search on Google Scholar. Each query produces its own set of paper results. |
authorIds | string[] | [] | Google Scholar author IDs or full profile URLs. Paste either the 10-14 character ID (JicYPdAAAAAJ) or the URL. |
Results
| Parameter | Type | Default | Description |
|---|---|---|---|
maxResults | integer | 50 | Maximum papers per search query. Google Scholar caps at roughly 1,000 results. Set to 0 for everything available. |
Filters
| Parameter | Type | Default | Description |
|---|---|---|---|
yearFrom | integer | null | Only include papers published in this year or later. |
yearTo | integer | null | Only include papers published in this year or earlier. |
sortBy | string | "relevance" | "relevance" keeps Scholar's default ranking. "date" sorts most recent first. |
Localization
| Parameter | Type | Default | Description |
|---|---|---|---|
language | string | "en" | Scholar interface and snippet language. 20 options including English, Spanish, German, French, Japanese, Chinese, Arabic, and more. |
country | string | "us" | Country code for regional localization. 45 options across the Americas, Europe, Asia-Pacific, and MENA. |
Enrichment
| Parameter | Type | Default | Description |
|---|---|---|---|
includeAbstracts | boolean | true | Include the snippet/abstract text for each paper. |
includeCitations | boolean | false | For each paper, follow the "Cited by" link and return citing papers. Significantly increases runtime and cost. |
maxCitationsPerPaper | integer | 20 | Cap on citing papers per source paper when includeCitations is on. Up to 200. |
includeBibtex | boolean | false | Enrich every paper row with a BibTeX entry and MLA/APA/Chicago/Harvard/Vancouver citation strings. Adds two extra Scholar requests per paper. |
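Before launching a large (and billable) run, it can help to sanity-check the input against the limits listed in these tables. The sketch below is a client-side pre-check derived only from the documented defaults and ranges; check_input is a hypothetical helper, and the actor's own validation is not described here and may differ:

```python
from typing import Any

# Limits copied from the input reference tables; not the actor's own code.
VALID_SORT = {"relevance", "date"}
MAX_CITATIONS_PER_PAPER = 200


def check_input(run_input: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a run input (empty if none)."""
    problems: list[str] = []
    # maxResults defaults to 50; 0 means "everything available".
    if run_input.get("maxResults", 50) < 0:
        problems.append("maxResults must be >= 0 (0 means everything available)")
    year_from, year_to = run_input.get("yearFrom"), run_input.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        problems.append("yearFrom must not be later than yearTo")
    if run_input.get("sortBy", "relevance") not in VALID_SORT:
        problems.append("sortBy must be 'relevance' or 'date'")
    # maxCitationsPerPaper defaults to 20 and is documented as "up to 200".
    if run_input.get("maxCitationsPerPaper", 20) > MAX_CITATIONS_PER_PAPER:
        problems.append(f"maxCitationsPerPaper must be at most {MAX_CITATIONS_PER_PAPER}")
    return problems
```

For example, the filtered-search input from Getting Started passes with no problems, while {"yearFrom": 2025, "yearTo": 2023} is flagged.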
Output
Every row carries a recordType field (paper, authorProfile, or citingPaper), so you can filter cleanly downstream.
Paper (recordType: "paper")
{"recordType": "paper","query": "attention is all you need","rank": 1,"title": "Attention is all you need","url": "https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html","authors": "A Vaswani, N Shazeer, N Parmar, J Uszkoreit","authorList": ["A Vaswani", "N Shazeer", "N Parmar", "J Uszkoreit"],"year": 2017,"venue": "Advances in neural information processing systems","abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...","citationCount": 142301,"citedByUrl": "https://scholar.google.com/scholar?cites=2960712678066186980","versionsCount": 73,"pdfUrl": "https://arxiv.org/pdf/1706.03762.pdf","pdfSource": "arxiv.org"}
| Field | Type | Description |
|---|---|---|
recordType | string | Always "paper" |
query | string | The search query that produced this row |
rank | number | Position in the query's result set |
title | string | Paper title |
url | string | Canonical paper URL (journal, arXiv, etc.) |
authors | string | Comma-separated author line |
authorList | string[] | Authors split into an array |
year | number | Publication year |
venue | string | Journal, conference, or publisher |
abstract | string | Snippet / abstract text |
citationCount | number | Number of papers citing this one |
citedByUrl | string | Scholar URL to the full citing-paper list |
versionsCount | number | Number of versions Scholar found |
pdfUrl | string | Direct PDF link when Scholar lists one |
pdfSource | string | Host domain of the PDF link |
bibtex | string | Raw BibTeX entry (only when includeBibtex: true) |
formattedCitations | object | {mla, apa, chicago, harvard, vancouver} strings (only when includeBibtex: true) |
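To turn a run with includeBibtex: true into a reference-manager import, you can concatenate the bibtex fields into a .bib file. A minimal sketch, assuming the dataset was exported as a JSON array (the file names here are hypothetical); identical entries are dropped because the same paper can appear under several queries:

```python
import json
from pathlib import Path


def load_rows(path: str) -> list[dict]:
    """Load a dataset exported in JSON format (an array of row objects)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def write_bib(rows: list[dict], path: str) -> int:
    """Write the distinct bibtex entries of paper rows to a .bib file.

    Rows without a bibtex field (e.g. runs without includeBibtex) are skipped.
    Returns the number of entries written.
    """
    entries = [
        row["bibtex"].strip()
        for row in rows
        if row.get("recordType") == "paper" and row.get("bibtex")
    ]
    # dict.fromkeys removes exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(entries))
    Path(path).write_text("\n\n".join(unique) + "\n", encoding="utf-8")
    return len(unique)


# Usage: write_bib(load_rows("dataset.json"), "references.bib")
```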
Author Profile (recordType: "authorProfile")
{"recordType": "authorProfile","authorId": "JicYPdAAAAAJ","name": "Geoffrey Hinton","affiliation": "Emeritus Prof. Computer Science, University of Toronto","homepageUrl": "http://www.cs.toronto.edu/~hinton","interests": ["machine learning", "artificial intelligence", "cognitive science"],"totalCitations": 1029825,"hIndex": 190,"i10Index": 526,"citationHistogram": [{"year": 2023, "count": 112043}],"coAuthors": [{"authorId": "m1qAiOUAAAAJ", "name": "Yann LeCun"}],"publications": [{"title": "Deep learning","authors": "Y LeCun, Y Bengio, G Hinton","venue": "Nature 521 (7553), 436-444","year": 2015,"citationCount": 82310}]}
| Field | Type | Description |
|---|---|---|
recordType | string | Always "authorProfile" |
authorId | string | Scholar author ID |
name | string | Author's display name |
affiliation | string | Affiliation text as shown on the profile |
verifiedEmailDomain | string | Verified email domain (when opted in) |
homepageUrl | string | Personal or institutional homepage |
interests | string[] | Research interest tags |
totalCitations | number | All-time total citations |
hIndex | number | All-time h-index |
i10Index | number | All-time i10-index |
citationHistogram | object[] | Annual citation counts [{year, count}, ...] |
coAuthors | object[] | Linked co-authors with their own IDs |
publications | object[] | Full publication list with titles, venues, years, and citation counts |
The author profile also includes "Since" variants of the headline metrics (totalCitationsSince, hIndexSince, i10IndexSince), scoped to the recent "Since YYYY" window that Scholar shows next to the all-time figures, plus a profileImageUrl field.
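The h-index is the largest h such that h papers have at least h citations each; the i10-index counts papers with at least 10 citations. If you want these metrics for a slice of a profile (for example, only papers published since a given year), you can recompute them from the publications array. A sketch under two caveats: the array may be truncated by maxResults, so results can fall short of Scholar's hIndex and i10Index; and filtering by publication year is not the same as Scholar's "Since" metrics, which count citations received within the window:

```python
from typing import Optional


def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: list[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


def metrics_from_profile(profile: dict, min_year: Optional[int] = None) -> dict:
    """Recompute h-index and i10-index from an authorProfile row.

    If min_year is given, only publications with year >= min_year count;
    publications without a year are then excluded.
    """
    counts = [
        p.get("citationCount") or 0
        for p in profile.get("publications", [])
        if min_year is None or (p.get("year") or 0) >= min_year
    ]
    return {"hIndex": h_index(counts), "i10Index": i10_index(counts)}
```

For citation counts [10, 8, 5, 4, 3], h_index returns 4 (four papers have at least four citations) and i10_index returns 1.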
Citing Paper (recordType: "citingPaper")
Emitted only when includeCitations: true. Capped at maxCitationsPerPaper per source paper.
{"recordType": "citingPaper","parentPaperTitle": "Attention is all you need","title": "BERT: Pre-training of deep bidirectional transformers for language understanding","url": "https://arxiv.org/abs/1810.04805","authors": "J Devlin, MW Chang, K Lee, K Toutanova","year": 2018,"venue": "arXiv preprint arXiv:1810.04805","citationCount": 98421,"pdfUrl": "https://arxiv.org/pdf/1810.04805"}
Same shape as a paper row, plus parentPaperTitle, parentClusterId, and parentQuery, so you can join every citing paper back to the source paper it cites.
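One way to consume a mixed dataset is to split it by recordType and attach each paper's citing papers to it. A minimal sketch that joins on query and title, since those fields are documented on paper rows and their parentQuery / parentPaperTitle counterparts on citing rows; joining on parentClusterId would need a cluster ID on the paper side, which the paper table does not list, so that route is left out here:

```python
from collections import defaultdict


def split_by_type(rows: list[dict]) -> dict[str, list[dict]]:
    """Group dataset rows by their recordType field."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        groups[row.get("recordType", "unknown")].append(row)
    return dict(groups)


def attach_citers(rows: list[dict]) -> list[dict]:
    """Return paper rows, each copied with a `citers` list of citingPaper rows.

    Join key: (query, title) on papers <-> (parentQuery, parentPaperTitle).
    """
    groups = split_by_type(rows)
    citers_by_parent: dict[tuple, list[dict]] = defaultdict(list)
    for citer in groups.get("citingPaper", []):
        key = (citer.get("parentQuery"), citer.get("parentPaperTitle"))
        citers_by_parent[key].append(citer)
    return [
        {**paper, "citers": citers_by_parent.get((paper.get("query"), paper.get("title")), [])}
        for paper in groups.get("paper", [])
    ]
```

Author profile rows are left in their own group by split_by_type and are not affected by the join.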
Tips for Best Results
- Narrow the query. Scholar returns the best 1,000 hits for any query, so broad terms like "machine learning" drown out the gems. Add a modifier ("machine learning healthcare 2024") to get a tighter, more useful set.
- Use the year filter to cut noise. A yearFrom: 2023 filter strips away decades of older work and dramatically improves signal for recent literature reviews.
- Pick the right sort order. sortBy: "date" surfaces the most recent work; sortBy: "relevance" keeps Scholar's citation-weighted ranking for foundational reading.
- Combine authorIds and searchQueries in one run. Pay for one start and get both a topic survey and the specific author profiles you care about.
- Prefer smaller maxResults for faster runs. If you need 50 papers, ask for 50, not 1,000. Fewer pages means a quicker, cheaper run.
- Turn off abstracts when you don't need them. Setting includeAbstracts: false shrinks every row and speeds up large runs.
- Use citations sparingly. With includeCitations: true, every paper can add up to maxCitationsPerPaper citing-paper rows (20 by default, up to 200). Keep maxResults modest (5 to 20) when you switch it on.
- Author profiles return at least 20 publications per request. Scholar's profile pagination has a 20-publication minimum, so a maxResults: 5 run on authorIds may still yield 20 publications in the publications array.
Pricing
$4.00 per 1,000 results, matching the market rate for Scholar extraction while bundling author metrics and citation graphs at no extra charge.
| Results | Estimated Cost |
|---|---|
| 100 | $0.40 |
| 1,000 | $4.00 |
| 10,000 | $40.00 |
| 100,000 | $400.00 |
A "result" is any row in the output dataset: a paper, an author profile, or a citing paper. Platform fees (compute, storage) are additional and depend on your Apify plan.
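Because every row is billed, the citation graph dominates cost. A rough upper-bound estimator, assuming each query yields up to maxResults paper rows, each author ID yields one authorProfile row (its publications are nested in that row), and each paper may add up to maxCitationsPerPaper citingPaper rows. The rate is a parameter defaulting to the $4.00 figure above (the listing header also shows "from $2.00", so use your own plan's rate), and platform fees are not included:

```python
def estimate_rows(num_queries: int, max_results: int, num_authors: int = 0,
                  include_citations: bool = False,
                  max_citations_per_paper: int = 20) -> int:
    """Upper bound on dataset rows for one run (pass 1000 for maxResults: 0)."""
    papers = num_queries * max_results
    citers = papers * max_citations_per_paper if include_citations else 0
    return papers + num_authors + citers


def estimate_cost(rows: int, usd_per_1000: float = 4.00) -> float:
    """Result charge only; Apify platform fees are extra."""
    return rows * usd_per_1000 / 1000


# The "Combined Search + Citation Graph" input: 1 query, maxResults 10,
# 1 author ID, includeCitations with maxCitationsPerPaper 50.
rows = estimate_rows(1, 10, num_authors=1, include_citations=True,
                     max_citations_per_paper=50)
print(rows)  # → 511
```

So that small-looking example can bill up to 511 results (about $2.04 at $4.00 per 1,000), of which 500 are citing papers.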
Integrations
Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:
- Zapier / Make / n8n: workflow automation
- Google Sheets: direct spreadsheet export
- Slack / Email: notifications on new results
- Webhooks: trigger custom APIs on run completion
- Apify API: full programmatic access
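For programmatic access from Python, the official apify-client package can start a run, wait for it, and read the resulting dataset. A minimal sketch: the actor ID is not given on this page, so "&lt;ACTOR_ID&gt;" below is a placeholder you must replace, and the client is passed in as an argument so the function can be exercised without network access:

```python
from typing import Any


def run_scraper(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Start the actor, wait for it to finish, and return its dataset rows.

    `client` is expected to behave like apify_client.ApifyClient:
    client.actor(id).call(run_input=...) returns the finished run as a dict,
    and client.dataset(id).iterate_items() yields the dataset items.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return a run object")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (requires `pip install apify-client` and an Apify API token):
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   rows = run_scraper(client, "<ACTOR_ID>",
#                      {"searchQueries": ["quantum error correction"], "maxResults": 50})
```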
Legal & Ethical Use
This actor is designed for legitimate academic research, bibliometrics, literature review, and market intelligence. Users are responsible for complying with applicable laws and Google Scholar's terms of service, including making reasonable-rate requests and respecting content usage rules for any papers linked from Scholar. Do not use extracted data for spam, harassment, or any illegal purpose.