Pricing

from $2.00 / 1,000 results

Google Scholar Scraper

[💰 $2.0 / 1K] Extract academic papers, author profiles, h-index, i10-index, citation counts, abstracts, and PDF links from Google Scholar. Batch search queries and author IDs, filter by year range, sort by relevance or date.

Developer

SolidCode

Why This Scraper?

Papers, authors, and citations in one actor — search Scholar by keywords, pull complete author profiles by ID, and follow the "Cited by" graph. One run, one dataset.
Batch everything — many queries and many author IDs in a single invocation. Pay once for the setup, get all your results in one place.
Up to 1,000 papers per query — hits Google Scholar's own upper bound with smooth pagination and no duplicates.
Year range and date-sorted results — narrow to a publication window or sort by most-recent-first to surface the latest literature.
BibTeX and formatted citations on demand — enrich every paper with a ready-to-paste BibTeX entry and MLA, APA, Chicago, Harvard, and Vancouver citation strings.
20+ languages and 40+ countries — localize results with language and country controls for regional coverage.
No API key, no sign-up — Google Scholar has no public API. This is the fastest path from a keyword to a clean academic dataset.

Use Cases

Academic Research & Literature Reviews

Build a ranked reading list for a new research topic in minutes
Track the citation graph of a seminal paper to find follow-up work
Discover adjacent researchers via the "Cited by" chain

Competitive & Industry Intelligence

Monitor what research labs or university groups are publishing on a topic
Benchmark academic output of competing institutions by author ID
Detect emerging sub-fields from a burst of recent publications

Grant Writing & Funding Prep

Assemble a bibliography of prior work to justify a new grant proposal
Quantify a lab's impact with total citations, h-index, and i10-index
Identify gaps in the literature to frame a novel research question

Bibliometrics & Research Analytics

Build citation-count time series for meta-analysis or scientometrics
Analyze author productivity trends across years
Map co-author networks from author profile data

SEO & Content Research

Back marketing claims with peer-reviewed sources
Find credible experts to quote or interview for long-form content
Surface studies that competitors cite to match their evidence depth

Education & Curriculum Design

Compile course reading lists from the most-cited papers in a field
Discover open-access PDF versions of academic texts
Track which textbook chapters or papers are cited in recent syllabi

Getting Started

Basic Keyword Search

The simplest possible run — one topic, 50 papers:

{
    "searchQueries": ["quantum error correction"],
    "maxResults": 50
}

Filtered Search (Year + Language + Country)

Narrow to papers from 2023 to 2025, sorted newest first, and localize for a German-language audience:

{
    "searchQueries": ["large language models healthcare"],
    "yearFrom": 2023,
    "yearTo": 2025,
    "sortBy": "date",
    "language": "de",
    "country": "de",
    "maxResults": 100
}

Author Profile Lookup

Pull a complete profile by Scholar author ID: metrics, research interests, co-authors, and the full publication list. You get one authorProfile summary row plus a separate paper row for every publication, so an author's work lands in the Papers view ready to filter and sort just like a keyword search. The year range, date sorting, abstracts, and "Cited by" following all apply to an author's publications too. Paste either the ID or the full URL:

{
    "authorIds": [
        "JicYPdAAAAAJ",
        "https://scholar.google.com/citations?user=5KJrNtoAAAAJ&hl=en"
    ],
    "maxResults": 200
}

To find an author ID, open any Google Scholar author page and copy the value after user= in the URL.
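
If you collect profile URLs in bulk, the ID can also be pulled out programmatically. The following is a minimal sketch using only the Python standard library; the function name extract_author_id is hypothetical and not part of the actor, which accepts either form directly.

```python
from urllib.parse import urlparse, parse_qs


def extract_author_id(value: str) -> str:
    """Return the Scholar author ID from a bare ID or a profile URL."""
    value = value.strip()
    if "://" not in value:
        # No scheme present, so treat the input as a bare author ID.
        return value
    params = parse_qs(urlparse(value).query)
    if "user" not in params:
        raise ValueError(f"no user= parameter found in {value!r}")
    # parse_qs returns a list per key; the first value is the ID.
    return params["user"][0]


print(extract_author_id("https://scholar.google.com/citations?user=5KJrNtoAAAAJ&hl=en"))
print(extract_author_id("JicYPdAAAAAJ"))
```

Normalizing to bare IDs up front makes it easier to deduplicate a list before passing it as authorIds.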

Combined Search + Citation Graph

Fetch papers for a query and an author ID in one run, then follow each paper's "Cited by" link:

{
    "searchQueries": ["attention is all you need"],
    "authorIds": ["JicYPdAAAAAJ"],
    "includeCitations": true,
    "maxCitationsPerPaper": 50,
    "includeAbstracts": true,
    "maxResults": 10
}

Bibliography Export (BibTeX + Formatted Citations)

Pull a topic's top papers with ready-to-paste BibTeX entries and pre-formatted citations:

{
    "searchQueries": ["bert language model"],
    "includeBibtex": true,
    "maxResults": 20
}

Input Reference

What to Scrape

Parameter	Type	Default	Description
`searchQueries`	string[]	`["machine learning healthcare"]`	Keywords to search on Google Scholar. Each query produces its own set of paper results.
`authorIds`	string[]	`[]`	Google Scholar author IDs or full profile URLs. Paste either the 10-14 character ID (`JicYPdAAAAAJ`) or the URL.

Results

Parameter	Type	Default	Description
`maxResults`	integer	`50`	Maximum papers per search query or author ID. Google Scholar caps each query at roughly 1,000 results. Set to `0` for everything available.

Filters

Parameter	Type	Default	Description
`yearFrom`	integer	null	Only include papers published in this year or later.
`yearTo`	integer	null	Only include papers published in this year or earlier.
`sortBy`	string	`"relevance"`	`"relevance"` keeps Scholar's default ranking. `"date"` sorts most recent first.

Localization

Parameter	Type	Default	Description
`language`	string	`"en"`	Scholar interface and snippet language. 20 options including English, Spanish, German, French, Japanese, Chinese, Arabic, and more.
`country`	string	`"us"`	Country code for regional localization. 45 options across the Americas, Europe, Asia-Pacific, and MENA.

Enrichment

Parameter	Type	Default	Description
`includeAbstracts`	boolean	`true`	Include the snippet/abstract text for each paper.
`includeCitations`	boolean	`false`	For each paper, follow the "Cited by" link and return citing papers. Significantly increases runtime and cost.
`maxCitationsPerPaper`	integer	`20`	Cap on citing papers per source paper when `includeCitations` is on. Up to `200`.
`includeBibtex`	boolean	`false`	Enrich every paper row with a BibTeX entry and MLA/APA/Chicago/Harvard/Vancouver citation strings. Adds two extra Scholar requests per paper.
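
Before starting a run, it can help to check an input object against the constraints listed in the tables above. This is a rough sketch, not the actor's own validation: the function name validate_input is hypothetical, and it only checks the rules the tables state explicitly (year ordering, the two sortBy values, a non-negative maxResults, and the 200 ceiling on maxCitationsPerPaper).

```python
ALLOWED_SORT = {"relevance", "date"}
MAX_CITATIONS_LIMIT = 200  # upper bound stated for maxCitationsPerPaper


def validate_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    year_from = run_input.get("yearFrom")
    year_to = run_input.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        problems.append("yearFrom must not be later than yearTo")

    # Fall back to the documented defaults when a key is omitted.
    if run_input.get("sortBy", "relevance") not in ALLOWED_SORT:
        problems.append('sortBy must be "relevance" or "date"')

    if run_input.get("maxResults", 50) < 0:
        problems.append("maxResults must be 0 (everything) or a positive integer")

    if run_input.get("maxCitationsPerPaper", 20) > MAX_CITATIONS_LIMIT:
        problems.append("maxCitationsPerPaper must be at most 200")

    return problems


print(validate_input({"searchQueries": ["bert language model"], "yearFrom": 2025, "yearTo": 2023}))
```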

Output

Every row carries a recordType field — paper, authorProfile, or citingPaper — so you can filter cleanly downstream.
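
As an illustration of that downstream filtering, the sketch below splits a list of exported rows into one list per recordType. It assumes the rows have already been loaded as Python dictionaries (for example from a JSON export); the helper name split_by_record_type is hypothetical.

```python
from collections import defaultdict


def split_by_record_type(rows: list[dict]) -> dict[str, list[dict]]:
    """Group dataset rows by their recordType field."""
    groups = defaultdict(list)
    for row in rows:
        # Rows without a recordType are kept under "unknown" rather than dropped.
        groups[row.get("recordType", "unknown")].append(row)
    return dict(groups)


rows = [
    {"recordType": "paper", "title": "Attention is all you need"},
    {"recordType": "authorProfile", "name": "Geoffrey Hinton"},
    {"recordType": "citingPaper", "parentPaperTitle": "Attention is all you need"},
]
groups = split_by_record_type(rows)
print({record_type: len(items) for record_type, items in groups.items()})
```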

Paper (`recordType: "paper"`)

{
    "recordType": "paper",
    "query": "attention is all you need",
    "rank": 1,
    "title": "Attention is all you need",
    "url": "https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html",
    "authors": "A Vaswani, N Shazeer, N Parmar, J Uszkoreit",
    "authorList": ["A Vaswani", "N Shazeer", "N Parmar", "J Uszkoreit"],
    "year": 2017,
    "venue": "Advances in neural information processing systems",
    "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
    "citationCount": 142301,
    "citedByUrl": "https://scholar.google.com/scholar?cites=2960712678066186980",
    "versionsCount": 73,
    "pdfUrl": "https://arxiv.org/pdf/1706.03762.pdf",
    "pdfSource": "arxiv.org"
}

Field	Type	Description
`recordType`	string	Always `"paper"`
`query`	string	The search query that produced this row (`null` for author-sourced papers)
`authorId`	string	Set when the paper comes from an `authorIds` profile — the author it belongs to
`rank`	number	Position in the query's result set
`title`	string	Paper title
`url`	string	Canonical paper URL (journal, arXiv, etc.)
`authors`	string	Comma-separated author line
`authorList`	string[]	Authors split into an array
`year`	number	Publication year
`venue`	string	Journal, conference, or publisher
`abstract`	string	Snippet / abstract text
`citationCount`	number	Number of papers citing this one
`citedByUrl`	string	Scholar URL to the full citing-paper list
`versionsCount`	number	Number of versions Scholar found
`pdfUrl`	string	Direct PDF link when Scholar lists one
`pdfSource`	string	Host domain of the PDF link
`bibtex`	string	Raw BibTeX entry — only when `includeBibtex: true`
`formattedCitations`	object	`{mla, apa, chicago, harvard, vancouver}` strings — only when `includeBibtex: true`
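
When includeBibtex is on, the bibtex strings can be concatenated into a single .bib file. A minimal sketch follows, assuming rows are loaded as dictionaries; the helper name build_bib_text is hypothetical, and the BibTeX entries in the sample are placeholders, not real actor output.

```python
def build_bib_text(rows: list[dict]) -> str:
    """Join the bibtex field of every paper row into one .bib document."""
    entries = [
        row["bibtex"].strip()
        for row in rows
        # Skip non-paper rows and papers that came back without a BibTeX entry.
        if row.get("recordType") == "paper" and row.get("bibtex")
    ]
    # Separate entries with a blank line and end the file with a newline.
    return "\n\n".join(entries) + ("\n" if entries else "")


# Placeholder rows for illustration only.
rows = [
    {"recordType": "paper", "bibtex": "@article{placeholder1,\n  title={First placeholder}\n}"},
    {"recordType": "paper", "bibtex": None},
    {"recordType": "authorProfile", "name": "Example Author"},
]
print(build_bib_text(rows))
```

Writing the returned string to a file such as references.bib gives you something a LaTeX bibliography tool can read directly.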

Author Profile (`recordType: "authorProfile"`)

{
    "recordType": "authorProfile",
    "authorId": "JicYPdAAAAAJ",
    "name": "Geoffrey Hinton",
    "affiliation": "Emeritus Prof. Computer Science, University of Toronto",
    "homepageUrl": "http://www.cs.toronto.edu/~hinton",
    "interests": ["machine learning", "artificial intelligence", "cognitive science"],
    "totalCitations": 1029825,
    "hIndex": 190,
    "i10Index": 526,
    "citationHistogram": [
        {"year": 2023, "count": 112043}
    ],
    "coAuthors": [
        {"authorId": "m1qAiOUAAAAJ", "name": "Yann LeCun"}
    ],
    "publications": [
        {
            "title": "Deep learning",
            "authors": "Y LeCun, Y Bengio, G Hinton",
            "venue": "Nature 521 (7553), 436-444",
            "year": 2015,
            "citationCount": 82310
        }
    ]
}

Field	Type	Description
`recordType`	string	Always `"authorProfile"`
`authorId`	string	Scholar author ID
`name`	string	Author's display name
`affiliation`	string	Affiliation text as shown on the profile
`verifiedEmailDomain`	string	Verified email domain (when opted in)
`homepageUrl`	string	Personal or institutional homepage
`interests`	string[]	Research interest tags
`totalCitations`	number	All-time total citations
`hIndex`	number	All-time h-index
`i10Index`	number	All-time i10-index
`citationHistogram`	object[]	Annual citation counts `[{year, count}, ...]`
`coAuthors`	object[]	Linked co-authors with their own IDs
`publications`	object[]	Full publication list with titles, venues, years, and citation counts

The author profile also includes *Since variants (totalCitationsSince, hIndexSince, i10IndexSince) scoped to Scholar's recent window, plus profileImageUrl. For a clean table display, the row also mirrors its headline into the shared columns — title (the author's name), citationCount (total citations), and url (the profile link) — so it reads sensibly in the Papers tab too.

Alongside the summary row, each of the author's publications is emitted as its own paper row (same shape as a search result). Those rows honor the yearFrom/yearTo filter and sortBy, carry the abstract when includeAbstracts is on, and follow the "Cited by" graph when includeCitations is on. Each carries an authorId so you can group an author's papers downstream. The summary row's nested publications array mirrors the same filtered set.
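
Because the nested publications array reflects the filtered set, it can be used to recompute citation metrics for just that window. The sketch below assumes the standard definitions: the h-index is the largest h such that h papers have at least h citations each, and the i10-index counts papers with at least 10 citations. The helper names are hypothetical, and the results will differ from the profile's all-time hIndex and i10Index whenever a year filter trimmed the list.

```python
def h_index(citation_counts: list[int]) -> int:
    """Largest h such that h papers each have at least h citations."""
    h = 0
    for position, count in enumerate(sorted(citation_counts, reverse=True), start=1):
        if count < position:
            break
        h = position
    return h


def i10_index(citation_counts: list[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citation_counts if count >= 10)


def window_metrics(profile: dict) -> dict:
    """Compute both metrics from an authorProfile row's publications array."""
    # Treat a missing or null citationCount as zero citations.
    counts = [pub.get("citationCount") or 0 for pub in profile.get("publications", [])]
    return {"hIndex": h_index(counts), "i10Index": i10_index(counts)}


# Hypothetical citation counts for illustration only.
profile = {"publications": [{"citationCount": c} for c in (10, 8, 5, 4, 3)]}
print(window_metrics(profile))
```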

Citing Paper (`recordType: "citingPaper"`)

Emitted only when includeCitations: true. Capped at maxCitationsPerPaper per source paper.

{
    "recordType": "citingPaper",
    "parentPaperTitle": "Attention is all you need",
    "title": "BERT: Pre-training of deep bidirectional transformers for language understanding",
    "url": "https://arxiv.org/abs/1810.04805",
    "authors": "J Devlin, MW Chang, K Lee, K Toutanova",
    "year": 2018,
    "venue": "arXiv preprint arXiv:1810.04805",
    "citationCount": 98421,
    "pdfUrl": "https://arxiv.org/pdf/1810.04805"
}

Same shape as a paper row, plus parentPaperTitle, parentClusterId, and parentQuery so you can join every citer back to the source paper it references.
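
One way to perform that join is to index citing rows by their parent. The sketch below keys on parentPaperTitle because it appears in the sample row above; keying on parentClusterId may be more robust when two source papers share a title, but that field's format is not shown here, so treat that as an assumption. The helper name citers_by_parent is hypothetical.

```python
from collections import defaultdict


def citers_by_parent(rows: list[dict]) -> dict[str, list[str]]:
    """Map each source paper title to the titles of the papers citing it."""
    index = defaultdict(list)
    for row in rows:
        if row.get("recordType") == "citingPaper":
            index[row.get("parentPaperTitle")].append(row.get("title"))
    return dict(index)


rows = [
    {"recordType": "paper", "title": "Attention is all you need"},
    {
        "recordType": "citingPaper",
        "parentPaperTitle": "Attention is all you need",
        "title": "BERT: Pre-training of deep bidirectional transformers for language understanding",
    },
]
for parent, citers in citers_by_parent(rows).items():
    print(parent, "<-", len(citers), "citing paper(s)")
```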

Tips for Best Results

Narrow the query. Scholar returns the best 1,000 hits for any query — broad terms like "machine learning" will drown out the gems. Add a modifier ("machine learning healthcare 2024") to get a tighter, more useful set.
Use the year filter to cut noise. A yearFrom: 2023 filter strips away decades of older work and dramatically improves signal for recent literature reviews.
Pick the right sort order. sortBy: "date" surfaces the most recent work; sortBy: "relevance" keeps Scholar's citation-weighted ranking for foundational reading.
Combine authorIds and searchQueries in one run. Pay for one start and get both a topic survey and the specific author profiles you care about.
Prefer smaller maxResults for faster runs. If you need 50 papers, ask for 50 — not 1,000. Fewer pages means a quicker, cheaper run.
Turn off abstracts when you don't need them. Setting includeAbstracts: false shrinks every row and speeds up large runs.
Use citations sparingly. includeCitations: true adds up to maxCitationsPerPaper citing rows for every paper: 20 per paper at the default, and as many as 200 at the maximum setting. Keep maxResults modest (5–20) when you switch it on.
Author runs produce a row per publication. Each author ID yields one authorProfile summary plus a paper row for every publication (up to maxResults), so budget your result count accordingly. Add a yearFrom/yearTo window to focus on recent work — the filter applies to an author's papers just like a search.
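
To budget a run, the tips above can be turned into a rough upper bound on row count. This is a back-of-the-envelope sketch, not the actor's own accounting: the function name max_rows is hypothetical, real runs usually return fewer rows, and treating maxResults: 0 as 1,000 for author IDs as well as queries is an assumption.

```python
SCHOLAR_CAP = 1000  # approximate per-query cap mentioned in the Input Reference


def max_rows(
    num_queries: int,
    num_authors: int,
    max_results: int = 50,
    include_citations: bool = False,
    max_citations_per_paper: int = 20,
) -> int:
    """Upper bound on dataset rows for one run."""
    # maxResults of 0 means "everything available"; assume the ~1,000 cap.
    per_source = max_results if max_results > 0 else SCHOLAR_CAP
    paper_rows = (num_queries + num_authors) * per_source
    profile_rows = num_authors  # one authorProfile summary row per author ID
    citing_rows = paper_rows * max_citations_per_paper if include_citations else 0
    return paper_rows + profile_rows + citing_rows


# One query plus one author, 10 papers each, 50 citing papers per paper.
print(max_rows(1, 1, max_results=10, include_citations=True, max_citations_per_paper=50))
```

With citations enabled, the citing rows dominate the total, which is why a small maxResults matters most in that mode.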

Pricing

$4.00 per 1,000 results — matches the market rate for Scholar extraction while bundling author metrics and citation graphs at no extra charge.

Results	Estimated Cost
100	$0.40
1,000	$4.00
10,000	$40.00
100,000	$400.00

A "result" is any row in the output dataset — a paper, an author profile, or a citing paper. Platform fees (compute, storage) are additional and depend on your Apify plan.
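
The table above is a straight per-row calculation, which can be sketched as follows. The function name estimate_cost is hypothetical; the default rate is the $4.00 per 1,000 results quoted in this section, and it is left as a parameter because the listing header quotes a "from $2.00" rate. Platform fees are not included.

```python
def estimate_cost(result_rows: int, price_per_1000: float = 4.00) -> float:
    """Estimated result charge in USD, excluding platform fees."""
    return round(result_rows * price_per_1000 / 1000, 2)


for rows in (100, 1_000, 10_000, 100_000):
    print(f"{rows:>7,} results: ${estimate_cost(rows):,.2f}")
```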

Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

Zapier / Make / n8n — Workflow automation
Google Sheets — Direct spreadsheet export
Slack / Email — Notifications on new results
Webhooks — Trigger custom APIs on run completion
Apify API — Full programmatic access

Legal & Ethical Use

This actor is designed for legitimate academic research, bibliometrics, literature review, and market intelligence. Users are responsible for complying with applicable laws and Google Scholar's terms of service, including making reasonable-rate requests and respecting content usage rules for any papers linked from Scholar. Do not use extracted data for spam, harassment, or any illegal purpose.
