Google Scholar Scraper

Pricing

from $2.00 / 1,000 results

[💰 $2.0 / 1K] Extract academic papers, author profiles, h-index, i10-index, citation counts, abstracts, and PDF links from Google Scholar. Batch search queries and author IDs, filter by year range, sort by relevance or date.

Developer

SolidCode

Maintained by Community

Pull academic papers, author profiles, and citation data from Google Scholar at scale, complete with h-index, i10-index, citation counts, BibTeX entries, and formatted MLA/APA/Chicago/Harvard/Vancouver citations. Built for researchers, analysts, and content teams who need a clean, structured academic dataset without wrestling with Scholar's HTML one page at a time.

Why This Scraper?

  • Papers, authors, and citations in one actor: search Scholar by keywords, pull complete author profiles by ID, and follow the "Cited by" graph. One run, one dataset.
  • Batch everything: many queries and many author IDs in a single invocation. Pay once for the setup, get all your results in one place.
  • Up to 1,000 papers per query: reaches Google Scholar's own upper bound with smooth pagination and no duplicates.
  • Year range and date-sorted results: narrow to a publication window or sort by most-recent-first to surface the latest literature.
  • BibTeX and formatted citations on demand: enrich every paper with a ready-to-paste BibTeX entry and MLA, APA, Chicago, Harvard, and Vancouver citation strings.
  • 20+ languages and 40+ countries: localize results with language and country controls for regional coverage.
  • No API key, no sign-up: Google Scholar has no public API. This is the fastest path from a keyword to a clean academic dataset.

Use Cases

Academic Research & Literature Reviews

  • Build a ranked reading list for a new research topic in minutes
  • Track the citation graph of a seminal paper to find follow-up work
  • Discover adjacent researchers via the "Cited by" chain

Competitive & Industry Intelligence

  • Monitor what research labs or university groups are publishing on a topic
  • Benchmark academic output of competing institutions by author ID
  • Detect emerging sub-fields from a burst of recent publications

Grant Writing & Funding Prep

  • Assemble a bibliography of prior work to justify a new grant proposal
  • Quantify a lab's impact with total citations, h-index, and i10-index
  • Identify gaps in the literature to frame a novel research question

Bibliometrics & Research Analytics

  • Build citation-count time series for meta-analysis or scientometrics
  • Analyze author productivity trends across years
  • Map co-author networks from author profile data

SEO & Content Research

  • Back marketing claims with peer-reviewed sources
  • Find credible experts to quote or interview for long-form content
  • Surface studies that competitors cite to match their evidence depth

Education & Curriculum Design

  • Compile course reading lists from the most-cited papers in a field
  • Discover open-access PDF versions of academic texts
  • Track which textbook chapters or papers are cited in recent syllabi

Getting Started

The simplest possible run, with one topic and 50 papers:

{
  "searchQueries": ["quantum error correction"],
  "maxResults": 50
}

Filtered Search (Year + Language + Country)

Narrow to recent papers and localize for a German-language audience in Germany:

{
  "searchQueries": ["large language models healthcare"],
  "yearFrom": 2023,
  "yearTo": 2025,
  "sortBy": "date",
  "language": "de",
  "country": "de",
  "maxResults": 100
}

Author Profile Lookup

Pull a complete profile by Scholar author ID: metrics, research interests, co-authors, and the full publication list. Paste either the ID or the full URL:

{
  "authorIds": [
    "JicYPdAAAAAJ",
    "https://scholar.google.com/citations?user=5KJrNtoAAAAJ&hl=en"
  ],
  "maxResults": 200
}

To find an author ID, open any Google Scholar author page and copy the value after user= in the URL.
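If you collect profile URLs in bulk, the same "value after user=" rule can be applied programmatically. The sketch below uses only the Python standard library; the helper name extract_author_id is hypothetical and is not part of the actor, which already accepts either form.

```python
from urllib.parse import parse_qs, urlparse


def extract_author_id(value: str) -> str:
    """Return the Scholar author ID from a bare ID or a full profile URL."""
    value = value.strip()
    if "://" not in value:
        return value  # already a bare ID such as "JicYPdAAAAAJ"
    # The ID is the value of the "user" query parameter in the profile URL.
    ids = parse_qs(urlparse(value).query).get("user")
    if not ids:
        raise ValueError(f"no 'user' parameter in URL: {value}")
    return ids[0]


print(extract_author_id("https://scholar.google.com/citations?user=5KJrNtoAAAAJ&hl=en"))
# prints 5KJrNtoAAAAJ
```

Normalizing to bare IDs first makes it easier to deduplicate a list before passing it as authorIds.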

Combined Search + Citation Graph

Fetch papers for a query, then follow each paper's "Cited by" link:

{
  "searchQueries": ["attention is all you need"],
  "authorIds": ["JicYPdAAAAAJ"],
  "includeCitations": true,
  "maxCitationsPerPaper": 50,
  "includeAbstracts": true,
  "maxResults": 10
}

Bibliography Export (BibTeX + Formatted Citations)

Pull a topic's top papers with ready-to-paste BibTeX entries and pre-formatted citations:

{
  "searchQueries": ["bert language model"],
  "includeBibtex": true,
  "maxResults": 20
}

Input Reference

What to Scrape

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| searchQueries | string[] | ["machine learning healthcare"] | Keywords to search on Google Scholar. Each query produces its own set of paper results. |
| authorIds | string[] | [] | Google Scholar author IDs or full profile URLs. Paste either the 10-14 character ID (JicYPdAAAAAJ) or the URL. |

Results

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| maxResults | integer | 50 | Maximum papers per search query. Google Scholar caps at roughly 1,000 results. Set to 0 for everything available. |

Filters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| yearFrom | integer | null | Only include papers published in this year or later. |
| yearTo | integer | null | Only include papers published in this year or earlier. |
| sortBy | string | "relevance" | "relevance" keeps Scholar's default ranking. "date" sorts most recent first. |

Localization

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| language | string | "en" | Scholar interface and snippet language. 20 options including English, Spanish, German, French, Japanese, Chinese, Arabic, and more. |
| country | string | "us" | Country code for regional localization. 45 options across the Americas, Europe, Asia-Pacific, and MENA. |

Enrichment

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| includeAbstracts | boolean | true | Include the snippet/abstract text for each paper. |
| includeCitations | boolean | false | For each paper, follow the "Cited by" link and return citing papers. Significantly increases runtime and cost. |
| maxCitationsPerPaper | integer | 20 | Cap on citing papers per source paper when includeCitations is on. Up to 200. |
| includeBibtex | boolean | false | Enrich every paper row with a BibTeX entry and MLA/APA/Chicago/Harvard/Vancouver citation strings. Adds two extra Scholar requests per paper. |
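A quick local check can catch input mistakes before a run is started. The following is a minimal sketch that encodes only the constraints stated in the tables above (the two sortBy values, the 200 cap on maxCitationsPerPaper, and 0 meaning "everything" for maxResults). The function name validate_input is hypothetical, and the checks that yearFrom should not exceed yearTo and that maxResults should not be negative are assumptions, not documented behavior of the actor.

```python
ALLOWED_SORT = {"relevance", "date"}
MAX_CITATIONS_PER_PAPER = 200  # "Up to 200" per the Enrichment table


def validate_input(run_input: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []

    sort_by = run_input.get("sortBy", "relevance")
    if sort_by not in ALLOWED_SORT:
        problems.append(f"sortBy must be one of {sorted(ALLOWED_SORT)}, got {sort_by!r}")

    # Assumption: an inverted year window is a mistake rather than a valid filter.
    year_from, year_to = run_input.get("yearFrom"), run_input.get("yearTo")
    if year_from is not None and year_to is not None and year_from > year_to:
        problems.append("yearFrom must not be later than yearTo")

    # Assumption: negative values are invalid; 0 means "everything available".
    if run_input.get("maxResults", 50) < 0:
        problems.append("maxResults must be 0 or a positive integer")

    if run_input.get("maxCitationsPerPaper", 20) > MAX_CITATIONS_PER_PAPER:
        problems.append(f"maxCitationsPerPaper is capped at {MAX_CITATIONS_PER_PAPER}")

    return problems


print(validate_input({"searchQueries": ["bert"], "yearFrom": 2025, "yearTo": 2023}))
# prints ['yearFrom must not be later than yearTo']
```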

Output

Every row carries a recordType field (paper, authorProfile, or citingPaper) so you can filter cleanly downstream.
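As an illustration of that downstream filtering, the sketch below splits a mixed list of rows into one bucket per recordType. It assumes the rows have already been loaded as Python dictionaries (for example from a JSON export); the function name split_by_record_type is hypothetical.

```python
from collections import defaultdict


def split_by_record_type(rows):
    """Group dataset rows into {recordType: [rows...]}."""
    buckets = defaultdict(list)
    for row in rows:
        # Rows without the field are kept under "unknown" rather than dropped.
        buckets[row.get("recordType", "unknown")].append(row)
    return dict(buckets)


rows = [
    {"recordType": "paper", "title": "Attention is all you need"},
    {"recordType": "authorProfile", "name": "Geoffrey Hinton"},
    {"recordType": "paper", "title": "Deep learning"},
]
buckets = split_by_record_type(rows)
print({kind: len(items) for kind, items in buckets.items()})
# prints {'paper': 2, 'authorProfile': 1}
```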

Paper (recordType: "paper")

{
  "recordType": "paper",
  "query": "attention is all you need",
  "rank": 1,
  "title": "Attention is all you need",
  "url": "https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html",
  "authors": "A Vaswani, N Shazeer, N Parmar, J Uszkoreit",
  "authorList": ["A Vaswani", "N Shazeer", "N Parmar", "J Uszkoreit"],
  "year": 2017,
  "venue": "Advances in neural information processing systems",
  "abstract": "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks...",
  "citationCount": 142301,
  "citedByUrl": "https://scholar.google.com/scholar?cites=2960712678066186980",
  "versionsCount": 73,
  "pdfUrl": "https://arxiv.org/pdf/1706.03762.pdf",
  "pdfSource": "arxiv.org"
}

| Field | Type | Description |
| --- | --- | --- |
| recordType | string | Always "paper" |
| query | string | The search query that produced this row |
| rank | number | Position in the query's result set |
| title | string | Paper title |
| url | string | Canonical paper URL (journal, arXiv, etc.) |
| authors | string | Comma-separated author line |
| authorList | string[] | Authors split into an array |
| year | number | Publication year |
| venue | string | Journal, conference, or publisher |
| abstract | string | Snippet / abstract text |
| citationCount | number | Number of papers citing this one |
| citedByUrl | string | Scholar URL to the full citing-paper list |
| versionsCount | number | Number of versions Scholar found |
| pdfUrl | string | Direct PDF link when Scholar lists one |
| pdfSource | string | Host domain of the PDF link |
| bibtex | string | Raw BibTeX entry (only when includeBibtex: true) |
| formattedCitations | object | {mla, apa, chicago, harvard, vancouver} strings (only when includeBibtex: true) |

Author Profile (recordType: "authorProfile")

{
  "recordType": "authorProfile",
  "authorId": "JicYPdAAAAAJ",
  "name": "Geoffrey Hinton",
  "affiliation": "Emeritus Prof. Computer Science, University of Toronto",
  "homepageUrl": "http://www.cs.toronto.edu/~hinton",
  "interests": ["machine learning", "artificial intelligence", "cognitive science"],
  "totalCitations": 1029825,
  "hIndex": 190,
  "i10Index": 526,
  "citationHistogram": [
    {"year": 2023, "count": 112043}
  ],
  "coAuthors": [
    {"authorId": "m1qAiOUAAAAJ", "name": "Yann LeCun"}
  ],
  "publications": [
    {
      "title": "Deep learning",
      "authors": "Y LeCun, Y Bengio, G Hinton",
      "venue": "Nature 521 (7553), 436-444",
      "year": 2015,
      "citationCount": 82310
    }
  ]
}

| Field | Type | Description |
| --- | --- | --- |
| recordType | string | Always "authorProfile" |
| authorId | string | Scholar author ID |
| name | string | Author's display name |
| affiliation | string | Affiliation text as shown on the profile |
| verifiedEmailDomain | string | Verified email domain (when opted in) |
| homepageUrl | string | Personal or institutional homepage |
| interests | string[] | Research interest tags |
| totalCitations | number | All-time total citations |
| hIndex | number | All-time h-index |
| i10Index | number | All-time i10-index |
| citationHistogram | object[] | Annual citation counts [{year, count}, ...] |
| coAuthors | object[] | Linked co-authors with their own IDs |
| publications | object[] | Full publication list with titles, venues, years, and citation counts |

The author profile also includes *Since variants (totalCitationsSince, hIndexSince, i10IndexSince) scoped to Scholar's recent window, plus profileImageUrl.
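For bibliometric work it can be useful to recompute the indices from the publications array, for example to restrict them to a subset of years. The sketch below applies the standard definitions (h-index: the largest h such that h publications have at least h citations each; i10-index: the number of publications with at least 10 citations). The function names are hypothetical, and the results will only match the hIndex and i10Index fields if the publications array is complete.

```python
def h_index(citation_counts):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count < position:
            break
        h = position
    return h


def i10_index(citation_counts):
    """Number of publications with at least 10 citations."""
    return sum(1 for count in citation_counts if count >= 10)


def metrics_from_profile(profile):
    # Missing or null citationCount values are treated as 0.
    counts = [pub.get("citationCount") or 0 for pub in profile.get("publications", [])]
    return {"hIndex": h_index(counts), "i10Index": i10_index(counts)}


sample = {"publications": [{"citationCount": c} for c in (10, 8, 5, 4, 3)]}
print(metrics_from_profile(sample))
# prints {'hIndex': 4, 'i10Index': 1}
```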

Citing Paper (recordType: "citingPaper")

Emitted only when includeCitations: true. Capped at maxCitationsPerPaper per source paper.

{
  "recordType": "citingPaper",
  "parentPaperTitle": "Attention is all you need",
  "title": "BERT: Pre-training of deep bidirectional transformers for language understanding",
  "url": "https://arxiv.org/abs/1810.04805",
  "authors": "J Devlin, MW Chang, K Lee, K Toutanova",
  "year": 2018,
  "venue": "arXiv preprint arXiv:1810.04805",
  "citationCount": 98421,
  "pdfUrl": "https://arxiv.org/pdf/1810.04805"
}

Same shape as a paper row, plus parentPaperTitle, parentClusterId, and parentQuery so you can join every citer back to the source paper it references.
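One way to perform that join is to group citing rows by their parent fields. This sketch keys on parentQuery and parentPaperTitle because the example above shows a title value; parentClusterId is probably the more robust key, but its value format is not shown here, so using it is left as an assumption. The function name citers_by_parent is hypothetical.

```python
from collections import defaultdict


def citers_by_parent(rows):
    """Map (parentQuery, parentPaperTitle) -> list of citing-paper titles."""
    grouped = defaultdict(list)
    for row in rows:
        if row.get("recordType") != "citingPaper":
            continue  # skip paper and authorProfile rows
        key = (row.get("parentQuery"), row.get("parentPaperTitle"))
        grouped[key].append(row.get("title"))
    return dict(grouped)


rows = [
    {"recordType": "paper", "title": "Attention is all you need"},
    {
        "recordType": "citingPaper",
        "parentQuery": "attention is all you need",
        "parentPaperTitle": "Attention is all you need",
        "title": "BERT: Pre-training of deep bidirectional transformers for language understanding",
    },
]
graph = citers_by_parent(rows)
print(len(graph[("attention is all you need", "Attention is all you need")]))
# prints 1
```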

Tips for Best Results

  • Narrow the query. Scholar returns the best 1,000 hits for any query, so broad terms like "machine learning" will drown out the gems. Add a modifier ("machine learning healthcare 2024") to get a tighter, more useful set.
  • Use the year filter to cut noise. A yearFrom: 2023 filter strips away decades of older work and dramatically improves signal for recent literature reviews.
  • Pick the right sort order. sortBy: "date" surfaces the most recent work; sortBy: "relevance" keeps Scholar's citation-weighted ranking for foundational reading.
  • Combine authorIds and searchQueries in one run. Pay for one start and get both a topic survey and the specific author profiles you care about.
  • Prefer smaller maxResults for faster runs. If you need 50 papers, ask for 50, not 1,000. Fewer pages means a quicker, cheaper run.
  • Turn off abstracts when you don't need them. Setting includeAbstracts: false shrinks every row and speeds up large runs.
  • Use citations sparingly. includeCitations: true adds up to maxCitationsPerPaper citing-paper rows for every paper (20 by default, 200 at most). Keep maxResults modest (5 to 20) when you switch it on.
  • Author profiles return at least 20 publications per request. Scholar's profile pagination has a 20-publication minimum, so a maxResults: 5 run on authorIds may still yield 20 publications in the publications array.
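The tips on maxResults and includeCitations come down to simple arithmetic on the row count. The sketch below computes a rough upper bound under stated assumptions: each query yields at most min(maxResults, 1000) paper rows, each paper yields at most maxCitationsPerPaper citing rows, and each author ID yields a single authorProfile row with its publications nested inside. The function name max_rows is hypothetical and actual runs may return fewer rows.

```python
SCHOLAR_CAP = 1000  # Scholar caps at roughly 1,000 results per query


def max_rows(num_queries, max_results, num_authors=0,
             include_citations=False, max_citations_per_paper=20):
    """Rough upper bound on dataset rows for one run."""
    # maxResults = 0 means "everything available", bounded by Scholar's cap.
    per_query = SCHOLAR_CAP if max_results == 0 else min(max_results, SCHOLAR_CAP)
    papers = num_queries * per_query
    citing = papers * max_citations_per_paper if include_citations else 0
    return papers + citing + num_authors


# The "Combined Search + Citation Graph" input: 1 query, 10 papers,
# 50 citing papers each, 1 author profile.
print(max_rows(1, 10, num_authors=1, include_citations=True, max_citations_per_paper=50))
# prints 511
```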

Pricing

$4.00 per 1,000 results. This matches the market rate for Scholar extraction while bundling author metrics and citation graphs at no extra charge.

| Results | Estimated Cost |
| --- | --- |
| 100 | $0.40 |
| 1,000 | $4.00 |
| 10,000 | $40.00 |
| 100,000 | $400.00 |

A "result" is any row in the output dataset: a paper, an author profile, or a citing paper. Platform fees (compute, storage) are additional and depend on your Apify plan.
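The table follows directly from the per-row rate. A minimal sketch of the calculation, assuming the $4.00 per 1,000 results rate quoted in this section and excluding platform fees (the function name estimate_cost is hypothetical):

```python
def estimate_cost(rows, usd_per_1000=4.00):
    """Estimated result cost in USD, excluding Apify platform fees."""
    return round(rows * usd_per_1000 / 1000, 2)


for n in (100, 1_000, 10_000, 100_000):
    print(f"{n:>7,} rows -> ${estimate_cost(n):,.2f}")
```

Pass a different usd_per_1000 value if your plan is billed at another rate.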

Integrations

Export data in JSON, CSV, Excel, XML, or RSS. Connect to 1,500+ apps via:

  • Zapier / Make / n8n: Workflow automation
  • Google Sheets: Direct spreadsheet export
  • Slack / Email: Notifications on new results
  • Webhooks: Trigger custom APIs on run completion
  • Apify API: Full programmatic access
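For programmatic access from Python, a run can be started and its dataset read with the apify-client package. The sketch below is illustrative only: the actor ID shown in the comment is a placeholder, not a confirmed identifier (copy the real one from the actor's page), and the token is your own Apify API token. The package is imported lazily so the function can be defined without it installed.

```python
def fetch_rows(token: str, actor_id: str, run_input: dict) -> list:
    """Run the actor, wait for it to finish, and return all dataset rows."""
    # Third-party dependency: pip install apify-client
    from apify_client import ApifyClient

    client = ApifyClient(token)
    # call() starts the run and blocks until it finishes.
    run = client.actor(actor_id).call(run_input=run_input)
    # Every output row (paper, authorProfile, citingPaper) is in the default dataset.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Example usage (placeholder values, not executed here):
# rows = fetch_rows(
#     token="<YOUR_APIFY_TOKEN>",
#     actor_id="<username>/<actor-name>",
#     run_input={"searchQueries": ["quantum error correction"], "maxResults": 50},
# )
```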

This actor is designed for legitimate academic research, bibliometrics, literature review, and market intelligence. Users are responsible for complying with applicable laws and Google Scholar's terms of service, including making reasonable-rate requests and respecting content usage rules for any papers linked from Scholar. Do not use extracted data for spam, harassment, or any illegal purpose.