Google Scholar Scraper avatar

Google Scholar Scraper

Pricing

from $5.00 / 1,000 results

Go to Apify Store
Google Scholar Scraper

Google Scholar Scraper

Scrape Google Scholar search results with titles, authors, citations, abstracts, and PDF links. Also supports author profile mode to extract h-index, i10-index, and publication lists.

Pricing

from $5.00 / 1,000 results

Rating

0.0

(0)

Developer

lulz bot

lulz bot

Maintained by Community

Actor stats

0

Bookmarked

2

Total users

1

Monthly active users

2 days ago

Last modified

Categories

Share

Scrape Google Scholar search results, paper citations, abstracts, and author profiles — no API key required.

Google Scholar has no official API, making this scraper invaluable for researchers, academics, bibliometricians, and developers who need structured access to academic publication data.

Features

  • Search mode: Search papers by keyword with title, authors, journal, year, abstract, citation count, and PDF links
  • Author profile mode: Fetch an author's complete profile including h-index, i10-index, total citations, and full publication list
  • Year range filtering (yearFrom / yearTo)
  • Sort by relevance or date
  • Automatic pagination (10 results per page, up to 1000)
  • CAPTCHA detection with automatic retry (30–60s wait, up to 3 attempts)
  • Residential proxy support (required — Google blocks datacenter IPs)

Input

FieldTypeDefaultDescription
modeselectsearchsearch or authorProfile
querystringmachine learningSearch keywords (search mode)
authorIdstringGoogle Scholar user ID (author profile mode)
yearFromintegerFilter papers from this year
yearTointegerFilter papers up to this year
sortByselectrelevancerelevance or date
limitinteger10Max results (1–1000)
proxyConfigurationobjectApify ResidentialProxy settings

Finding an Author ID

Go to any Google Scholar author profile page. The URL will look like:

https://scholar.google.com/citations?user=XXXXXXXXXXX&hl=en

Copy the value after user= — that is the author ID.

Output

Search Mode

{
"title": "Attention Is All You Need",
"url": "https://arxiv.org/abs/1706.03762",
"authors": ["A Vaswani", "N Shazeer", "N Parmar"],
"source": "Advances in neural information processing systems",
"year": 2017,
"abstract": "The dominant sequence transduction models are based on complex recurrent...",
"citationCount": 98432,
"pdfUrl": "https://arxiv.org/pdf/1706.03762",
"allVersionsUrl": "https://scholar.google.com/scholar?cluster=...",
"relatedArticlesUrl": "https://scholar.google.com/scholar?q=related:...",
"scholarUrl": "https://arxiv.org/abs/1706.03762",
"searchQuery": "transformer attention mechanism",
"scrapedAt": "2024-01-15T10:23:45.123Z"
}

Author Profile Mode

{
"name": "Geoffrey Hinton",
"affiliation": "University of Toronto",
"hIndex": 155,
"i10Index": 322,
"totalCitations": 783210,
"publications": [
{
"title": "Deep learning",
"url": "https://scholar.google.com/citations?view_op=view_citation&...",
"authorsSource": "Y LeCun, Y Bengio, G Hinton - nature, 2015",
"year": 2015,
"citationCount": 62450
}
],
"scholarUrl": "https://scholar.google.com/citations?user=JicYPdAAAAAJ&hl=en",
"scrapedAt": "2024-01-15T10:23:45.123Z"
}

Use Cases

  • Literature reviews: Collect papers on a topic with citation counts to identify seminal works
  • Bibliometric analysis: Track citation trends, prolific authors, top journals
  • Research monitoring: Watch for new papers on keywords you care about
  • Academic recruiting: Find researchers by h-index and publication record
  • Competitive intelligence: Monitor competitor research output and emerging tech areas
  • Grant writing: Find citation stats and related work for proposals

Example Inputs

Search for recent AI safety papers

{
"mode": "search",
"query": "AI alignment safety",
"yearFrom": 2022,
"sortBy": "date",
"limit": 50
}

Get Geoffrey Hinton's author profile

{
"mode": "authorProfile",
"authorId": "JicYPdAAAAAJ",
"limit": 100
}

Top cited NLP papers ever

{
"mode": "search",
"query": "natural language processing BERT transformer",
"sortBy": "relevance",
"limit": 100
}

Pricing

This actor uses Pay Per Event (PPE) pricing — $0.005 per result scraped.

  • 100 papers = $0.50
  • 500 papers = $2.50
  • 1,000 papers = $5.00

Notes

  • Google Scholar enforces rate limits. The scraper adds 2–6 second delays between pages to stay within limits.
  • If a CAPTCHA is detected, the scraper waits 30–60 seconds and retries up to 3 times before skipping.
  • Residential proxies are mandatory. The default configuration uses Apify's residential proxy pool.
  • Google Scholar shows a maximum of ~100 results per search query through normal pagination. For larger datasets, split your query into multiple narrower searches.